On Mon, Apr 28, 2003 at 01:55:22PM +0200, Enrico Zini wrote:
> Hello.
>
> The size of Debian increases, and the Sections: system has proven unable to
> scale to keep pace with it. There has been much consensus around a multiple
> tags per package solution, and now, yes, it has become a reality.
(...)

Have you considered using TFIDF (or similar) based tools to generate and
maintain the tag database? I considered this concept a while back (and
developed a crude hack, 'dpkg-iasearch').

Analysing package descriptions and automatically extracting "clusters" of
related packages would be easier with this kind of technique (implemented
in the 'bow' library).

That's just my 2c. I consider this a wonderful job you are doing!

Regards

Javi

One of the difficult things in the future might be generating new tags or
associating new packages with tags already available. Automating this would
be useful, and TFIDF (and similar AI-related techniques) helps with this
quite a lot.
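
To make the idea a bit more concrete, here is a minimal sketch in plain Python (standard library only, not the 'bow' library and not what 'dpkg-iasearch' does). It weights the words of each package description with TF-IDF and proposes, for a new description, the tags of the most similar already-tagged package. The package descriptions, the tag names and the function names are all made up for illustration; a real tool would read them from the package lists and the tag database.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return ({name: {word: weight}}, idf) for a {name: description} dict."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each word once per document
    n = len(docs)
    # Adding 1 keeps words that occur in every description from vanishing.
    idf = {w: math.log(n / df) + 1.0 for w, df in doc_freq.items()}
    vectors = {}
    for name, words in tokens.items():
        tf = Counter(words)
        vectors[name] = {w: (c / len(words)) * idf[w] for w, c in tf.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(description, docs, tags):
    """Return (nearest package, its tags) for an untagged description."""
    vectors, idf = tfidf_vectors(docs)
    words = tokenize(description)
    tf = Counter(words)
    # Words never seen in the tagged descriptions carry no weight.
    query = {w: (c / len(words)) * idf[w] for w, c in tf.items() if w in idf}
    best = max(vectors, key=lambda name: cosine(query, vectors[name]))
    return best, tags[best]


# Hypothetical sample data, for illustration only.
DESCRIPTIONS = {
    "mutt": "text based mail reader supporting MIME, GPG and threading",
    "gimp": "image manipulation program for photo retouching and painting",
    "apache": "versatile high performance HTTP web server",
}
TAGS = {
    "mutt": ["mail", "console"],
    "gimp": ["graphics"],
    "apache": ["web", "server"],
}

if __name__ == "__main__":
    print(suggest_tags("small mail reader for the console", DESCRIPTIONS, TAGS))
```

With only three descriptions this is a toy, and the suggestions would still need a human to review them, but the same weighting could in principle feed a clustering step to propose new tags.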