On Mon, Apr 22, 2002 at 01:04:28PM -0400, Joey Hess <joeyh@debian.org> was heard to say:
> It's a very fine piece of work, and it must be a hell of a
> lot of work to keep it all updated independent of the rest of Debian.

  As a matter of fact, it has not been updated for several months now.
I've been doing other things and haven't had time to keep it up-to-date.
The mtime of my private set of classifications is November 22, 2001.

  Creating the initial set of data took me 5-10 hours a day for a week
or two, and it was incomplete.  Keeping it mostly up-to-date afterwards
took about 5-10 minutes a day on average (0 minutes some days, more on
others, depending on how many new packages there were).

  According to some records (in the form of a list of "new" packages;
I tracked which packages were uncategorized by leaving them marked as
"new"), about 1539 packages have been added to the archive since I quit
regularly updating my classifications.

  As this emphasizes, the issue with categorizing packages (unless you
have a superbly clever AI or something) is not code or data structure
specifications; we have lots of smart people who can come up with these
until the cows come home, and most of them will take about an hour
(tops) to implement.

  The problem is that we have 9500 packages, and it's really hard
to classify all of them in a sane and consistent manner -- from the
sheer volume if nothing else.  More than that, the problem is that
people would rather theorize about the best possible ontological
classification on mailing lists than sit down and categorize packages.

  I went to a little trouble to make patches against aptitude hierarchies
meaningful: when you save one from inside aptitude, it alphabetizes
everything (as opposed to dumping entries in some random hashed order),
which lets context diffs make sense.  If you think we need to improve our
classification system, PLEASE either come up with your own code or send
me patches to the data files!
  [this is a general request, not directed at Joey :) ]

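
  To illustrate why sorted output matters for patches, here is a minimal
sketch in Python.  It is not aptitude's code, and the "package: categories"
line format, the dump_sorted() helper, and the sample packages are all
made up for the example; the only point is that a deterministic ordering
keeps a context diff down to the entries that really changed.

```python
import difflib


def dump_sorted(classifications):
    """Serialize {package: set of categories} in alphabetical order.

    The "package: cat1, cat2" line format is hypothetical, not
    aptitude's real hierarchy file format.
    """
    lines = []
    for package in sorted(classifications):
        categories = ", ".join(sorted(classifications[package]))
        lines.append(f"{package}: {categories}\n")
    return lines


# Same data held in a different in-memory order, plus one new package.
old = {"vim": {"editors"}, "mutt": {"mail"}, "gcc": {"devel"}}
new = {"mutt": {"mail"}, "zsh": {"shells"}, "gcc": {"devel"}, "vim": {"editors"}}

# Because both dumps are sorted, the only difference is the added entry.
diff = list(difflib.context_diff(dump_sorted(old), dump_sorted(new), "old", "new"))
print("".join(diff))
```

  If the entries were written in whatever order the in-memory table happened
to hold them, the same one-package change could show up as a rewrite of
most of the file.
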
  Threads on this list have covered almost every possible scheme for
storing, indexing, and categorizing data; someone needs to pick something
and do it.  I can keep haphazardly indexing stuff as I get time, but the
task really needs someone to focus single-mindedly on it.


