
Re: Bug#144046: general: Sections are not finely grained



(Daniel: note the CC to you; I wouldn't mind discussing this further in
private :)

On Mon, 22 Apr 2002 14:13:07 -0400
Daniel Burrows <dburrows@debian.org> wrote:
>   Creating the initial set of data took me 5-10 hours a day for a week
> or two, and it was incomplete.  Keeping it mostly up-to-date
> afterwards took about 5-10 minutes a day on average (0 minutes some
> days, more others, depending on how many new packages there were)

That's about what I figured.

>   The problem is that we have 9500 packages, and it's really hard
> to classify all of them in a sane and consistent manner -- from the
> sheer volume if nothing else.  More than that, the problem is that
> people would rather theorize about the best possible ontological
> classification on mailing lists than sit down and categorize packages.

I've dealt with large data sets like this before (specifically, the
categorisation of some 14,000 error messages, which a tech support
person would look up). There is no way to categorise them in a "sane and
consistent manner". That would require a different hierarchy for each
different cultural/moral/philosophical background. Categorisations
basically depend on how the reader emphasises certain concepts/words.

What we ended up doing was picking somebody who would, as part of their
job description, categorise new error messages as they were created.
They were staff and unionised, not contract, so they were likely to be
around for years.

She's the one who did the original categorisation. People who had to
consult the list often might have categorised things differently than
she did, but over a (surprisingly short) period of time they learned to
take any number of variables into account and to predict with a great
degree of accuracy exactly where something would be in the tree.

The moral is that you really *can't* categorise everything in such a way
that everybody would know exactly where something is the first time they
look at a list. The next best thing is having a single person do all the
categorisation. Giving packages multiple places in the tree is
*extremely* good; it almost eliminates the need to have a single person
do it. (Though in my experience, it's still better done that way.)

-- 
________________________________________________________________________
\ David B. Harris, Systems administrator   |   http://www.terrabox.com /
/  eelf@sympatico.ca, elf@terrabox.com     |     http://eelf.ddts.net  \
\======================================================================/
/ Clan Barclay motto: Aut agere, aut mori.  (Either action, or death.) \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


