Re: Categorization of packages (was Re: Aptitude, ARs)


On Fri, Mar 14, 2003 at 08:06:54PM -0500, Daniel Burrows wrote:
> On Fri, Mar 14, 2003 at 05:03:22PM -0800, Osamu Aoki <osamu@debian.org> was heard to say:
> > On Fri, Mar 14, 2003 at 05:14:41PM -0500, Daniel Burrows wrote:
> > > On Fri, Mar 14, 2003 at 11:54:41AM -0800, Osamu Aoki <osamu@debian.org> was heard to say:
> > > > I usually have patience to look at up to 25 (1 screen).
> > > > 
> > > > 25*25*25 = 15625 branch ends.
> > > 
> > >   That's only true if it's balanced; remember that about half those
> > > packages are editors ;-)  Also, some leaves might appear in multiple
> > > places.
> > 
> > Yes.  That is why at least 3 levels are needed for the current Debian
> > archive.  You always made a small sub-group if packages belonged to
> > both branches.  But since these cross-over sub-categories are created
> > manually, they are not much used.  In Daniel's sub-categorization, I was
> > observing something like:
> > 
> >   editor-wordprocessor-tex node contains the lyx package
> >   
> >   editor-wordprocessor and tex-tools nodes both contain the
> >   editor-wordprocessor-tex node as a sub-node
> >
> > Also, some sub-categories can cross over (tex and editor) but some are
> > just subdivisions of another (game --> game-action, game-tetris, ...).
> > 
> > Anyway, the freshmeat categorization is interesting. I extracted its keys
> > from the source of its web page :-)  Instead of pushing the current
> > categorization, I may choose to use that one.
>   I'm not quite sure what you mean.  There are some subgroups that
> belong to multiple groups, but I can assure you that individual packages
> can belong to multiple groups as well.  (incidentally, I assume the LyX
> example is hypothetical?  LyX belongs to the TeX group and the Word
> Processors group as far as I can see/remember)

Sorry, it was hypothetical.  I got confused with my local one.  

Your trees are:

ROOT -> editor -> editor-wordprocessor
ROOT -> editor -> editor-* (everything other than editor-tex)
ROOT -> editor-tex         (This could have used "tex" as the node name)
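
To illustrate a package that sits under two groups, here is a rough Python
sketch.  The group names follow the trees above, but the dictionaries and
the paths_to() helper are made up for illustration and are not how aptitude
stores its hierarchy.  Placing lyx in both groups is taken from your
description, not from the actual data.

```python
# Hypothetical category graph: group -> list of sub-groups.
SUBGROUPS = {
    "ROOT": ["editor", "editor-tex"],
    "editor": ["editor-wordprocessor"],
    "editor-wordprocessor": [],
    "editor-tex": [],
}

# Hypothetical membership: group -> packages listed directly in it.
# A package may appear under more than one group.
PACKAGES = {
    "editor-wordprocessor": ["lyx"],
    "editor-tex": ["lyx"],
}


def paths_to(package, node="ROOT", prefix=()):
    """Return every path from ROOT to a group that lists the package."""
    path = prefix + (node,)
    found = []
    if package in PACKAGES.get(node, []):
        found.append(" -> ".join(path))
    for child in SUBGROUPS.get(node, []):
        found.extend(paths_to(package, child, path))
    return found


for p in paths_to("lyx"):
    print(p)
# ROOT -> editor -> editor-wordprocessor
# ROOT -> editor-tex
```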

>   I generally created specialized subgroups when I found large numbers
> of packages which were related; if they all happened to reasonably
> belong to two groups as well, I put the newly created group under both
> groups.
>   Anyway, there are (according to apt-cache stats) exactly 12585 binary
> packages TODAY.  I think that you're being too optimistic if you believe
> a natural (ie, easy to navigate) categorization exists which is balanced
> enough to squeeze this into 3 levels and 25 leaves in each lowest
> screen.

No, I do not.  My point was that the current 2 levels alone are far too
few to narrow down 12585 binary packages.
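
To make the arithmetic concrete, here is a rough Python sketch.  It assumes
a perfectly balanced tree with 25 entries per screen, which a real
categorization will not achieve, so it only gives a lower bound on the
depth.  The name levels_needed() is made up for illustration.

```python
def levels_needed(packages, per_screen=25):
    """Smallest depth d such that per_screen**d >= packages.

    Assumes a perfectly balanced tree, so this is only a lower bound.
    """
    depth = 0
    capacity = 1
    while capacity < packages:
        capacity *= per_screen
        depth += 1
    return depth


print(25 ** 2, 25 ** 3)       # 625 15625
print(levels_needed(12585))   # 3
```

So 2 levels cover at most 625 leaves, and 3 levels are the minimum even in
the ideal balanced case.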

>   I'm not sure about the freshmeat categorization system -- I haven't
> used it enough to really be familiar with it.

I extracted their keywords from the web.  I could not find an explicit
explanation.  I put them at http://people.debian.org/~osamu/pub/

   fm-cat.txt    : as extracted from their web site
   osamu-cat.txt : touched up by me to make it somewhat reasonable for Debian

>   Oh.  They're documented in the section "SEARCHING, LIMITING, AND

Thanks.  I thought I saw it somewhere. :-)

>   I think that for the full documentation, it might help if it were
> written in some structured format with a table-of-contents.  I'm not
> sure if I want to include a viewer for something like that in aptitude
> or if I should ship HTML and spawn a web browser, but the README is a
> bit unwieldy right now.

html as the source?  

I can quite easily convert your text to debiandoc-sgml, which can generate:
 1. text
 2. html (single page or multi page)
 3. PDF
 4. PS
 5. info (but why bother.)

Building PDF and PS is tricky, but text and HTML are easy.  It will
provide a TOC automatically.  Do you want me to do it?

