
Re: Misclassification of packages; "libs" and "doc" sections



Drake Diedrich has suggested some fine ideas, too. Here I quote
his remarks along with my notes.

>    It seems that Debian really requires multiple classification schemes.
> Most of the problems seem to be in overlapping different classifications.
> A few off-the-top suggestions:
> 
> Function          - editor, MUA, MTA, multimedia players, newsreaders,...
> Field             - Science, Games, Web, Networking, ...
> Technology        - X11, Gnome, KDE, Hurd, Linux, Perl, Python, C++, ...
> Author/Foundry    - FSF, Adobe, MIT, Apache, BSD, Individual, ...
> License           - GPL, PD, BSD-like, MIT-like, Free, Non-Free
> Distributor       - Debian  (other source might be Storm, Corel, ...)
> 
>    Different classification policies could exist within each scheme. Each
> scheme could have a catchall Misc category for packages that haven't (yet)
> been categorized, creating a natural todo-list for scheme and package
> maintainers.

This is true, and it shows why a single tree hierarchy won't work: there is
more than one way to view the hierarchy. As I read it, Field is similar
to the current Debian sections.

In fact, this may be a prototype of a "top-level software ontology"
in Debian. That is, these are the most general categories that one can
possibly think of. Since these are top-level categories, any package
would fall into a top-level ontology like this.

The graph-based hierarchical categorization I suggested plays nicely with
the idea of merging multiple classification schemes. Each of "function,
field, technology, foundry, license, distributor" simply becomes a supercategory
in my proposal. The actual top-level ontology still has to be figured
out, though. The most general superclasses in an ontology are those
that divide all items into broad classes. Thus, these distinctions depend
on which packages exist. In other words, a thorough analysis of _all_ packages
will be required to come up with the right top-level ontology. I believe
that current object-oriented analysis (OOA) techniques will suffice for the
task, which consists largely of making a category hierarchy and hacking it
until it works well.
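
To make the supercategory idea concrete, here is a minimal sketch of such a
category graph in Python. It is only an illustration under my own assumptions:
the CategoryGraph class, its method names, and the "X11-editor" category are
hypothetical and not part of any existing Debian tool. The one point it shows
is that a category may have several parents, which a single tree cannot
express.

```python
from collections import defaultdict


class CategoryGraph:
    """Hypothetical directed acyclic graph of categories (not a Debian API)."""

    def __init__(self):
        # Maps each category name to the set of its direct supercategories.
        self.parents = defaultdict(set)

    def add_category(self, name, *parents):
        """Register a category together with zero or more supercategories."""
        self.parents[name].update(parents)
        for parent in parents:
            self.parents.setdefault(parent, set())

    def ancestors(self, name):
        """Return every supercategory reachable from the given category."""
        seen = set()
        stack = list(self.parents.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen


graph = CategoryGraph()

# Each classification scheme from the quoted list becomes a supercategory.
for scheme in ("Function", "Field", "Technology",
               "Foundry", "License", "Distributor"):
    graph.add_category(scheme)

graph.add_category("Function-editor", "Function")
graph.add_category("Technology-X11", "Technology")

# Assumed example of a category that belongs to two schemes at once.
graph.add_category("X11-editor", "Function-editor", "Technology-X11")

print(sorted(graph.ancestors("X11-editor")))
```

A package tagged only with "X11-editor" would then be found under both
Function and Technology, without the maintainer listing either explicitly.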

>    It would probably be worthwhile to implement categorization in multiple
> levels.  Maintainer puts fields in the .deb control file.  Archive, release,
> and scheme maintainers take that as default, but can override it in a
> separately maintained override database that is used to generate
> Packages.gz.  I'd strongly recommend a new control tag for
> classification, rather than reusing an existing one.  That would make
> spotting unclassified packages easier, and avoid any breakage during
> transition.  Perhaps just multiple Classification: entries
> 
> (in the .deb)
> Classification: Function-editor, Function-MUA, Technology-Emacs, Technology-X11, Author-FSF
> 
> (added by scheme maintainer)
> Classification: License-GPL
> 
> (taken away by a release manager of a miniature X-less release)
> Classification: -Technology-X11

Making classification an add-on is a good idea, as that would be the
cleanest way to implement it. Letting others override the classification
also works well, since it would make custom distributions easier to build.
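
The layered override described in the quote could be sketched roughly as
follows. This is Python written only for illustration: the function names are
my own invention, no such Classification: field exists in dpkg today, and I
assume that a leading "-" removes a tag while anything else adds one, as in
Drake's example.

```python
def parse_classification(value):
    """Split a comma-separated Classification: value into individual tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def apply_layers(*layers):
    """Apply Classification values in order; later layers override earlier ones.

    Assumed convention: a tag starting with "-" removes that tag,
    any other tag is added.
    """
    tags = set()
    for layer in layers:
        for tag in parse_classification(layer):
            if tag.startswith("-"):
                tags.discard(tag[1:])
            else:
                tags.add(tag)
    return tags


# The three layers from the quoted example.
maintainer = ("Function-editor, Function-MUA, Technology-Emacs, "
              "Technology-X11, Author-FSF")
scheme_maintainer = "License-GPL"
release_manager = "-Technology-X11"

final = apply_layers(maintainer, scheme_maintainer, release_manager)
print(sorted(final))
```

Since each layer is kept separate, the maintainer's defaults in the .deb stay
untouched, and a custom distribution only needs to supply its own last layer.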

-- 
Eray (exa) Ozkural
Comp. Sci. Dept., Bilkent University, Ankara
e-mail: erayo@cs.bilkent.edu.tr
www: http://www.cs.bilkent.edu.tr/~erayo


