Hello.

Debian keeps growing, and the Sections: system has proven unable to scale with it. There has been broad consensus around a multiple-tags-per-package solution, and now, yes, it has become a reality.

As the first step towards proposing its introduction, we have built a neat and clean implementation that integrates with existing Debian systems without changing a single bit of them.

In cooperation with Erich Schubert and after much discussion with Hervé Eychenne, I'm happy and proud to announce a new outstanding feature for our beloved Debian systems:

        * The Debian Package Tags *

* Quick Start Instructions

 1) Add these sources to your /etc/apt/sources.list:

      deb http://people.debian.org/~enrico/ unstable/$(ARCH)/
      deb http://people.debian.org/~enrico/ unstable/all/
      deb-src http://people.debian.org/~enrico/ unstable/source/

 2) Run "apt-get update ; apt-get install tagcoll debtags"

 3) Run "debtags update": the package tag database will then be in /var/lib/debtags/package-tags

 4) Read the EXAMPLES section of the debtags(1) manpage for some example queries

 5) Contribute to the package tags editing at http://debian.vitavonni.de/packagebrowser/

* About package tags

Tags, which we could also call keywords or categories, are short names symbolizing qualities, features or other characteristics of an item.

Package tags are tags that get attached to Debian packages to represent a given quality, like having a specific feature, offering a specific service, working with a specific set of data or in a specific environment, and so on.

Tags can be thought of as the evolution of the package sections historically used in Debian systems. Unlike package sections, the package tags system is designed so that more than one tag can be attached to a package, and so that all attached tags have the same importance.

* Shortcomings of the package sections system

When the sections system was introduced, Debian was a small distribution.
Package sections were a good way to separate the existing packages by area of interest, and they worked well for some time.

Now, however, the situation has changed. We have more than 11,000 binary packages (as of January 2003), and this variety no longer fits the one-section-per-package approach: does a full-featured web browser such as Mozilla, for example, belong in section `net', section `web' or section `mail'?

The section system does not scale to this extent: package tags are the intended replacement.

* Advantages of package tags

Package tags can be used to highlight the relevant aspects and qualities of a package. Unlike package sections, they do not impose a choice of "the most important aspect". Unlike package hierarchies, they do not impose an ordering on the importance of the tags.

All the structure necessary to present the package archive in an organized way can be generated automatically from the package tag database. This means that such auto-generated structures will always reflect the real, up-to-date situation of the package archive, whereas hand-crafted organizations risk costing a great effort to build and becoming outdated as the contents of the archive change.

Package tags also enable new kinds of queries on the package archive:

 - you can query for packages with a given quality, or for packages without it;

 - you can query for qualities, as is done now with the full-text search in apt-cache, but without incurring false positives because of ambiguous words, or because words match in the wrong context;

 - simple set operations on the sets of tags assigned to packages can define a "distance" function, which can be used to compute a list of packages similar to a given one.

These, and the smart hierarchy generation algorithm implemented in the `tagcoll' utility, are only the first applications that have been thought of so far: the possibilities are far from having been fully explored.
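
To make the queries above more concrete, here is a minimal Python sketch. It is not how debtags or tagcoll are implemented: the sample packages and tags are made up for illustration, the in-memory dictionary stands in for the real tag database, and the distance used (the number of tags that only one of the two packages has) is just one assumed reading of "simple set operations".

```python
# Hypothetical sample data: package names and tags are illustrative only,
# not taken from the real package tags database.
TAGS = {
    "mozilla": {"web", "mail", "net", "gui"},
    "lynx": {"web", "net", "console"},
    "mutt": {"mail", "net", "console"},
    "gimp": {"graphics", "gui"},
}


def with_tag(db, tag):
    """Return the sorted names of packages that carry the given tag."""
    return sorted(pkg for pkg, tags in db.items() if tag in tags)


def without_tag(db, tag):
    """Return the sorted names of packages that do not carry the given tag."""
    return sorted(pkg for pkg, tags in db.items() if tag not in tags)


def distance(db, a, b):
    """Count the tags that only one of the two packages has (assumed metric)."""
    return len(db[a] ^ db[b])


def similar(db, pkg, limit=3):
    """List the other packages ordered by increasing distance from pkg."""
    others = [p for p in db if p != pkg]
    return sorted(others, key=lambda p: (distance(db, pkg, p), p))[:limit]


if __name__ == "__main__":
    print(with_tag(TAGS, "mail"))
    print(without_tag(TAGS, "gui"))
    print(similar(TAGS, "lynx"))
```

Note that a package with several tags, like the hypothetical `mozilla' entry here, shows up in every query it is relevant to, which is exactly what a single section cannot offer.
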

* The package tags system

The package tags system consists of two pieces of information: the normative tag vocabulary and the package tags database itself.

  The normative tag vocabulary

The normative tag vocabulary is a list of all the available tags that people can choose from. Every tag is accompanied by a brief description of its meaning and intended usage, to avoid possible misinterpretations, and possibly by a list of explicit implications, that is, a list of tags automatically implied by a given one. For example, the `C++' tag currently implies `languages' and `devel', so these can be added automatically when `C++' is attached to a package in the editing phase.

The vocabulary is maintained by a (yet to be formed) task force that will edit it cooperatively via CVS. An up-to-date version of the vocabulary is then shipped with the debtags package itself.

Changes to the vocabulary between revisions will be summarized by the task force in an `upgrade-checklist' document, analogous to the one that ships with the Debian policy, which the package tag database editors can use to keep the database up to date with the evolution of the tag domain.

  The package tags database

The package tags database is the list of tags attached to each package. It is maintained cooperatively using Erich Schubert's Debian Package Browser, found at http://debian.vitavonni.de/packagebrowser/

The package browser provides periodic snapshots of the tags database, which are downloaded by debtags and installed in the system.

* Internationalization and customization

The package tags system has been designed from the start to fully support internationalization and customization.

  Internationalization

Internationalizing the package tags is so easy that it's hard to explain in a section long enough to appear serious, and so I'm filling it with this nonsense.
Basically, the tags themselves are not designed to be displayed to the user, but to work as a sort of pointer into an internationalized name and description database. This does not conflict with the descriptions found in the vocabulary database, since those are not intended to be the user-serviceable tag descriptions, but a help for the translators to provide a correct and unambiguous description of the tags in their target language.

The format and location of the internationalized tag descriptions have not yet been decided, since this prototype implementation has only just been released. The Debian translation teams will be contacted shortly for help, since design choices in this field must take their experience into account.

  Customization

As Debian grows bigger and bigger, its nature changes. A new idea of a Debian system is that of a Debian Universe of packages, of which subprojects or metadistros offer a custom view targeted at a special audience. Subprojects already exist to target Children, Medical, Music and Law environments, and other such customization efforts are being made by developers in Extremadura (Spain), or by projects such as Morphix.

The package tags system offers the possibility to customize the package tags database so that it better reflects the needs of the intended audience. The nature of package tags allows writing flexible tag patch files that represent changes to a tags database. These tag patch files can be applied to any version of the database, even future ones, without generating conflicts.

What this means is that a customization team can edit the tag database, produce a patch file with the `tagcoll' utility and ship it so that it's installed in the /etc/debtags/tagpatch.d/ directory. debtags will then pick it up during the `update' cycle and apply it to the downloaded tag data. In this way, the package tags will be kept up to date and the changes will be preserved.
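
To illustrate why such patches can be applied to any version of the database without conflicts, here is a minimal Python sketch. It does not reproduce the patch format that `tagcoll' actually writes: the in-memory layout (a dictionary mapping each package to a pair of added and removed tag sets), the `apply_patch' helper and the sample packages and tags are all assumptions made for illustration. The point is only that a patch made of per-package tag additions and removals is a set operation, so it never depends on the exact contents of the database it is applied to.

```python
def apply_patch(db, patch):
    """Return a new database with the patch's additions and removals applied.

    db maps package names to sets of tags; patch maps package names to an
    (added, removed) pair of tag sets. Both layouts are assumptions made
    for this sketch, not the real debtags or tagcoll formats.
    """
    result = {pkg: set(tags) for pkg, tags in db.items()}
    for pkg, (added, removed) in patch.items():
        tags = result.setdefault(pkg, set())
        tags |= added    # adding a tag that is already there changes nothing
        tags -= removed  # removing a tag that is absent changes nothing
    return result


# Hypothetical snapshots of the downloaded database at two points in time.
db_v1 = {"mutt": {"mail", "console"}}
db_v2 = {"mutt": {"mail", "console", "net"}, "gcompris": {"games"}}

# Hypothetical customization shipped by a subproject.
patch = {
    "mutt": ({"junior"}, {"console"}),
    "gcompris": ({"education"}, set()),
}

if __name__ == "__main__":
    print(apply_patch(db_v1, patch))
    print(apply_patch(db_v2, patch))
```

In this sketch the same patch applies cleanly to both snapshots, and applying it a second time changes nothing, which is the property that lets a customization survive every `update' cycle.
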

Yours truly,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>