Announcing Debian Package Tags



Hello.

Debian keeps growing, and the Sections: system has proven unable to scale to
keep pace with it.  There is broad consensus around a multiple-tags-per-package
solution, and now, yes, it has become a reality.

As the first step towards proposing its introduction, we have built a clean
implementation that integrates with existing Debian systems without changing a
single bit of them.

In cooperation with Erich Schubert and after much discussion with Hervé
Eychenne, I'm happy and proud to announce a new outstanding feature for our
beloved Debian systems:


			* The Debian Package Tags *


 * Quick Start Instructions

 1) Add these sources to your /etc/apt/sources.list:

	deb http://people.debian.org/~enrico/ unstable/$(ARCH)/
	deb http://people.debian.org/~enrico/ unstable/all/
	deb-src http://people.debian.org/~enrico/ unstable/source/
	
 2) Run "apt-get update ; apt-get install tagcoll debtags"

 3) Run "debtags update": the package tag database will then be in
    /var/lib/debtags/package-tags

 4) Read the EXAMPLES section of the debtags(1) manpage for some example
    queries

 5) Contribute to the package tags editing at http://debian.vitavonni.de/packagebrowser/


 * About package tags

Tags, which we could also call keywords or categories, are short names that
stand for qualities, features or other characteristics of an item.

Package tags are tags attached to Debian packages to represent a given
quality, such as having a specific feature, offering a specific service,
working with a specific set of data or in a specific environment, and so on.

Tags can be thought of as the evolution of the package sections historically
used in Debian systems.  Unlike package sections, the package tags system is
designed so that more than one tag can be attached to a package, and so that
all attached tags have the same importance.


 * Shortcomings of the package sections system

When the sections system was introduced, Debian was a small distribution.
Package sections were a good way to separate existing packages by area of
interest, and worked well for some time.

Now, however, the situation has changed.  We have more than 11,000 binary
packages (as of January 2003), and this variety no longer fits the
one-section-per-package approach: does a full-featured web browser such as
Mozilla, for example, belong in section `net', section `web' or section
`mail'?

The section system does not scale to this extent: package tags are the intended
replacement.


 * Advantages of package tags

Package tags can be used to highlight the relevant aspects and qualities of a
package.  Unlike package sections, they do not impose a choice of "the most
important aspect".  Unlike package hierarchies, they do not impose an ordering
on the importance of the tags.

All the structure necessary to present the package archive in an organized way
can be generated automatically from the package tag database.  This means that
such auto-generated structures will always reflect the real, up-to-date state
of the package archive, whereas hand-crafted organizations take great effort
to build and risk becoming outdated as the contents of the archive change.

Package tags also enable new kinds of queries on the package archive:

 - you can query for packages with a given quality, or packages without it;
 - you can query for qualities, as is done now with full-text search in
   apt-cache, but without incurring false positives because of ambiguous
   words, or because words match in the wrong context;
 - simple set operations on the sets of tags assigned to packages can define a
   "distance" function, which can be used to compute a list of packages similar
   to a given one.
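
The three kinds of queries above can be sketched in a few lines of Python.
This is only an illustration: the in-memory dictionary, the package names and
tags, and the choice of the Jaccard distance as the "distance" function are
assumptions made for the example, not the debtags or tagcoll implementation.

```python
# Hypothetical tag data: package name -> set of tags (made up for this sketch).
TAGS = {
    "mozilla": {"web", "mail", "net", "x11"},
    "lynx": {"web", "net", "console"},
    "mutt": {"mail", "net", "console"},
    "gimp": {"graphics", "x11"},
}


def with_tag(db, tag):
    """Packages that carry the given tag."""
    return sorted(pkg for pkg, tags in db.items() if tag in tags)


def without_tag(db, tag):
    """Packages that do not carry the given tag."""
    return sorted(pkg for pkg, tags in db.items() if tag not in tags)


def distance(a, b):
    """Jaccard distance between two tag sets: 0.0 = same tags, 1.0 = disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def similar(db, name):
    """Other packages, closest first; ties are broken by package name."""
    ref = db[name]
    others = (pkg for pkg in db if pkg != name)
    return sorted(others, key=lambda pkg: (distance(ref, db[pkg]), pkg))


if __name__ == "__main__":
    print(with_tag(TAGS, "console"))
    print(without_tag(TAGS, "net"))
    print(similar(TAGS, "mozilla"))
```

Because every tag has the same importance, no weighting is needed: the
distance depends only on how many tags two packages share.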

These, and the smart hierarchy generation algorithm implemented in the
`tagcoll' utility, are only the first applications that have been thought of
so far: the possibilities are far from having been fully explored.


 * The package tags system

The package tags system consists of two pieces of information: the normative
tag vocabulary and the package tags database itself.

 The normative tag vocabulary

The normative tag vocabulary is a list of all available tags that people can
choose from.  Every tag is accompanied by a brief description of its meaning
and intended usage, to avoid possible misinterpretations, and possibly by a
list of explicit implications, that is, a list of tags automatically implied by
the given one.  For example, the `C++' tag currently implies `languages' and
`devel', so they can be added automatically when `C++' is attached to a
package in the editing phase.
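
As a rough sketch of how such implications might be expanded when a tag is
attached, consider the following Python fragment.  The dictionary layout and
the function name are hypothetical, and following implications transitively
(an implied tag may imply further tags) is an assumption of this example
rather than something the vocabulary format is known to require.

```python
# Hypothetical vocabulary fragment: tag -> tags it directly implies.
# Only the `C++' entry comes from the text above.
IMPLIES = {
    "C++": {"languages", "devel"},
}


def expand(tags, implies):
    """Return the given tags plus everything they imply, directly or not."""
    result = set(tags)
    pending = list(tags)
    while pending:
        tag = pending.pop()
        for implied in implies.get(tag, ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)  # an implied tag may imply more tags
    return result


if __name__ == "__main__":
    print(sorted(expand({"C++"}, IMPLIES)))
```

Tracking the tags already seen keeps the expansion from looping forever if two
tags were ever declared to imply each other.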

The vocabulary is maintained by a (yet to be formed) task force that will edit
it cooperatively via CVS.  An up-to-date version of the vocabulary is then
shipped with the debtags package itself.

Changes to the vocabulary between revisions will be summarized by the task
force in an `upgrade-checklist' document, analogous to the one that ships with
the Debian policy, which the package tag database editors will use to keep the
database up to date with the evolution of the tag domain.

 The package tags database

The package tags database is the list of tags attached to each package.

It is maintained cooperatively using Erich Schubert's Debian Package Browser,
found at http://debian.vitavonni.de/packagebrowser/

The package browser provides periodic snapshots of the tags database, which
debtags downloads and installs on the system.


 * Internationalization and customization

The package tags system has been designed from the start to fully support
internationalization and customization.

 Internationalization

Internationalizing the package tags is so easy that it's hard to explain in a
section long enough to appear serious, and so I'm filling it with this
nonsense.

Basically, the tags themselves are not designed to be displayed to the user,
but to work as a kind of pointer into an internationalized name and description
database.  This does not conflict with the descriptions found in the
vocabulary database, since those are not intended to be the user-visible tag
descriptions, but a help for the translators in providing a correct and
unambiguous description of the tags in their target language.

The format and location of the internationalized tag descriptions have not yet
been decided, since this prototype implementation has just been released.  The
Debian translation teams will be contacted shortly for help, since design
choices in this field must take their experience into account.

 Customization

As Debian grows bigger and bigger, its nature changes.  A new idea of a Debian
system is that of a Debian Universe of packages, of which subprojects or
metadistros offer a custom view targeted at a special audience.  Subprojects
already exist to target Children, Medical, Music and Law environments, and
other such customization efforts are under way among developers in Extremadura
(Spain), or in projects such as Morphix.

The package tags system offers the possibility to customize the package tags
database so that it better reflects the needs of the intended audience.

The nature of package tags allows flexible tag patch files to be written that
represent changes to a tags database.  These tag patch files can be applied to
any version of the database, even future ones, without generating conflicts.

What this means is that a customization team can edit the tag database, produce
a patch file with the `tagcoll' utility and ship it so that it's installed in
the /etc/debtags/tagpatch.d/ directory.  debtags will then pick it up during
the `update' cycle and apply it to the downloaded tag data.

In this way, the package tags will be kept up-to-date and the changes will be
preserved.
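
To illustrate why such patches need not conflict, here is a small Python
sketch.  It models a patch as, for each package, a set of tags to add and a
set of tags to remove.  That representation and both function names are
assumptions made for this example; the actual patch format produced by
`tagcoll' is not described here and may differ.

```python
def make_patch(old, new):
    """Compute per-package (added, removed) tag sets turning `old` into `new`."""
    patch = {}
    for pkg in set(old) | set(new):
        before = old.get(pkg, set())
        after = new.get(pkg, set())
        added, removed = after - before, before - after
        if added or removed:
            patch[pkg] = (added, removed)
    return patch


def apply_patch(db, patch):
    """Apply a patch to any database version and return the patched copy."""
    result = {pkg: set(tags) for pkg, tags in db.items()}
    for pkg, (added, removed) in patch.items():
        tags = result.setdefault(pkg, set())
        tags |= added    # adding a tag that is already present is harmless
        tags -= removed  # removing a tag that is already gone is harmless
    return result


if __name__ == "__main__":
    # Made-up data: a customization team adds a `kids' tag to one package.
    downloaded = {"mozilla": {"web", "net"}}
    customized = {"mozilla": {"web", "net", "kids"}}
    patch = make_patch(downloaded, customized)

    # A later snapshot has changed, yet the same patch still applies.
    later = {"mozilla": {"web", "mail"}, "gimp": {"graphics"}}
    print(apply_patch(later, patch))
```

Since a patch only says "this package should have this tag" or "should not
have it", applying it never depends on the surrounding context, unlike a
line-based diff.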



Yours truly,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
