Re: Do we need better documentation about our subsections?

On Sat, Sep 25, 2004 at 05:16:51PM +0000, Thaddeus H. Black wrote:

(thread back to the list with author's permission)

> > I know debtags and Enrico, but will debtags
> > replace our current subsections after sarge ?
> > Was it discussed before? Are you needing help?
> Put simply, for several months after sarge
> freezes, I will be working on finishing and
> checking the tagging of sarge's packages.  But
> simultaneously also beginning at freeze time,
> new packages for sarge+1 will begin to flow into
> sid at a rate of about 100 binary packages per
> week.  I will not be able both to finish the old
> and to keep up with the new at the same time.
> Gustavo, we need help tagging sarge+1 packages
> for debtags as they come in.  If this were done,
> then when the sarge tags are fully ready, the
> tags base would be immediately up to date.
> Immediate help in this would thus be very much
> appreciated.  If you would help, please begin
> preparing your scripts now, because the
> opportunity to act begins the day sarge freezes.
> As far as I know, no one else is ready to act on
> this, so this would be your project.

Yes, there should be more tagging effort.  I have some explanations
for why this isn't happening:

 1. The interfaces to do that are still crappy
    (debtags-edit tries to help, but isn't exciting either)
 2. Facets and tags are not documented
    although they may be fairly intuitive, there's not much documentation
    on how to really do the tagging yet.  Although per-facet and per-tag
    documentation could already be written, more general documentation
    suffers from the fact that:
 3. The set of available tags is in continuous refinement
    and I feel like we should live with it, as our way of describing
    packages should be able to evolve as packages evolve, and as our
    understanding of packages evolves.
    Ironically, our understanding of packages evolves while we describe
    them, so this sounds like a never-ending iterative process.
 4. Debtags has no official hat
    since I would always like things to get better before an official
    launch, I never make an official launch :)
    It would be nice to at least have a mention of debtags, even if only
    as an ongoing development project, together with every mention of
    subsections (at least, bug 144046 has officially been reassigned to
    debtags, so it seems to be our job after all)
 5. Package tags are not yet really in use
    and this should change fairly soon, as work is underway to
    debtags-enable various package managers
> In my view, that is the highest-priority task,

Yes, it's very important, as it would allow the tag set to expand with
knowledge that is not just mainly mine and Erich's.  That would also
mean having more people messing with the tag vocabulary, and more brains
thinking about it.

> but if it did not suit you then here are several
> others our leader Enrico has suggested.

I'll comment on these, possibly giving status updates:

>   * Design how to distribute localized
>     translations of tag names and descriptions

Help would be very much appreciated.  We have had suggestions on how to
do it (convert the vocabulary to .po files so that translators can use
the existing tools), but I still haven't had time to implement i18n in
the library, and I'd really appreciate help with that.
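
As a rough illustration of the .po suggestion, here is a minimal sketch in
Python that renders a tag vocabulary as PO template text.  The in-memory
shape of the vocabulary (a dict from tag name to English description) and
the function names are assumptions made for this example; this is not the
actual debtags vocabulary format or library API.

```python
def po_escape(text):
    """Escape a string for use inside a PO double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def vocabulary_to_pot(vocabulary):
    """Render a tag vocabulary as PO template text.

    vocabulary maps a tag name to its English description
    (a hypothetical shape, not the real debtags format).
    """
    entries = []
    seen = set()
    for tag in sorted(vocabulary):
        description = vocabulary[tag]
        if description in seen:
            # A given msgid may appear only once in a PO file.
            continue
        seen.add(description)
        entries.append(
            f"#. Description of tag {tag}\n"
            f'msgid "{po_escape(description)}"\n'
            f'msgstr ""\n'
        )
    return "\n".join(entries)


# Made-up tag names and descriptions, for illustration only.
print(vocabulary_to_pot({"sound::player": "Plays audio files"}))
```

A real template would also need the usual PO header entry, which this
sketch leaves out.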

>   * Design automatic methods to infer tags from
>     existing package data; for example, Javier
>     Fernández-Sanguino Peña suggested the use of
>     TFIDF techniques to extract keywords from
>     package descriptions

I started an autodebtag project to create software to infer tags from
existing package data.  You can find it at:
This time it's in Perl rather than C++, so maybe more people would
like to lay their hands on it.  It really needs more brains and
imagination.  That would also be the place to do some TFIDF
computations, if someone wants to implement them.
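
For anyone unfamiliar with TFIDF, here is a minimal sketch of the idea in
Python (not autodebtag code, which is in Perl).  It uses one common
variant: term frequency is count/total within a description, and inverse
document frequency is log(N/df) over all descriptions.  The function
names and the sample descriptions are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(descriptions, top_n=3):
    """Return the top_n highest-scoring words for each package.

    descriptions maps a package name to its description text.
    """
    term_counts = {pkg: Counter(tokenize(text)) for pkg, text in descriptions.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many descriptions does each word occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    keywords = {}
    for pkg, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by score descending, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[pkg] = ranked[:top_n]
    return keywords


# Made-up descriptions, for illustration only.
SAMPLE = {
    "mail-reader": "a mail reader for the console",
    "mail-filter": "a mail filter for spam",
    "sound-editor": "a sound editor for the desktop",
}
print(tfidf_keywords(SAMPLE, top_n=2))
```

Words that occur in every description (such as "a" and "for" here) score
zero, so the words that distinguish a package rise to the top.  Mapping
such keywords to actual tags is a separate problem that this sketch does
not attempt.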

Curious people are welcome to contact me, send patches, get added to the

>   * Find a good way to maintain the central
>     Debian tag vocabulary (technical
>     infrastructure, people in charge of it)

Here is Erich's domain; unfortunately, he's very busy.  He'll probably
put some or all of the server infrastructure on subversion soon, so
there will probably be possibilities to help there as well.

>   * Discuss the idea of "Adopting" tags, that is
>     having people who take care of the
>     correctness of the list of packages
>     associated to a given tag (which is another
>     point of view compared to checking that all
>     tags associated to a package are correct)
>     (Suggested by Erich Schubert)

This would be a really effective way to get quality assurance.
Another powerful route to quality assurance is having package managers
that use tags, so that inconsistencies can be spotted and reported in
everyday use.
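
To make the two points of view concrete: tag data is naturally stored per
package, while someone adopting a tag wants the list of packages per tag.
Here is a minimal sketch in Python of that inversion; the data shape, the
function name, and the package and tag names are assumptions for
illustration, not the debtags database format.

```python
from collections import defaultdict


def packages_by_tag(tags_by_package):
    """Invert a package -> tags mapping into a tag -> packages mapping."""
    index = defaultdict(set)
    for package, tags in tags_by_package.items():
        for tag in tags:
            index[tag].add(package)
    # Sort both tags and package lists so the result is easy to review.
    return {tag: sorted(pkgs) for tag, pkgs in sorted(index.items())}


# Made-up packages and tags, for illustration only.
SAMPLE_TAGS = {
    "mail-reader": {"mail", "console"},
    "mail-filter": {"mail"},
    "sound-editor": {"sound"},
}
print(packages_by_tag(SAMPLE_TAGS)["mail"])
```

The person who adopted a tag could then review just that one list for
packages that are missing or do not belong.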

>   * Discuss the idea of "Outsourcing" the
>     maintenance of some tags: for example the
>     Gnome and KDE people could take care of
>     maintaining the tag data related to Gnome
>     and KDE.

Here the idea of facets helps, as different groups could take care of
facets related to areas they know better.  For example, the Agnula
people have volunteered to maintain the "sound" facet.

It is still not clear how to do that, however: should all data be kept
in a central database with different access policies, or should everyone
make their data available and people just add and remove sources from
/etc/debtags/sources.list?  Or maybe a mix of both approaches.

And what tools should be provided to these groups of people to
track the facets they chose to maintain?

>   * Inclusion of "Tag:" fields in package
>     control files

Here I'm unsure whether it should be done, as the tag data is updated
more often than the package data, as new facets show up or some are
reorganized.  However, having something like that means reminding Debian
developers to tag their new packages, which would be very important: as
you say, if 100 new packages per week hit the archive, just tagging
those is a pretty tough job, and it would be extremely handy to have
developers at least provide a first, possibly imperfect categorization.

> I appreciate your interest.  I think that we all
> appreciate it.  Tagging the archive is a big
> task.  Please join our low-volume list
> debtags-devel@lists.alioth.debian.org.

Yes, everyone interested, please join and ask questions.  My problem is
that I think a lot and write more code than explanations, so I end up
having ideas that people don't know about, or people don't know where
I'm heading and just wait to see what happens instead of contributing.
Please, everyone, help me get things out of my mind and into the wild,
and possibly also get them done ;-)



GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
