
Long blurbs repeated in many package descriptions considered harmful



Hello,

some groups of packages in Debian share introductory pieces in the
package description.  For example, most pike packages have this:

 Pike is an interpreted, object-oriented, dynamic programming language
 with a syntax similar to C. It includes many powerful data types and
 a module system that, for instance, provides image manipulation together,
 with support for graphics formats like SVG, JPG, PNG, GIF, XCF and many
 others,  database connectivity, advanced cryptography, XML/HTML parsers
 and others. To learn more about pike, please visit http://pike.ida.liu.se/

While these blurbs are informative, they provide information not
strictly related to the package itself.  As a consequence, if you do
"apt-cache search image" you get all of pike, including irrelevant
things like "pike7.6-public.network.pcap" or
"pike7.6-public.protocols.syslog".

This is not normally annoying in simple apt-cache search queries, but it
becomes nasty when trying to do some smarter text mining on the package
descriptions.  Think Bayesian tools[1], or my new algorithms for mapping
a keyword search to a tag search[2].
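
To make the problem concrete, here is a minimal sketch of how a text-mining
tool might detect and drop shared blurbs before indexing.  It is only an
illustration: the function names, the min_packages threshold, the shortened
blurb and the one-line package-specific descriptions are all made up for the
example, and it assumes description paragraphs are separated by blank lines.

```python
from collections import Counter

def paragraphs(description):
    """Split a description into whitespace-normalised paragraphs."""
    blocks = description.strip().split("\n\n")
    return [" ".join(block.split()) for block in blocks if block.strip()]

def shared_blurbs(descriptions, min_packages=3):
    """Return paragraphs that appear in at least min_packages descriptions."""
    counts = Counter()
    for text in descriptions.values():
        counts.update(set(paragraphs(text)))  # count once per package
    return {para for para, n in counts.items() if n >= min_packages}

def strip_blurbs(descriptions, min_packages=3):
    """Return a copy of descriptions with the shared paragraphs removed."""
    blurbs = shared_blurbs(descriptions, min_packages)
    return {
        name: "\n\n".join(p for p in paragraphs(text) if p not in blurbs)
        for name, text in descriptions.items()
    }

def search(descriptions, keyword):
    """Naive case-insensitive substring search over descriptions."""
    keyword = keyword.lower()
    return sorted(n for n, t in descriptions.items() if keyword in t.lower())

# Shortened version of the shared blurb; the second paragraphs are invented.
BLURB = ("Pike is an interpreted, object-oriented, dynamic programming\n"
         "language with a module system that, for instance, provides image\n"
         "manipulation, database connectivity and advanced cryptography.")

SAMPLE = {
    "pike7.6-image": BLURB + "\n\nImage processing modules for Pike.",
    "pike7.6-public.network.pcap": BLURB + "\n\nPacket capture bindings for Pike.",
    "pike7.6-public.protocols.syslog": BLURB + "\n\nSyslog client module for Pike.",
}

if __name__ == "__main__":
    print(search(SAMPLE, "image"))                # matches every package
    print(search(strip_blurbs(SAMPLE), "image"))  # matches only the image package
```

Every consumer of the descriptions would have to repeat this kind of
guesswork, which is why removing the blurb at the source is the better fix.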
  
Many thanks to the KDE developers for removing the similar blurb that
they used to have.  They did it nicely, and in a way that others could
follow.

Would it be worth adding a mention of this to the package description
part of the developers-reference?


Ciao,

Enrico

[1] for example, we have something cooking up as a Summer of Code
    project: http://wiki.debian.org/SummerOfCode2006 (see DebtagsAI)
[2] http://lists.alioth.debian.org/pipermail/debtags-devel/2006-July/001292.html
-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
