
Re: Announcing Debian Package Tags



On Tue, Apr 29, 2003 at 12:38:18PM +0200, Enrico Zini wrote:
> On Mon, Apr 28, 2003 at 03:23:24PM +0200, Javier Fernández-Sanguino Peña wrote:
> 
> > [1] One of the difficult things in the future might be to generate new tags
> > or associate new packages to tags already available. Automatising (sp?) 
> > this would be useful and TFIDF (and similar IA-related techniques) help
> > with this quite a lot.
> 
> I've never tried the tools you are suggesting me, but I definitely will.
> I have some immediate additions to make to tagcoll and debtags based on
> the many suggestions I have received.
> 

Note that rainbow (libbow) is no longer actively maintained upstream.
The library works and the tool works, but some of the documentation is
still lacking.


> You're showing me a whole new world to explore, and I'll be sure do it
> asap.

Glad to help.

Just FYI, TFIDF is a very simple "technology" (as a matter of fact it's just
an equation) used to determine the 'weight' of each word in a free-form text.
It's useful for document clustering, because you can decide that documents
belong to the same 'group' if their word weights are similar.
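
For reference, one common form of the equation is sketched below. There are
several variants in use, and this is not necessarily the exact one that
rainbow implements; the symbol names are my own choice for illustration.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_{t}}
```

Here tf_{t,d} is how often word t occurs in document d, N is the total
number of documents, and df_t is the number of documents that contain t.
A word that appears in every document gets weight log(1) = 0, which is why
very common words drop out on their own.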

The application I found worth testing in Debian (and which prompted me to
develop the hack that is 'dpkg-iasearch') is to use document clustering and
TFIDF to find packages.

If you have a set of package descriptions (let's say 4000) you can parse
all the words in the descriptions, compare them across all the
descriptions and determine which words are 'appropriate' to describe a
given package.
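
To make the step above concrete, here is a minimal Python sketch. It is
not how dpkg-iasearch or rainbow work internally; the function names
(tokenize, tfidf_weights, top_keywords) and the three short package
descriptions are made up for illustration, and the weighting is the simple
tf * log(N/df) variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_weights(descriptions):
    """Map each package name to a {word: weight} dictionary."""
    tokens = {pkg: tokenize(desc) for pkg, desc in descriptions.items()}
    n_docs = len(tokens)

    # Document frequency: in how many descriptions does each word occur?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    weights = {}
    for pkg, words in tokens.items():
        tf = Counter(words)
        weights[pkg] = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
    return weights


def top_keywords(weights, pkg, k=3):
    """Return the k highest-weighted words, ties broken alphabetically."""
    ranked = sorted(weights[pkg].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical descriptions, not the real ones from the archive.
descriptions = {
    "mutt": "text based mail reader with mime and pgp support",
    "pine": "text based mail reader and news reader",
    "gimp": "image manipulation program with plugin support",
}
weights = tfidf_weights(descriptions)
print(top_keywords(weights, "gimp"))
```

Words shared between descriptions ("with", "support") end up with a lower
weight than words that occur in only one of them, so the top keywords are
the ones that set a package apart from the rest.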

Naturally, these same words are keywords that can describe a set of
related packages, so this approach could be useful to automatically tag
new packages when they get into Debian.
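
As a rough sketch of what automatic tagging could look like, the Python
below copies tags from the already-tagged package whose description is
closest to the new one. Note that this is a nearest-neighbour shortcut
using cosine similarity over TFIDF vectors, not real clustering, and it is
not something debtags or tagcoll provide. All names (suggest_tags, the
sample descriptions, the tag strings) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Compute log(N/df) for every word seen in the tagged descriptions."""
    df = Counter()
    for words in docs:
        df.update(set(words))
    return {word: math.log(len(docs) / df[word]) for word in df}


def vectorize(words, idf):
    """Turn a word list into a sparse {word: tf * idf} vector."""
    tf = Counter(word for word in words if word in idf)
    return {word: count * idf[word] for word, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(new_description, tagged, k=1):
    """Union of the tags of the k most similar already-tagged packages."""
    docs = {pkg: tokenize(desc) for pkg, (desc, _) in tagged.items()}
    idf = build_idf(list(docs.values()))
    new_vec = vectorize(tokenize(new_description), idf)
    scored = sorted(
        ((cosine(new_vec, vectorize(words, idf)), pkg)
         for pkg, words in docs.items()),
        reverse=True,
    )
    tags = set()
    for score, pkg in scored[:k]:
        if score > 0:  # ignore packages with no words in common
            tags |= tagged[pkg][1]
    return tags


# Hypothetical descriptions and tag names, for illustration only.
tagged = {
    "mutt": ("text based mail reader with mime and pgp support", {"mail"}),
    "pine": ("text based mail reader and news reader", {"mail", "news"}),
    "gimp": ("image manipulation program with plugin support", {"graphics"}),
}
print(suggest_tags("graphical image viewer with plugin support", tagged))
```

In practice a human would still want to review the suggestions, since a
short description may share its most distinctive words with the wrong
neighbour.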

Just my 2c.

Regards

Javi
