
Re: Why apt-get is not a proper software search engine (was Re: And now for something completely different... etch!)



On Tuesday 07 June 2005 23:44, Javier Fernández-Sanguino Peña wrote:

> Debtags might not cut it either, but might be an improvement over a free
> keyword search which ends up turning up the wrong packages just because they
> have the word used in the query. A good search function could:
>
> - use keywords/tags (using boolean logic or even regular expressions)
> - use package sections and priorities to adjust results (few users look
> directly for 'libs' or 'oldlibs')
> - use package dependencies to ponder if this is an end-user package or
> something pulled in by other packages (users typically look for end-user
> programs)
> - use popcon to prioritise results (users typically look for programs many
> others use)
> - i18n/l10n search, through translation of package descriptions (so that
> searching in != english is possible)

You may try ara [1].  It can perform sophisticated queries on 
the /var/lib/dpkg/available database. Queries are built from atomic 
expressions of the forms pattern, /regexp/, quoted_string, 
fieldspec operator1 string, or fieldspec operator2 regexp. Any field of the 
package database can be used as a search criterion, in any boolean 
combination. You can perform both really simple and really complex queries 
interactively and within its own shell-like environment (history supported) 
with a configurable output format. Since the whole database may or may not be 
loaded into memory, it can trade memory usage for speed and vice versa; it 
also has an option to try to compact the heap. It even has its own 
lightweight httpd, which is still experimental and not activated in the 1.0.9 
release, but you can try it at http://ara.zapto.org. It has many options and 
its own config, and it can call external programs to process things as the 
user defines them (apt-get update/install in an xterm, a2ps... ). It is 
written in OCaml, well documented, and has been in sarge for a long 
time ;-) [2].

Here are some self-explanatory syntax examples:
A simple one:
[ara shell here]& depends:(kde or gnome or x11 or qt) & section:graphics
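To make the boolean combination concrete, here is a rough Python sketch of 
how a query like the one above could be evaluated. This is not ara's code: 
the package records are made up, the helper names field and both are 
hypothetical, and matching is assumed to be a case-insensitive substring 
test, which may differ from what ara really does.

```python
# Made-up records, loosely shaped like /var/lib/dpkg/available stanzas.
PACKAGES = [
    {"Package": "paint-kde", "Section": "graphics", "Depends": "kdelibs4, libqt3"},
    {"Package": "pnm-tools", "Section": "graphics", "Depends": "libc6"},
    {"Package": "puzzle-gnome", "Section": "games", "Depends": "libgnome2-0"},
]

def field(name, *words):
    """Predicate: true if any word occurs in the named field (assumed
    case-insensitive substring match, not necessarily ara's semantics)."""
    def test(pkg):
        value = pkg.get(name, "").lower()
        return any(w.lower() in value for w in words)
    return test

def both(p, q):
    """Boolean AND of two predicates."""
    return lambda pkg: p(pkg) and q(pkg)

# depends:(kde or gnome or x11 or qt) & section:graphics
query = both(field("Depends", "kde", "gnome", "x11", "qt"),
             field("Section", "graphics"))

matches = [p["Package"] for p in PACKAGES if query(p)]
print(matches)  # ['paint-kde']
```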

Regular expressions:
More complex regular expressions need to be enclosed between slashes (/).  The 
syntax is sed-ish: the second slash can be followed by i for 
case-insensitivity and w for word-boundary enforcement.  (Remark: digits count 
as word boundaries.)  The regular expression syntax is that of OCaml's Str 
module, which is more or less standard.  Example:

[ara shell here]& /[tpn]etris/iw & depends:/libqt.*/w
[results] cuyo gnome-games kfouleggs ksirtet ksmiletris
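For readers more used to other regex engines, here is a small Python sketch 
of what the i and w flags amount to. It is only an approximation under 
stated assumptions: ara_like is a hypothetical helper, the pattern is in 
Python's re syntax (where alternation and grouping are written | and ( ) 
rather than Str's \| and \( \)), and the w flag is modelled by treating only 
letters as word characters so that digits act as boundaries.

```python
import re

def ara_like(pattern, flags=""):
    """Compile a /pattern/flags-style expression with Python's re module.
    Hypothetical helper; it approximates the flags, it is not ara's parser."""
    if "w" in flags:
        # Only letters count as word characters, so digits act as boundaries.
        pattern = r"(?<![A-Za-z])(?:" + pattern + r")(?![A-Za-z])"
    return re.compile(pattern, re.IGNORECASE if "i" in flags else 0)

rx = ara_like(r"[tpn]etris", "iw")

print(bool(rx.search("A Tetris clone")))  # True: case-insensitive whole word
print(bool(rx.search("tetris2 for X")))   # True: the digit ends the word
print(bool(rx.search("supertetris")))     # False: preceded by a letter
```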

Variables:
It is possible to put the result of a query into a named variable and use that 
variable afterwards.  This is accomplished by including an assignment in the 
query, such as:

[ara shell here]& $gui := depends:(gtk | qt | kde | gnome | xlibs)
[results] ....
After execution, a variable named GUI will appear in the variable list.  It may 
then be referred to as $gui.
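One way to picture this is as a named result set that later queries can 
combine with. The Python sketch below is only an illustration: the package 
data is made up, select and the variables dictionary are hypothetical, and 
the follow-up query ($gui & section:games) is my own assumption about how 
such a variable would typically be reused.

```python
# Made-up package data: name -> (section, depends)
PACKAGES = {
    "puzzle-gtk": ("games", "libgtk2.0-0"),
    "puzzle-tty": ("games", "libncurses5"),
    "paint-qt": ("graphics", "libqt3"),
}

variables = {}  # variable name -> set of matching package names

def select(pred):
    """Return the set of package names whose record satisfies pred."""
    return {name for name, rec in PACKAGES.items() if pred(*rec)}

# $gui := depends:(gtk | qt)
variables["gui"] = select(lambda section, depends:
                          "gtk" in depends or "qt" in depends)

# $gui & section:games  (set intersection with a fresh query)
gui_games = variables["gui"] & select(lambda section, depends:
                                      section == "games")
print(sorted(gui_games))  # ['puzzle-gtk']
```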

A 'guess what' non-interactive, nice formatting example ;-)
(from the man page)
bash$ ara -old -fields Package:8,Size,Description:100 \
 -table 'Section=games and not (Depends:(gtk|sdl|kde|opengl|gnome|qt)
 or /shoot\|kill\|destroy\|blast\|race\|bomb/iw
 or /multi\(-\|\)player\|strategy\|conquest\|3\(-\|\)d/iw)
 and Depends:(xlibs or vga)
 and Size <= 1000000'

> an improvement over the 1st thing (keyword search) would be the use of
> "intelligent" text analysis tools (bayesian analysis, N-grams, TFIDF and
> the like). For an example implementation of this take a look at
> remembrance-agent (which uses the 'bag of words' library: bow)

Neither such analysis nor popcon results are supported.

[1] Packages are: ara, xara, ara-byte, xara-gtk-byte.
note: this is a complete rewrite of the ara package found in oldstable (woody), 
featuring CLI and GTK2 bytecode and native versions for most arches (native is 
faster, of course). Thanks to the upstream author and the debian-ocaml-maint@ 
hackers (the author is subscribed there) for helping me to package and 
sponsor that beast.
[2] perhaps it is not perfect, but it works.

-- 
pub 4096R/0E4BD0AB 2003-03-18 <danchev.fccf.net/key pgp.mit.edu>
fingerprint    1AE7 7C66 0A26 5BFF DF22 5D55 1C57 0C89 0E4B D0AB 


