
Re: Giving axi-cache biology a try



On Sat, Aug 07, 2010 at 07:54:21PM +0200, Andreas Tille wrote:

> when summarising some ideas of the Debian Science BOF at DebConf[1]
> I gave
> 
>    axi-cache search biology
> 
> a try.  I admit it was my first try and I do not know on what data basis
> axi-cache is working (at first I tried the DebTags based ept-cache which
> told me that axi-cache is a better replacement) but I was not really
> impressed by the results.  I'm writing this mail to find means to enhance
> the results and I hope Enrico might be able to enlighten us.  Here is my
> try:

[...]

I thought the results weren't too bad. Your query is vague ("biology"
covers a lot of ground, considering how many different biology packages
we have in Debian), and you still get two metapackages at the top of the
results, with high relevance ratings:

> Results 1-20:
> 100% science-biology - Debian Science Biology packages
> 94% med-bio - Debian Med micro-biology packages
>      --> How to get 100% ?
> 86% libvibrant6-dev - NCBI libraries for graphic biology applications (development files)
> 86% med-bio-dev - Debian Med packages for development of micro-biology applications
> 83% jemboss - graphical user interface to EMBOSS
> 80% ncbi-tools-x11 - NCBI libraries for biology applications (X-based utilities)
>      --> I wonder by what means these packages get a "quite high" percentage
>          while other similar relevant packages are much lower.

Rather than running "axi-cache more" to exhaustion, you can try following
axi-cache's advice (also available via bash tab completion, before you
even run the search) on ways to improve your query. I thought the
suggestions weren't too bad:

> More terms: molecular emboss ncbi european software sequence vibrant
> More tags: field::biology field::biology:bioinformatics use::searching use::viewing uitoolkit::motif field::biology:molecular suite::debian

I can't say how to get med-bio to the percentage you want: the way
Xapian computes relevance can be rather complicated. But why bother,
really, as long as it's at the top of the list? Maybe axi-cache just
shouldn't show percentages at all; Google doesn't show them either.

A more interesting thing you may want to do is to query for specific use
cases. "dna sequencing"? "dicom viewer"? Pick queries you know users are
actually using, if you can. Then see if the results are what you expect.
That kind of work would check whether the descriptions and the tags of
your packages actually help users find them.
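
A check like that could be scripted roughly as follows. This is a sketch
in Python, assuming only the "NN% package - description" result format
quoted earlier in this mail; the helper names are hypothetical, and the
sample is the quoted output for "biology", not a new search.

```python
import re

# One result line looks like: "94% med-bio - Debian Med micro-biology packages"
RESULT_LINE = re.compile(r"^\s*(\d+)%\s+(\S+)\s+-\s+(.*)$")


def parse_results(output):
    """Return (percent, package, description) tuples in ranked order."""
    results = []
    for line in output.splitlines():
        if line.startswith(">"):
            line = line[1:]  # drop mail quoting
        match = RESULT_LINE.match(line)
        if match:  # headers and inline comments do not match and are skipped
            results.append((int(match.group(1)), match.group(2), match.group(3)))
    return results


def rank_of(package, results):
    """1-based position of a package in the results, or None if absent."""
    for position, (_percent, name, _description) in enumerate(results, start=1):
        if name == package:
            return position
    return None


# Result lines exactly as quoted in this mail.
SAMPLE = """\
> Results 1-20:
> 100% science-biology - Debian Science Biology packages
> 94% med-bio - Debian Med micro-biology packages
>      --> How to get 100% ?
> 86% libvibrant6-dev - NCBI libraries for graphic biology applications (development files)
> 86% med-bio-dev - Debian Med packages for development of micro-biology applications
> 83% jemboss - graphical user interface to EMBOSS
> 80% ncbi-tools-x11 - NCBI libraries for biology applications (X-based utilities)
"""

if __name__ == "__main__":
    results = parse_results(SAMPLE)
    for expected in ("science-biology", "med-bio", "jemboss"):
        print(expected, rank_of(expected, results))
```

For real use you would feed it the captured output of each use-case query
and list the packages you expect near the top; a package that comes back
as None or far down the list is a hint that its description or tags need
work.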


Ciao,

Enrico

-- 
GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enrico@enricozini.org>
