
Giving axi-cache biology a try



Hi,

when summarising some ideas of the Debian Science BOF at DebConf[1]
I gave

   axi-cache search biology

a try.  I admit it was my first attempt, and I do not know what data
axi-cache is based on (at first I tried the DebTags-based ept-cache, which
told me that axi-cache is a better replacement), but I was not really
impressed by the results.  I'm writing this mail to find ways to improve
the results, and I hope Enrico might be able to enlighten us.  Here is my
attempt:

$ axi-cache search biology
101 results found.
Results 1-20:
100% science-biology - Debian Science Biology packages
94% med-bio - Debian Med micro-biology packages
     --> How to get 100% ?
86% libvibrant6-dev - NCBI libraries for graphic biology applications (development files)
86% med-bio-dev - Debian Med packages for development of micro-biology applications
83% jemboss - graphical user interface to EMBOSS
80% ncbi-tools-x11 - NCBI libraries for biology applications (X-based utilities)
     --> I wonder by what means these packages get a "quite high" percentage
         while other similar relevant packages are much lower.
77% avida-base - Auto-adaptive genetic system for Artificial Life research
75% libbiojava1.7-java - Java API to biological data and applications
75% libbiojava-java - Java API to biological data and applications
73% r-other-bio3d - GNU R package for biological structure analysis
71% genesis - general-purpose neural simulator
     --> I just added avida-base and genesis to science-biology; they were
         missing there, and I detected this thanks to axi-cache.  IMHO they
         are not really for med-bio (does anybody disagree?)
71% emboss - the european molecular biology open software suite
67% emboss-data - data files for the EMBOSS package
     --> Hmmm, emboss Depends: emboss-data, emboss-lib.  According to
         axi-cache, emboss-data is less relevant and emboss-lib even less
         so, or not relevant at all.  Does this make sense?
67% emboss-doc - documentation for EMBOSS
67% libbiojava-java-doc - [Biology] Documentation for BioJava
67% med-typesetting - Debian Med support for typesetting and publishing
     --> not really relevant for Biology - there are only some styles
         which come in handy, but I'd call the percentage too high here
66% ncoils - [Biology] coiled coil secondary structure prediction
66% libncbi6-dev - NCBI libraries for biology applications (development files)
66% rasmol-doc - Documentation for rasmol
     --> Here the doc package is more relevant than the binary package,
         in other cases this was the other way around
66% readseq - [Biology] Conversion between sequence formats
     --> [OT] Side remark: we really have one package with [Biology] in the
         short description :-( - changes in SVN
More terms: molecular emboss ncbi european software sequence vibrant
More tags: field::biology field::biology:bioinformatics use::searching use::viewing uitoolkit::motif field::biology:molecular suite::debian
`axi-cache more' will give more results

$ axi-cache more
101 results found.
Results 21-40:
65% libvibrant6a - NCBI libraries for graphic biology applications
65% emboss-test - test files for the EMBOSS package
64% libncbi6 - NCBI libraries for biology applications
64% libball1.3-doc - Documentation for the BALL library
63% med-config - Debian Med Project config package
     --> Hmmm, that's not *really* relevant for Biology - just
         a common helper for all med-* packages but makes no sense
         here.
63% python-ball - Python bindings for the Biochemical Algorithms Library
62% python-ballview - Python bindings for VIEW-parts of the Biochemical Algorithms Library
62% libnucleus6 - EMBOSS library for molecular sequence analysis
62% bioperl - Perl tools for computational molecular biology
61% ncbi-tools-bin - NCBI libraries for biology applications (text-based utilities)
61% libballview1.3-dev - Header files for the VIEW part of the Biochemical Algorithms Library
60% mcl - the Markov Cluster algorithm
59% libncbi6-dbg - NCBI libraries for biology applications (debugging symbols)
59% paw-common - Physics Analysis Workstation (common files)
     --> Could anybody check whether PAW is also something for us?
         I just thought it would be rather Physics only - but the description
         mentions Biology
59% phylip - [Biology] A package of programs for inferring phylogenies
     --> Hmmm, the latest Phylip version has been stuck in unstable for nearly
         one year because it has not been built for armel and powerpc - this
         sucks. :-(
59% montecarlo-base - [Physics] Common files for CERNLIB Monte Carlo libraries
     --> same as PAW, is this something for us?
59% biosquid - utilities for biological sequence analysis
59% libvibrant6a-dbg - NCBI libraries for graphic biology applications (unstripped)
58% libbio-ruby1.8 - bioruby tools for computational molecular biology
58% biosquid-dev - headers and static library for biological sequence analysis
`axi-cache more' will give more results
`axi-cache again' will restart the search

$ axi-cache more
101 results found.
Results 41-60:
58% kalign - Global and progressive multiple sequence alignment
58% phylip-doc - [Biology] A package of programs for inferring phylogenies
57% hmmer - profile hidden Markov models for protein sequence analysis
57% paw-demos - Physics Analysis Workstation examples and tests
57% cernlib-base - CERNLIB data analysis suite - common files
57% cernlib - CERNLIB data analysis suite - general use metapackage
     --> same as PAW and montecarlo-base: should we watch these somehow?
         Is anybody using this?  Just adding it to science-biology
         might be an option.
57% seqan-dev - A C++ template library for the analysis of sequences
56% libkernlib1-dev - CERNLIB data analysis suite - core library of basic functions (development)
56% libkernlib1-gfortran - CERNLIB data analysis suite - core library of basic functions
56% libswiss-perl - Perl API to the UniProt database
56% science-config - Debian Science Project config package
     --> same false positive as med-config
56% libajax6 - EMBOSS library for commands
56% dzedit - CERNLIB data analysis suite - ZEBRA documentation editor
     --> see cernlib
56% seqan-apps - A C++ template library for the analysis of sequences
55% hmmer-doc - profile hidden Markov models for protein sequence analysis (docs)
55% libpawlib-lesstif3-dev - CERNLIB PAW library (Lesstif-dependent part - development files)
55% libbio-ruby - bioruby tools for computational molecular biology
55% hmmer-pvm - HMMER programs with PVM (Parallel Virtual Machine) support
55% libgrafx11-1-dev - CERNLIB data analysis suite - interface to X11 and PostScript (development)
54% cernlib-core-dev - CERNLIB data analysis suite - core development files
`axi-cache more' will give more results
`axi-cache again' will restart the search

$ axi-cache more
101 results found.
Results 61-80:
54% python-biopython - Python library for bioinformatics
     --> libbiojava has 75% - python-biopython 54% - WHY???
54% libgrafx11-1-gfortran - CERNLIB data analysis suite - interface to X11 and PostScript
     --> I'm just leaving out all the other CERNLIB packages here ...
...
52% libballview1.3 - Biochemical Algorithms Library, VIEW framework
...
51% libball1.3 - Biochemical Algorithms Library
50% gromacs-dev - GROMACS molecular dynamics sim, development kit
...
50% gromacs - Molecular dynamics simulator, with building and analysis tools
50% gromacs-openmpi - Molecular dynamics sim, binaries for OpenMPI parallelization
...
`axi-cache more' will give more results
`axi-cache again' will restart the search


$ axi-cache more
101 results found.
Results 81-100:
50% gromacs-mpich - Molecular dynamics sim, binaries for MPICH parallelization
50% biococoa.app - biological sequence file format conversion applet for GNUstep
50% paw++ - Physics Analysis Workstation (Lesstif-enhanced version)
...
48% last-align - genome-scale comparison of biological sequences
...
47% dialign-tx - Segment-based multiple sequence alignment
...
44% ballview - A free molecular modeling and molecular graphics tool
44% glam2 - gapped protein motifs from unaligned sequences
...
41% rasmol - Visualize biological macromolecules
     --> why does a quite relevant package rank that low?
...
40% wims - server for educative contents as courses, exercises, exams
     --> quite a general educational package which *also* contains
         biology - but way less relevant than for instance the next
         package
35% maq - maps short fixed-length polymorphic DNA sequence reads to reference sequences
`axi-cache more' will give more results
`axi-cache again' will restart the search


$ axi-cache more
101 results found.
Results 101-101:
29% texlive-science - TeX Live: Typesetting for natural and computer sciences
`axi-cache again' will restart the search


The query is definitely missing about 20-30 packages (estimated), because
med-bio and med-bio-dev together contain about 100 packages, while the
query returns only 101 results, several of which are the false positives
noted above.  So what can we do to improve this situation?
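
One rough way to put numbers on the missing packages and false positives
would be to paste the output above into a small script and compare it
against the list of packages one expects.  The following is only a minimal
sketch in Python: the function names parse_results and compare are made up
for illustration, the expected set is a hypothetical placeholder (the real
list would come from whatever med-bio and med-bio-dev contain), and nothing
here reflects how axi-cache itself computes its ranking.

```python
import re

# Matches axi-cache result lines such as "94% med-bio - Debian Med ...".
RESULT_LINE = re.compile(r"^(\d+)% (\S+) - ")


def parse_results(output):
    """Return {package: score} for every result line in pasted output."""
    scores = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            scores[match.group(2)] = int(match.group(1))
    return scores


def compare(scores, expected):
    """Return (expected packages not found, found packages not expected)."""
    found = set(scores)
    missing = sorted(expected - found)
    # Unexpected hits are listed with the highest score first.
    unexpected = sorted(found - expected, key=lambda pkg: -scores[pkg])
    return missing, unexpected


if __name__ == "__main__":
    sample = """\
100% science-biology - Debian Science Biology packages
71% emboss - the european molecular biology open software suite
63% med-config - Debian Med Project config package
     --> annotation lines like this one are ignored
"""
    # Hypothetical expected set, for illustration only.
    expected = {"science-biology", "emboss", "emboss-lib"}
    missing, unexpected = compare(parse_results(sample), expected)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Run over all pages of the output, the first list would show which expected
packages the query never returned, and the second list would collect the
candidates for false positives (or for packages that should be added to
science-biology).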

Kind regards

       Andreas.

[1] http://wiki.debian.org/DebianScience/ProblemsToWorkOn
    (not finished at the time of writing)
-- 
http://fam-tille.de

