Giving axi-cache biology a try
Hi,
while summarising some ideas from the Debian Science BOF at DebConf[1]
I gave
axi-cache search biology
a try. I admit it was my first attempt and I do not know what data
axi-cache works on (at first I tried the DebTags-based ept-cache, which
told me that axi-cache is a better replacement), but I was not really
impressed by the results. I'm writing this mail to find ways to improve
the results, and I hope Enrico might be able to enlighten us. Here is my
try:
$ axi-cache search biology
101 results found.
Results 1-20:
100% science-biology - Debian Science Biology packages
94% med-bio - Debian Med micro-biology packages
--> How to get 100% ?
86% libvibrant6-dev - NCBI libraries for graphic biology applications (development files)
86% med-bio-dev - Debian Med packages for development of micro-biology applications
83% jemboss - graphical user interface to EMBOSS
80% ncbi-tools-x11 - NCBI libraries for biology applications (X-based utilities)
--> I wonder by what means these packages get a "quite high" percentage
while other, similarly relevant packages score much lower.
77% avida-base - Auto-adaptive genetic system for Artificial Life research
75% libbiojava1.7-java - Java API to biological data and applications
75% libbiojava-java - Java API to biological data and applications
73% r-other-bio3d - GNU R package for biological structure analysis
71% genesis - general-purpose neural simulator
--> I just added avida-base and genesis to science-biology; they were
missing there, and I only noticed this thanks to axi-cache. IMHO they
do not really belong in med-bio (does anybody disagree?)
71% emboss - the european molecular biology open software suite
67% emboss-data - data files for the EMBOSS package
--> Hmmm, emboss Depends:emboss-data,emboss-lib. emboss-data is
according to axi-cache less relevant and emboss-lib even less
or not at all relevant. Does this make sense??
67% emboss-doc - documentation for EMBOSS
67% libbiojava-java-doc - [Biology] Documentation for BioJava
67% med-typesetting - Debian Med support for typesetting and publishing
--> not really relevant for Biology - there are only some styles
which come in handy, so I'd call the percentage too high here
66% ncoils - [Biology] coiled coil secondary structure prediction
66% libncbi6-dev - NCBI libraries for biology applications (development files)
66% rasmol-doc - Documentation for rasmol
--> Here the doc package is more relevant than the binary package,
in other cases this was the other way around
66% readseq - [Biology] Conversion between sequence formats
--> [OT] Sideremark: we really have one package with [Biology] in short
description :-( - changes in SVN
More terms: molecular emboss ncbi european software sequence vibrant
More tags: field::biology field::biology:bioinformatics use::searching use::viewing uitoolkit::motif field::biology:molecular suite::debian
`axi-cache more' will give more results
$ axi-cache more
101 results found.
Results 21-40:
65% libvibrant6a - NCBI libraries for graphic biology applications
65% emboss-test - test files for the EMBOSS package
64% libncbi6 - NCBI libraries for biology applications
64% libball1.3-doc - Documentation for the BALL library
63% med-config - Debian Med Project config package
--> Hmmm, that's not *really* relevant for Biology - just
a common helper for all med-* packages but makes no sense
here.
63% python-ball - Python bindings for the Biochemical Algorithms Library
62% python-ballview - Python bindings for VIEW-parts of the Biochemical Algorithms Library
62% libnucleus6 - EMBOSS library for molecular sequence analysis
62% bioperl - Perl tools for computational molecular biology
61% ncbi-tools-bin - NCBI libraries for biology applications (text-based utilities)
61% libballview1.3-dev - Header files for the VIEW part of the Biochemical Algorithms Library
60% mcl - the Markov Cluster algorithm
59% libncbi6-dbg - NCBI libraries for biology applications (debugging symbols)
59% paw-common - Physics Analysis Workstation (common files)
--> Could anybody check whether PAW is also something for us?
I thought it was Physics only - but the description
mentions Biology
59% phylip - [Biology] A package of programs for inferring phylogenies
--> Hmmm, the latest Phylip version has been stuck in unstable for nearly
a year because it has not been built for armel and powerpc - this sucks. :-(
59% montecarlo-base - [Physics] Common files for CERNLIB Monte Carlo libraries
--> same as PAW, is this something for us?
59% biosquid - utilities for biological sequence analysis
59% libvibrant6a-dbg - NCBI libraries for graphic biology applications (unstripped)
58% libbio-ruby1.8 - bioruby tools for computational molecular biology
58% biosquid-dev - headers and static library for biological sequence analysis
`axi-cache more' will give more results
`axi-cache again' will restart the search
$ axi-cache more
101 results found.
Results 41-60:
58% kalign - Global and progressive multiple sequence alignment
58% phylip-doc - [Biology] A package of programs for inferring phylogenies
57% hmmer - profile hidden Markov models for protein sequence analysis
57% paw-demos - Physics Analysis Workstation examples and tests
57% cernlib-base - CERNLIB data analysis suite - common files
57% cernlib - CERNLIB data analysis suite - general use metapackage
--> same as PAW and montecarlo-base: should we watch these somehow?
Is anybody using them? Just adding them to science-biology
might be an option.
57% seqan-dev - A C++ template library for the analysis of sequences
56% libkernlib1-dev - CERNLIB data analysis suite - core library of basic functions (development)
56% libkernlib1-gfortran - CERNLIB data analysis suite - core library of basic functions
56% libswiss-perl - Perl API to the UniProt database
56% science-config - Debian Science Project config package
--> same false positive as med-config
56% libajax6 - EMBOSS library for commands
56% dzedit - CERNLIB data analysis suite - ZEBRA documentation editor
--> see cernlib
56% seqan-apps - A C++ template library for the analysis of sequences
55% hmmer-doc - profile hidden Markov models for protein sequence analysis (docs)
55% libpawlib-lesstif3-dev - CERNLIB PAW library (Lesstif-dependent part - development files)
55% libbio-ruby - bioruby tools for computational molecular biology
55% hmmer-pvm - HMMER programs with PVM (Parallel Virtual Machine) support
55% libgrafx11-1-dev - CERNLIB data analysis suite - interface to X11 and PostScript (development)
54% cernlib-core-dev - CERNLIB data analysis suite - core development files
`axi-cache more' will give more results
`axi-cache again' will restart the search
$ axi-cache more
101 results found.
Results 61-80:
54% python-biopython - Python library for bioinformatics
--> libbiojava has 75% - python-biopython 54% - WHY???
54% libgrafx11-1-gfortran - CERNLIB data analysis suite - interface to X11 and PostScript
--> I'm just leaving out all the other CERNLIB packages here ...
...
52% libballview1.3 - Biochemical Algorithms Library, VIEW framework
...
51% libball1.3 - Biochemical Algorithms Library
50% gromacs-dev - GROMACS molecular dynamics sim, development kit
...
50% gromacs - Molecular dynamics simulator, with building and analysis tools
50% gromacs-openmpi - Molecular dynamics sim, binaries for OpenMPI parallelization
...
`axi-cache more' will give more results
`axi-cache again' will restart the search
$ axi-cache more
101 results found.
Results 81-100:
50% gromacs-mpich - Molecular dynamics sim, binaries for MPICH parallelization
50% biococoa.app - biological sequence file format conversion applet for GNUstep
50% paw++ - Physics Analysis Workstation (Lesstif-enhanced version)
...
48% last-align - genome-scale comparison of biological sequences
...
47% dialign-tx - Segment-based multiple sequence alignment
...
44% ballview - A free molecular modeling and molecular graphics tool
44% glam2 - gapped protein motifs from unaligned sequences
...
41% rasmol - Visualize biological macromolecules
--> how come a quite relevant package ranks that low?
...
40% wims - server for educative contents as courses, exercises, exams
--> quite a general educational package which *also* contains
biology - but way less relevant than for instance the next
package
35% maq - maps short fixed-length polymorphic DNA sequence reads to reference sequences
`axi-cache more' will give more results
`axi-cache again' will restart the search
$ axi-cache more
101 results found.
Results 101-101:
29% texlive-science - TeX Live: Typesetting for natural and computer sciences
`axi-cache again' will restart the search
The query is definitely missing an estimated 20-30 packages, because
med-bio and med-bio-dev together contain about 100 packages.
Considering the several false positives above, this makes quite a difference.
So what can we do to enhance this situation?
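One way to put numbers on the missing packages and the false positives
would be to compare the pasted output with a list of packages we expect
to see. Below is a minimal sketch in Python, assuming the result lines
keep the "NN% package - description" layout shown above. The function
names and the tiny expected set are made up for illustration; the real
list would have to come from the med-bio and med-bio-dev tasks.

```python
import re

# Matches one result line of the pasted listing, for example
# "94% med-bio - Debian Med micro-biology packages".
# Comment lines starting with "-->" and the pager hints do not match.
RESULT_LINE = re.compile(r"^\s*(\d+)%\s+(\S+)\s+-\s+(.*)$")


def parse_results(text):
    """Return {package: percentage} for every result line in the text."""
    scores = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            scores[match.group(2)] = int(match.group(1))
    return scores


def compare(found, expected):
    """Return (missing, extra): expected but not found, found but not expected."""
    found_names = set(found)
    missing = sorted(expected - found_names)
    extra = sorted(found_names - expected)
    return missing, extra


# A few lines copied from the output above, including one inline comment.
sample = """\
100% science-biology - Debian Science Biology packages
94% med-bio - Debian Med micro-biology packages
--> How to get 100% ?
63% med-config - Debian Med Project config package
41% rasmol - Visualize biological macromolecules
"""

# Hypothetical expected set, for illustration only.
expected = {"rasmol", "emboss-lib", "med-bio"}

found = parse_results(sample)
missing, extra = compare(found, expected)
print("missing:", missing)
print("not expected:", extra)
```

Packages listed as "not expected" are only candidates for false
positives; each one would still need a manual look, as done above.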
Kind regards
Andreas.
[1] http://wiki.debian.org/DebianScience/ProblemsToWorkOn
(not finished at the time of writing)
--
http://fam-tille.de