
QIIME 1.9 and rdp-classifier and Colt



Hi Andreas,

I read through the correspondence regarding Colt, and wish you luck.  My
memory is that removing the interface files was easy but the stuff in
bin/ had no obvious way to replace it.  The Java is all pretty legible
to me but the actual semantics of what the code is expected to do (and how 
to test if any modification still works) is not.

Anyway, I want to tell you that the good news is I finally got QIIME 1.9
finished and out on Bio-Linux, and as I promised in the meeting it can
now run all major functions without calling out to non-free software. :-)

The bad news is that getting it to work was a major PITA, and even with
the work I've already done you still have a fair job ahead of you
getting it into Debian.
There are many new packages and some of those are super-crufty.  The linchpin
package is a thing called "python-burrito-fillings" which comes with a
load of tests but many of them fail.  In one case the test fails when
the Launchpad builder builds it but succeeds on my own box under pbuilder
and I'm not sure why.  I've got it working to the point of putting it in
Bio-Linux but it needs more love to be ready for Debian.

So even though I published QIIME into Bio-Linux on the 9th of this month
I spent so long sorting it that I got behind on my other work.  If I
push the packages into SVN in the current state and you start working on
them then you are going to be asking me a whole load of questions I
don't have time to answer.  Therefore I'm planning to push them in two
weeks, after my current project launch deadline.

Regarding rdp_classifier, I'd say yes, rename the script to
rdp-classifier.  I named it to match the JAR name (which should not
change) but I doubt anyone is relying on this in their scripts - things
like QIIME just invoke the JAR directly.

Here's something to consider with RDP classifier - they provide the
databases in both "compiled" and "uncompiled" formats.  This isn't
actually compiling, but is analogous - in this case "compilation"
involves running the classifier in training mode to build a database.
So we'd normally "compile" the datasets as part of building the package,
or else leave them out so the user can compile them.  But
training/compiling needs about 40GB of RAM (much more than actually
running the classifier) so we can't actually deal with the databases on
the build systems even if we wanted to.  And users with less than 40GB
RAM will want to fetch the pre-compiled databases in any case.
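
To make that trade-off concrete, here is a minimal sketch of how a helper
might decide between training locally and fetching a pre-compiled
database.  The 40GB figure is the rough number given above; the function
names and the "train"/"fetch" labels are hypothetical and are not part of
rdp_classifier or any existing package.

```python
import os

# Approximate RAM needed for training, taken from the figure quoted above.
TRAINING_RAM_GB = 40


def total_ram_gb():
    """Return physical RAM in GB via POSIX sysconf (works on Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024 ** 3


def database_strategy(ram_gb, needed_gb=TRAINING_RAM_GB):
    """Hypothetical helper: 'train' if the machine has enough RAM to
    build the database itself, otherwise 'fetch' a pre-compiled one."""
    return "train" if ram_gb >= needed_gb else "fetch"


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"{ram:.1f} GB RAM detected -> {database_strategy(ram)}")
```

On a typical build machine or desktop this would come out as "fetch",
which is the case described above.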

On Bio-Linux I've provided one pre-made dataset (the smaller bacterial
one that most people want and that came bundled with the older package)
as part of the DEB, and I've provided a utility script that helps to
fetch the rest.  It's a pragmatic compromise but is not in tune with
Debian policy.  I'm not sure what you'd rather do in this case.

In terms of testing, the tests bundled in python-burrito-fillings
mentioned above actually provide a good way to check rdp_classifier.  My
current package version 2.10.1-0biolinux2 passes them.

TIM

On Tue, 2015-03-10 at 18:23 +0100, Andreas Tille wrote:
> Hi Emmanuel,
> 
> On Tue, Mar 10, 2015 at 06:07:49PM +0100, Emmanuel Bourg wrote:
> > On 07/03/2015 11:02, Andreas Tille wrote:
> > 
> > > I also fetched the files from freehep jaida with the same names and
> > > moved them into place.
> > 
> > I didn't go that far and I can't provide more instructions past this
> > point, sorry. I just wanted to demonstrate that colt was still usable
> > without the non-free src/hep/aida/bin files. But I haven't adapted
> > colt to use the latest version of jaida.
> 
> So I obviously did not understand the description of your experiment.
> Would you mind sending me the source package of your experiment to
> enable me understanding / reproducing what you really did.  I could
> continue with testing the reverse dependencies.  I simply want to
> push this forward even with my limited skills.
> 
> Kind regards
> 
>    Andreas.
> 
> -- 
> http://fam-tille.de
> 
> 

-- 
Tim Booth <tbooth@ceh.ac.uk>
NERC Environmental Bioinformatics Centre 

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB 

http://environmentalomics.org/bio-linux
+44 1491 69 2297

