Re: QIIME 1.9 and rdp-classifier and Colt

To: Tim Booth <tbooth@ceh.ac.uk>
Cc: Debian Med Project List <debian-med@lists.debian.org>
Subject: Re: QIIME 1.9 and rdp-classifier and Colt
From: Andreas Tille <andreas@an3as.eu>
Date: Mon, 23 Mar 2015 14:50:40 +0100
Message-id: <20150323135040.GF16995@an3as.eu>
In-reply-to: <1426874145.3054.39.camel@wllt1771.nerc-wallingford.ac.uk>
References: <20150222084753.GG28299@an3as.eu> <54EA0DD3.5000605@gambaru.de> <20150222174907.GB20206@an3as.eu> <mcd677$545$1@ger.gmane.org> <20150222210635.GD20206@an3as.eu> <54EA4AA1.4010500@apache.org> <20150307100215.GC14122@an3as.eu> <54FF24E5.8010506@apache.org> <20150310172334.GA28638@an3as.eu> <1426874145.3054.39.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Tim,

On Fri, Mar 20, 2015 at 05:55:45PM +0000, Tim Booth wrote:
> I read through the correspondence regarding Colt, and wish you luck.  My
> memory is that removing the interface files was easy but the stuff in
> bin/ had no obvious way to replace it.  The Java is all pretty legible
> to me but the actual semantics of what the code is expected to do (and how 
> to test if any modification still works) is not.

I admit I'm quite optimistic.  We'll see if I'm correct once I come
back to your packages with libcolt as a pre-dependency. :-)

> Anyway, I want to tell you that the good news is I finally got QIIME 1.9
> finished and out on Bio-Linux, and as I promised in the meeting it can
> now run all major functions without calling out to non-free software. :-)

+1

> The bad news is that getting it to work was a major PITA, and even with
> the work I've already done you have a fair job getting it into Debian.

Since the next release cycle has not even started I'm quite optimistic
that we will finish it by Jessie+1.

> There are many new packages and some of those are super-crufty.  The linchpin
> package is a thing called "python-burrito-fillings" which comes with a
> load of tests but many of them fail.  In one case the test fails when
the Launchpad builder builds it but succeeds on my own box under pbuilder
> and I'm not sure why.  I've got it working to the point of putting it in
> Bio-Linux but it needs more love to be ready for Debian.

Let's see.  I have observed a similar thing with python-cogent.  It has
tests that fail to run in pbuilder.  I tried to bisect the problematic
tests but the removal of test A triggered the failure of test B which
was fine before.  Very strange, very nasty - just hidden since the
former packaging did not run the test suite.  Since I detected this
shortly before the freeze I decided to keep it as is - which worked for
two releases without any user bug report.  I do not like it and will
keep working on it - but it also needs time and perhaps we will be
luckier later.  Apropos testing:  For next year I plan to register a GSoC
project to try to get a test suite for *all* Debian Med packages.
Sounds like a real challenge ...

> So even though I published QIIME into Bio-Linux on the 9th of this month
> I spent so long sorting it that I got behind on my other work.  If I
> push the packages into SVN in the current state and you start working on
> them then you are going to be asking me a whole load of questions I
> don't have time to answer.  Therefore I'm planning to do this in two
> weeks, after my current project launch deadline.

Fine for me.  I'm sure I can spend my time until then on one or two
other tasks. ;-)

> Regarding rdp_classifier I'd say yes rename the script to
> rdp-classifier.  I named it to match the JAR name (which should not
> change) but I doubt anyone is relying on this in their scripts - things
> like QIIME just invoke the JAR directly.

OK.

> Here's something to consider with RDP classifier - they provide the
> databases in both "compiled" and "uncompiled" formats.  This isn't
> actually compiling, but is analogous - in this case "compilation"
> involves running the classifier in training mode to build a database.
> So we'd normally "compile" the datasets as part of building the package,
> or else leave them out so the user can compile them.  But
> training/compiling needs about 40GB of RAM (much more than actually
> running the classifier) so we can't actually deal with the databases on
> the build systems even if we wanted to.  And users with less than 40GB
> RAM will want to fetch the pre-compiled databases in any case.
> 
> On Bio-Linux I've provided one pre-made dataset (the smaller bacterial
> one that most people want and that came bundled with the older package)
> as part of the DEB, and I've provided a utility script that helps to
> fetch the rest.  It's a pragmatic compromise but is not in tune with
> Debian policy.  I'm not sure what you'd rather do in this case.

Ahhh, now I understand why the download is *way* smaller than
previously.  I'm not fully sure if I understand the problem correctly so
please take my answers with a grain of salt.  But for the moment I see
two options:

   1) Compile the database in postinst (perhaps after checking the
      available amount of memory).
      This would in turn mean we can only suggest rdp-classifier in
      med-bio, so that we do not overload any random machine on which
      med-bio is installed.
   2) Document in README.Debian how to compile the database or where
      to download it.
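
A minimal sketch of the memory check mentioned in option 1, just to
illustrate the idea.  It is not taken from any existing package: the
function names are made up, the threshold is simply the "about 40GB"
Tim quotes above, and reading MemTotal from /proc/meminfo assumes a
Linux system.

```python
#!/usr/bin/env python3
"""Sketch: check whether a machine has enough RAM to train an RDP database."""

# Assumption: roughly 40GB of RAM is needed for training (figure quoted
# in this thread), expressed here in kB as used by /proc/meminfo.
REQUIRED_KB = 40 * 1024 * 1024


def mem_total_kb(meminfo_text):
    """Return the MemTotal value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       16384256 kB"
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def can_train(meminfo_text, required_kb=REQUIRED_KB):
    """True if the total memory reaches the assumed training requirement."""
    return mem_total_kb(meminfo_text) >= required_kb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            text = fh.read()
    except OSError:
        print("cannot read /proc/meminfo: skipping database training")
    else:
        if can_train(text):
            print("enough memory: training the database locally is possible")
        else:
            print("not enough memory: fetch a pre-compiled database instead")
```

A postinst could run such a check and fall back to the README.Debian
instructions from option 2 when the machine is too small.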

> In terms of testing, the tests bundled in python-burrito-fillings
> mentioned above actually provide a good way to check rdp_classifier.  My
> current package version 2.10.1-0biolinux2 passes them.

I admit I really like the fact that we have more and more test suites
that rely on other packages of ours so we have some inter-package cross
checking.  I experienced this for the first time with BioPython, which
uncovered some problems that were hidden before.

Thanks for your work on all this and please ping me explicitly if you
think some work needs to happen in a certain order but you do not see
the action you would expect.

Kind regards

       Andreas.

-- 
http://fam-tille.de
