[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: SPAdes for Debian



Hi Anton,

thanks for your quick reply.

On Mon, Feb 17, 2014 at 07:12:40PM +0400, Anton Korobeynikov wrote:
> 
> > I now wanted to give SPAdes a try and did the packaging in SVN[2].  The
> > most important thing you might like to know is that I followed the
> > Debian philosophy to use the just packaged code and thus I skipped the
> > code copies of boos, python-yaml and python-joblib.
> This may be wrong. Unfortunately, pyyaml and joblib are not
> Python3-friendly. So we had to patch them to make sure one can run
> spades.py under Python 3. We're packaging these patched versions along
> with SPAdes and select the proper variant (Python 2 or Python3) on
> fly. Have Debian patched these packaged to be able to run on both
> Python 2 and Python 3? Have you checked SPAdes functionality with
> Python 3? spades.py itself is Python 3 friendly.

Well, in Debian the according modules are selected automatically
depending from the Python version.  So if you run python3 the Python
modules path is automatically set to the Python 3 modules (which are in
the packages python3-yaml and python3-joblib.  If you confirm that
SPAdes works "better" with Python 3 I will adapt the according script in
/usr/bin/spades to force python3 as interpreter.  (The patch will remain
the same.)
 
> > You also might like to notice that Debian always delivers an executable under /usr/bin which
> > is in this case named spades and dispades consisting of a wrapper to
> > spades.py which finds the main spades binary under /usr/lib/spades/bin
> This is pretty bad. Because it definitely creates the command line
> difference wrt the released version. We would strongly suggest to keep
> spades.py and dipspades.py as the main runner scripts. Otherwise we
> expect that the user experience from SPAdes may be broken:
> 
> 1. Debian user who used to run just "spades" will run bare SPAdes
> binary on other systems with (sometimes) cryptic error messages about
> missed config, etc.
> 2. non-Debian user who used to run "spades.py" will have hard time to
> find the proper way to run SPAdes on Debian. This way is not
> documented in the official SPAdes documentation.

This is a never lasting discussion on our lists.  In Debian language
extensions are disapproved by policy and there are good reasons for
this.  I know that there is an ever lasting discussion specifically also
in our Debian Med team since the very reasons you are mentioning above:
Upstream authors of scientific software seem to to like expressing in
what software a script is written for whatever reason.  I would be happy
if you would take your time to read one example of this discussion where
I tried to collect links to very good reason to not use these language
extensions:

   https://lists.debian.org/debian-med/2012/04/msg00103.html

For sure I do not intend to create an incompatibility with any
non-Debian instances of SPAdes and it is definitely not in my interest
to create packages you as developers do not like.  However, specifically
in the case of spades.py I feel confused as a user since this seems to
be basically a wrapper around "spades.c" code.  Could you please
elaborate about the reasons why you are "advertising" the script
language in a way which may be needs to reverted in case at some time in
time you consider rewriting the wrapper into Ruby, Haskell or whatever
cool language (not that I personally would consider this sensible, just
finding some examples).
 
> > but these are technical details.
> I looked over the patches and it seems like all of them are invalid.
> Let me provide the feedback:
> 
> 1. adapt_to_debian_pathes.patch
> spades.py supports two directory layouts:
>   - the pretty standard /usr/bin + /usr/share layout
>   - all-in-one used mainly for development (using src, etc.)
> 
> Basically you removed the developer's way of running SPAdes which is
> pretty useless. If you really need this - then maybe you forgot to do
> "make install" ?

The debian/rules file was calling `make install` automatically but in
this case (if I remember correctly) a lot of files went to /usr/bin but
the user should actually only call spades.py and dipspades.py.  I have
seen no point in providing (ion)hammer, bwa-spades and even spades (the
compiled binary) in /usr/bin/ since the user should not really touch
these (if I'm correct - please correct me if I'm wrong).  So after the
installation the files location were rearranged to enable a file system
layout which at least to my possibly naive look at things seems more
clean from a users perspective.

And yes, I did not regarded the developer's way of running SPAdes.  The
Debian package (for the moment) supports only the users who do not
intend to rebuild from code.  Do you think it would make sense to
support a development library?

> 2. do_not_install_third_party_libs.patch
> See above about Python 2 vs Python 3. Please make sure that you can
> run SPAdes with all supported Python versions (2.4, 2.5, 2.6, 2.7, 3.2
> and 3.3).

On Debian Systems the default Python interpreter is currently Python
2.7.  If you want to use Python 3 you need to call python3 (and you get
Python 3.3 (for the moment).  IMHO the advantage of the wrapper I was
choosing is that we will be able to force python3 via this wrapper if
you would prefer this.  Can you tell me any reason for a user who simply
wants to do genome assemblies to pick from a set of Python interpreters?
 
> 3. use_debian-packaged_boost.patch
> No need to patch. Just run cmake with -DSPADES_BOOST_ROOT=/usr/include

Yep, I confirm the patch was a quick-n-(to)-dirty approach.  I will drop
this.
 
> 4. use_debian_packaged_python-yaml.patch
> See above
> 
> So, it looks like that at least 2 of patches are useless and may be
> simply removed. For other two ones you need to make sure you have not
> broken Python 3 support :)

As I said:  Python 3 support is definitely not brocken:

$ python
Python 2.7.6 (default, Jan 11 2014, 14:34:26) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml as pyyaml
>>> 

$ python3
Python 3.3.4rc1 (default, Jan 27 2014, 11:28:31) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml as pyyaml
>>> 

Just the renamed versions were not working on a Debian system properly
if you want to use the packaged versions instead of the code copies
(which usually recieve a veto from our security team).

> > The problem I have is that I have access to some other computer on which
> > a copy of SPAdes 2.5 is installed (I assume from the prebuilded binary
> > but I can not say for sure).  When running the very same test I get
> > different *.fasta/*.fastq results.  What I'm wondering now:  What is the
> > reason for the differences in the test between version 2.5 and 3.0.  I
> > think it would make sense to provide some kind of test suite to ensure
> > users that their results have good chances to be reproducible.
> SPAdes 3.0 contains many improvements, so the difference in the output
> is expected. SPAdes does include the testsuite, however it's not
> pretty-much user-friendly.

Does this mean there is a test suite included I just not detected?  The
point is that it does not need to be user friendly because we try to
run it on behalf of our users

  a) at package build time (if possible without network access)
  b) in a persiodically scheduled cron job over the whole Debian
     archive

So if this test suite is scriptable I would be happy to learn how I
could do this.

> Also, please make sure that you specified all the necessary
> dependencies (gcc 4.7, cmake 2.8.8).

That's granted inside the target release.

$ gcc --version
gcc (Debian 4.8.2-14) 4.8.2
$ cmake --version
cmake version 2.8.12.1

For the sake of backports to (very) old Debian systems this might be a
problem but it is not even for the latest released version (Debian
"Wheezy" 7.x).

> Below is the list of additional components with different licenses
> which are missed from 'copyright' file (actually, everything from ext
> dir):
> 
> ext/include/btree - Apache 2.0 license
> ext/src/jemalloc - BSDlike license. Note that this is a *modified*
> version of the jemalloc
> ext/src/yaml-cpp - MIT license. I believe this version is unmodified,
> but I do not recall offhand
> ext/include/kseq - MIT license
> ext/include/libcxx - MIT license

Thanks for these hints.  I did not yet finished the debian/copyright
file but this list actually helps.

> ext/tools/bwa - GPLv3. Slightly modified to make sure it compiles with
> modern GCC's.

My plan is to replace this by the Debian packaged version:

$ apt-cache show bwa | grep ^Version
Version: 0.7.5a-2

which definitely compiles with latest gcc (and we actually provided
patches to upstream in the past to let this happen).  Do you think
it is a bad idea to rather use bwa 0.7.5a than the code copy you are
providing inside SPAdes?

> If you have any questions or suggestions - feel free to ask and I will
> be happy to answer and provide help, if needed.

As I said I'm very happy about any comments of yours (depite potentially
contradicting opinions about language extensions).  I would really like
to get some result out in Debian you as upstream developers are happy
about.  Currently it is work in progress and any input of yours is
really helpful.

Kind regards

       Andreas.

-- 
http://fam-tille.de


Reply to: