
Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)



Hi Andreas,

Thanks for your work. My answers to your questions are below:

- some small fixes: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/metaphlan2/trunk/debian/patches/fix_sequence.patch?view=markup 
    -> fixed
- some spelling issues: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/metaphlan2/trunk/debian/patches/spelling.patch?view=markup

- Tin can also provide more info about the binary data in db_v20. The files ending with "bt2" are created using a script in the Bowtie2 package (bowtie2-build) using a sequence file Tin can provide (it can also be recovered from the bt2 files with bowtie2-inspect if I remember well).
  As Nicola said, the files in db_v20 are created with bowtie2-build from a sequence file, and you can recover that sequence file with:
bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
If you want to rebuild them, the command is:
bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
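
The two commands above can also be scripted, for example from a packaging helper. The following is only a minimal sketch, assuming bowtie2-inspect and bowtie2-build are on the PATH. The function names are hypothetical and not part of MetaPhlAn2. It relies on bowtie2-inspect printing FASTA to standard output by default and on bowtie2-build taking the reference file followed by the index base name.

```python
import subprocess


def inspect_command(index_base):
    """Command that dumps the sequences stored in a Bowtie2 index as FASTA."""
    return ["bowtie2-inspect", index_base]


def build_command(fasta_path, index_base):
    """Command that builds a Bowtie2 index (*.bt2 files) from a FASTA file."""
    return ["bowtie2-build", fasta_path, index_base]


def rebuild_index(old_index_base, fasta_path, new_index_base):
    """Recover the marker sequences from an index, then build a new index.

    Hypothetical helper: it simply runs the two commands one after the other.
    """
    # bowtie2-inspect prints FASTA on stdout, so redirect it into the file.
    with open(fasta_path, "w") as fasta:
        subprocess.run(inspect_command(old_index_base), stdout=fasta, check=True)
    subprocess.run(build_command(fasta_path, new_index_base), check=True)
```

Calling rebuild_index("metaphlan2/db_v20/mpa_v20_m200", "metaphlan2/markers.fasta", "metaphlan2/db_v21/mpa_v21_m200") would mirror the two shell commands given above.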

- For the mpa_v20_m200.pkl Tin can also provide the uncompressed python object (or he can provide a couple of lines of code to uncompress it?)
   It is a Python dictionary and can be read as:
import cPickle as pickle
import bz2

db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))

You can find more information about them at:
https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
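
On Python 3 the cPickle module no longer exists, so the loading snippet above needs a small adaptation. The following is only a minimal sketch, assuming the pickle was written by Python 2 (which is why the encoding argument is passed; that is an assumption about the file, not something confirmed in this thread). The function name load_marker_db is hypothetical.

```python
import bz2
import pickle


def load_marker_db(path):
    """Load a bz2-compressed pickle and return the stored Python object."""
    with bz2.open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 decode str objects pickled by
        # Python 2; it has no effect on pickles written by Python 3.
        return pickle.load(handle, encoding="latin1")
```

With this helper, load_marker_db('db_v20/mpa_v20_m200.pkl') would play the role of the pickle.load call shown above.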

In addition, some files were renamed:
   - metaphlan2_strainer.py -> strainphlan.py
   - strainer_src -> strainphlan_src
   - strainer_tutorial -> strainphlan_tutorial

Some source files were updated as well.
Please let me know if you need other information.

Thanks,
Tin

On Wed, Aug 3, 2016 at 3:38 PM Andreas Tille <andreas@an3as.eu> wrote:
Hi Nicola,

thanks for your answer.

On Tue, Aug 02, 2016 at 04:32:31PM +0000, Nicola Segata wrote:
> Hi Andreas,
>  sorry for the delay in replying. I did get your last two emails but it
> seems the first one (On Mon, Jul 25, 2016 at 09:45:57PM) never arrived.

Hmmm, sad that there seems to be some mail loss.

> Tin can also provide more info about the binary data in db_v20. The files
> ending with "bt2" are created using a script in the Bowtie2 package
> (bowtie2-build) using a sequence file Tin can provide (it can also be
> recovered from the bt2 files with bowtie2-inspect if I remember well).
>
> For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> object (or he can provide a couple of lines of code to uncompress it?)

Anything that qualifies as source would be really welcome.  If
generating the binaries from this source does not take a big effort
(meaning it does not take way longer than 1 hour on a decent build
machine), generating the binaries would be really preferred.

> For the LEfSe package I just added the license in the bitbucket repository.
> For the description, I think you can use the following page:
> https://bitbucket.org/biobakery/biobakery/wiki/lefse
> Does it sound like an appropriate description for the package?

I found this after I had sent my mails - thanks for confirming that this
is the correct description.  I've just uploaded the package to the
Debian new queue.

> Let me know if you have other questions or if I missed answering other
> emails.

If Tin answers the binary data question above, I have no further
questions and do not remember any unanswered e-mails.

> thanks so much for your work!

You are welcome

      Andreas.

--
http://fam-tille.de
