
Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)



Hi Tin,

On Wed, Aug 03, 2016 at 02:01:01PM +0000, Duy Tin Truong wrote:
> > - Tin can also provide more info about the binary data in db_v20. The files
> > ending with "bt2" are created using a script in the Bowtie2 package
> > (bowtie2-build) using a sequence file Tin can provide (it can also be
> > recovered from the bt2 files with bowtie2-inspect if I remember well).
> As Nicola said, those files in db_v20 are created with bowtie2-build
> using a sequence file and you can recover the sequence file by:
> 
> bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
> 
> If you want to rebuild them, the command is:
> 
> bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200

I can confirm that I can reproduce byte-identical files from
markers.fasta.  Is there any reason to ship the binary form instead of
the fasta text file?  Moreover, what is the source of markers.fasta?
Is there any related publication?
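
For the record, a check like this can be sketched in Python as below.
This is only a minimal sketch, not the exact procedure used: it assumes
the index was rebuilt under the same base name into a separate
directory, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(shipped_dir, rebuilt_dir, pattern="*.bt2"):
    """List shipped files that are missing from, or differ in, rebuilt_dir.

    Files are paired by name, so both directories are assumed to use
    the same index base name (a hypothetical setup).
    """
    mismatches = []
    for shipped in sorted(Path(shipped_dir).glob(pattern)):
        rebuilt = Path(rebuilt_dir) / shipped.name
        if not rebuilt.is_file() or sha256_of(shipped) != sha256_of(rebuilt):
            mismatches.append(shipped.name)
    return mismatches
```

An empty list from differing_files() means every shipped bt2 file has a
byte-identical counterpart in the rebuilt directory.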

> > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > object (or he can provide a couple of lines of code to uncompress it?)
> It is python dictionary and can be read as:
> 
> import cPickle as pickle
> import bz2
> db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
>
> You can have more information about them at:
> https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database

OK, that page clarifies the method.  Just a personal remark from the
point of view of an outsider to bioinformatics:  I find the creation
process of the mpa_v20_m200.pkl file a bit cumbersome.  I'd personally
prefer dropping a text record somewhere and calling a script that
processes this record rather than writing my own script.
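
For anyone who wants to look at the pickle under Python 3, where
cPickle no longer exists, the quoted snippet could be adapted roughly
as follows.  This is a sketch under assumptions: the helper names are
hypothetical, encoding="latin1" is the usual way to read pickles
written by Python 2 and may not be needed here, and nothing beyond
"it is a dictionary" is assumed about the content.

```python
import bz2
import pickle


def load_marker_db(path):
    """Load a bz2-compressed pickle such as db_v20/mpa_v20_m200.pkl.

    encoding="latin1" lets Python 3 decode str objects written by
    Python 2; whether this file needs it is an assumption.
    Only unpickle files from a source you trust.
    """
    with bz2.open(path, "rb") as handle:
        return pickle.load(handle, encoding="latin1")


def top_level_summary(db):
    """Map each top-level key of the dictionary to the type name of its value."""
    return {key: type(value).__name__ for key, value in db.items()}
```

Calling top_level_summary(load_marker_db('db_v20/mpa_v20_m200.pkl'))
would give a first impression of what the dictionary holds without
printing all of it.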
 
> In addition, some files were changed the names:
>    - metaphlan2_strainer.py -> strainphlan.py
>    - strainer_src -> strainphlan_src
>    - strainer_tutorial -> strainphlan_tutorial
> 
> Some source files were updated as well.
> Please let me know if you need other information.

Just drop me a note once you release a new version containing these
changes.  I think I'll try to release the current version as is, since at
least the origin of the files is clarified now.  I'm not yet sure whether
the size of the data is acceptable or might exceed some limit.  Regarding
this, I'm wondering whether I should create a source tarball that includes
markers.fasta instead and create the bt2 files in the build process.
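
Such a build step could be as small as the sketch below.  It assumes
bowtie2-build is on the PATH at build time and uses the same
"bowtie2-build <fasta> <index base>" form as the command quoted above;
the function names are hypothetical and the real packaging would more
likely do this from the build rules directly.

```python
import subprocess


def bowtie2_build_command(fasta_path, index_base):
    """Return the argument list for bowtie2-build <fasta> <index base>."""
    return ["bowtie2-build", str(fasta_path), str(index_base)]


def build_index(fasta_path, index_base):
    """Run bowtie2-build; raises CalledProcessError if it exits non-zero."""
    subprocess.run(bowtie2_build_command(fasta_path, index_base), check=True)
```

For example, build_index('metaphlan2/markers.fasta',
'metaphlan2/db_v20/mpa_v20_m200') would regenerate the bt2 files from
the text file during the build.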

Kind regards

       Andreas. 

-- 
http://fam-tille.de

