
VirusSeeker - sorting the Downloads section



Greetings,

So, following the discussion on the Jitsi chat[0], I'm trying
to summarize what I'm seeing about VirusSeeker.  There has been
a good share of guesswork, since I have no medical background,
especially before I reached the documentation.  So take it
with a grain (or more like a bag) of salt.  ;)

[0] https://salsa.debian.org/med-team/community/2020-covid19-hackathon/-/blob/master/jitsi/20190409_1700_jitsi.log

(TL;DR: jump to "In conclusion"...)

The download[1] section gathers four packages, which currently
are:
  * VirusSeeker_Virome_pipeline_v0.063_database20160824.tgz
  * VirusSeeker_Discovery_pipeline_v0.03_DB20160824.tgz
  * VirusSeeker_pipeline_and_documentation.zip
  * I1164_12629_Harvard_SIV_196_06_2_24_12_rawData.tgz

[1] https://wupathlabs.wustl.edu/virusseeker/download/

The *Virome* package seems to be the core of VirusSeeker for
virome composition analysis.  I was not sure of the purpose of
the *Discovery* archive, but after reaching the documentation,
it seems to be the same thing, tuned for virus discovery.
There is a lot of duplicated Perl code between the two
archives.  In both cases, I suppose the dependencies[2]
described on the website would apply.

[2] https://wupathlabs.wustl.edu/virusseeker/installation/install-prerequisite-software/

The *pipeline and documentation* archive might serve as a basis
for -doc packages, I suppose.  The documentation mostly consists
of a set of HTML pages.  I was very worried at first by the
presence of .docx files in the sample outputs, but they seem to
be mere exports to MS Word 2007+ of .txt files, which are
present next to the .docx.  There is a mention of a VirusHunter
software in the index of the documentation, but the link
returns an HTTP 404 error, so I guess it might need a refresh.
(Interestingly, or not, I found out that this documentation
archive also embeds bit-for-bit copies of the two previous
packages.)
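
For anyone wanting to repeat the bit-for-bit check, here is a
minimal sketch in Python.  It assumes the documentation archive
has already been extracted somewhere; the function names and the
directory name in the usage note are my own placeholders, not
anything shipped by VirusSeeker.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def embedded_copies(standalone, extracted_dir):
    """Map each standalone file name to the files under extracted_dir
    that have exactly the same SHA-256 digest."""
    wanted = {sha256_of(p): Path(p) for p in standalone}
    matches = {}
    for candidate in sorted(Path(extracted_dir).rglob("*")):
        if candidate.is_file():
            digest = sha256_of(candidate)
            if digest in wanted:
                matches.setdefault(wanted[digest].name, []).append(candidate)
    return matches
```

It could be called as embedded_copies(
["VirusSeeker_Virome_pipeline_v0.063_database20160824.tgz",
"VirusSeeker_Discovery_pipeline_v0.03_DB20160824.tgz"],
"extracted_doc/"), where "extracted_doc/" is a hypothetical
extraction directory.  Equal digests are only strong evidence of
identical content, which is good enough for this purpose.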

The *rawData* archive contains gzip-compressed FASTQ sample
data, which might be of use for testing I guess, although it
weighs some 500M.  I'm not sure how much of a concern this is
for the build and testing processes.
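
If the size turns out to be a problem, one option might be to
test against a truncated copy of the reads.  The sketch below is
only an illustration under two assumptions of mine: that the
files use the usual four-lines-per-record FASTQ layout, and that
the first records are representative enough for a smoke test.
The function name is hypothetical.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_records):
    """Copy the first n_records four-line FASTQ records from the
    gzip file src to the gzip file dst; return how many were written."""
    written = 0
    with gzip.open(src, "rt") as reader, gzip.open(dst, "wt") as writer:
        while written < n_records:
            record = list(islice(reader, 4))
            if len(record) < 4:
                break  # end of input, or a truncated trailing record
            writer.writelines(record)
            written += 1
    return written
```

Whether a few thousand reads are enough to exercise the pipeline
meaningfully is something I cannot judge.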

On a side note, there is a page[3] referring to a whole set of
different locations for getting the NCBI NT/NR/taxonomy and
virus nucleotide and protein databases.  It is also in this
chapter that the configuration of VirusSeeker is described.
For the moment, configuring consists of hopping into the Perl
scripts and setting paths in Perl variables accordingly.  I
don't know if those databases would require additional
packaging (if even allowed), or some kind of mechanism to pull
them onto the system.  But I'm under the impression that they
might be necessary for the software to be useful.

[3] https://wupathlabs.wustl.edu/virusseeker/installation/install-databases/
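
To give an idea of what automating that configuration step could
look like, here is a rough Python sketch which rewrites a scalar
assignment in Perl source text.  The variable name "db_dir" used
in the usage note is purely hypothetical, I did not take it from
the VirusSeeker scripts, and the pattern only copes with simple
one-line quoted assignments without escaped quotes inside them.

```python
import re


def set_perl_scalar(source, name, value):
    """Return source with the quoted value of `$name = "...";`
    (optionally preceded by `my`) replaced by value.

    The new value is written single-quoted, so that Perl does not
    interpolate any `$` or `@` the path may contain."""
    pattern = re.compile(
        r"^(\s*(?:my\s+)?\$" + re.escape(name) + r"""\s*=\s*)(["']).*?\2(\s*;)""",
        re.MULTILINE,
    )
    # In a Perl single-quoted string only \ and ' need escaping.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    new_source, count = pattern.subn(
        lambda m: m.group(1) + "'" + escaped + "'" + m.group(3), source
    )
    if count == 0:
        raise KeyError("no quoted assignment to $%s found" % name)
    return new_source
```

For instance, set_perl_scalar('my $db_dir = "/old/path";\n',
"db_dir", "/srv/ncbi/nt") yields "my $db_dir = '/srv/ncbi/nt';\n".
Patching sources this way is fragile; reading the paths from a
separate configuration file would probably be a cleaner long-term
answer, but that would be a change for upstream to consider.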

Finally, to be noted, in addition to the dependencies page,
there is a System Requirements[4] page which explains the need
for a clustered infrastructure, and a VirusSeeker Install
page[5] explaining how to adjust the content of the various
Perl scripts to make them use a distributed batch job
infrastructure; the reference seemingly being Slurm WLM here.

[4] https://wupathlabs.wustl.edu/virusseeker/system-requirements/
[5] https://wupathlabs.wustl.edu/virusseeker/installation/install-virusseeker/

In conclusion, my impression is that:
 1. in light of my earlier mail[6], Virome and Discovery are
    waiting for their dependencies to be included in Debian, at
    least "prinseq-lite", and a hypothetical
    "libstatistics-pac-perl" (I began to have a look at these
    two packages FWIW);
 2. I believe Pipeline and Documentation might already be
    usable for producing documentation;
 3. the rawData may also be used for some heavyweight sample
    package I guess, if it makes sense;
 4. on a side note, I'm under the impression that there is some
    work needed to get the configurability up to Debian
    standards; setting variables in Perl scripts may not be
    well handled by debian/config.
[6] https://lists.debian.org/debian-med/2020/04/msg00121.html

Did I manage to make it sound like a plan?  :)

Kind Regards,
-- 
Étienne Mollier <etienne.mollier@mailoo.org>
Fingerprint:  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d
Help find cures against the Covid-19 !  Give CPU cycles:
  * Rosetta@home: https://boinc.bakerlab.org/rosetta/
  * Folding@home: https://foldingathome.org/
