
Re: Idea wanted: What is the most key open source projects to fight COVID-19?



Dear Jun,

On Tue, Apr 21, 2020 at 10:47:09PM +0200, Jun Aruga wrote:
> Watching COVID19 Virtual BioHackathon 2020 kick-off [1] and wrap-up
> [2] videos, I was thinking about this question.
> 
> What are the most key open source projects (or packages) that we need
> to care to maintain to fight COVID-19?
> 
> "key" means the packages that we care more about than other packages.
> Sorry for the ambiguous question.
> I am curious and want to be conscious about the priorities.
> 
> Well, it's a broader topic. There are several factors in it, such as
> Sequencing, Machine Learning, Graph, Workflow and so on.
> I shared the 3 nf-core pipelines nf-core/nanoseq, nf-core/artic,
> nf-core/viralrecon in the email thread: Subject
> https://github.com/nf-core/covid19 - nextflow pipeline . And the
> following are the used software in each pipeline.

Thank you for this analysis.

> In my opinion, the packages written in compiled languages such as C and
> C++ that take a long time to compile are the "key" packages.

I personally would not subscribe to a distinction based on the technology
used to develop the software.  I'd rather decide on pure usage
statistics.  Sometimes the dependency tree is pretty complex and packages
written in interpreted languages might get into conflict.  IMHO it's
better to prioritise on
   a) function
   b) available tests

> Because
> users can still run the script language software without deb package,
> and users can compile software that is easy to compile by themselves.
> And the essential function's software is also the key. In this case,
> that is sequencing aligner.
> 
> So, the key packages are bowtie2, minimap2, bwa in the list of the pipelines.
> And simde is used to support the packages on multiple CPU architectures [3].
> 
> So, the most key packages that we care about to fight COVID-19 are, in
> order of priority:
> 
> 1. simde
> 2. bowtie2 (build time is long. It's relatively hard to compile it).
> 3. minimap2
> 4. bwa

I'm lacking the bioinformatics background to decide about this but from
usage numbers these packages seem to be frequently used.
 
> That's my observation.
> So, do you have any ideas or observations about the question? I would
> like to hear.
> 
> Thank you.
> 
> ## Used software in each pipeline.


I'm adding comments to the software packages you mentioned:
 
> https://github.com/nf-core/nanoseq/blob/master/bin/scrape_software_versions.py
> guppy
Missing in Debian.  Is it this project
   https://staff.aist.go.jp/yutaka.ueno/guppy/  ?

> qcat
> pycoQC
Both were just uploaded to the NEW queue (including the dependency
python3-parasail).

> NanoPlot
I'll add
   https://github.com/wdecoster/NanoPlot
to our todo list

> FastQC
In Debian.

> GraphMap2
I'll add
   https://github.com/lbcb-sci/graphmap2
to our todo list

> minimap2
> Samtools
> BEDTools
> MultiQC
All four in Debian.

> 
> https://github.com/nf-core/artic/blob/dev/bin/scrape_software_versions.py
> FastQC
> NanoPlot
> BWA
> minimap2
> Samtools
> BEDTools
> MultiQC

See above regarding NanoPlot - all others in Debian.

> 
> https://github.com/nf-core/viralrecon/blob/dev/bin/scrape_software_versions.py
> parallel-fastq-dump
I'll add
   https://github.com/rvalieris/parallel-fastq-dump
to our todo list

> FastQC
> fastp
> Bowtie 2
> Samtools
> BEDTools
> Picard
All in Debian.

> iVar
I'll add
   https://github.com/andersen-lab/ivar
to our todo list

> VarScan 2
In Debian non-free.  It's on our software liberation Wiki
   https://wiki.debian.org/DebianMed/SoftwareLiberation
   It would be a *huge* service to the community to
   convince upstream to adopt a free license.
 
> SnpEff
Ahhhh, that one rings a bell.  It's hard since several of its
prerequisites are not yet packaged.  I've spent hours on it before but
I'll add this to our todo list
   https://salsa.debian.org/med-team/snpeff

> SnpSift
Same source as SnpEff (see above)

> BCFTools
> Cutadapt
> Kraken2
> SPAdes
> Unicycler
> minia
> Minimap2
> vg
> BLAST
> ABACAS
All in Debian.

> QUAST

That's a pretty complex assembly of third-party software (for instance
including their own copies of bwa, minimap2, bedtools and lots of others).
For instance it was my motivation to package sambamba, which on its own
is quite a complex packaging project (being RC buggy half of the time
of its existence :-().  It also includes genemark as a binary since it is
non-free - see again our software liberation page
   https://wiki.debian.org/DebianMed/SoftwareLiberation
   -> I'd like to repeat that freeing this would be very sensible.
In short, packaging quast is pretty tough - but there is at least a
weak (not building yet!) start:
   https://salsa.debian.org/med-team/quast

> R

Well, R in itself is cheap.  If some specific R packages are used that
we have not packaged yet, this should be easily doable.

> MultiQC
In Debian.


In general your list of software is extremely helpful.  Thanks a lot for
it.  I've added it to the covid-19 task[4] (which will be re-rendered
soon).  The said todo list where I've added the projects is in the COVID-19
coordination wiki[5].

As always: Everybody is kindly invited to pick from the todo list.
Please do not underestimate the todo items that involve contacting authors
to free their code.  Every little contribution here is *extremely* helpful and
highly appreciated.

Thanks again Jun for your very helpful contribution

     Andreas.

> [1] https://youtu.be/x-QTP5Z_WIU
> [2] https://youtu.be/g5cQk8jIMwo
> [3] https://wiki.debian.org/SIMDEverywhere
[4] https://blends.debian.org/med/tasks/covid-19
[5] https://salsa.debian.org/med-team/community/2020-covid19-hackathon/-/wikis/COVID-19-Hackathon-packages-needing-work

-- 
http://fam-tille.de

