Re: Which columns should we start working on?

To: Nilesh Patra <nilesh@debian.org>
Cc: debian-med@lists.debian.org
Subject: Re: Which columns should we start working on?
From: Steffen Möller <steffen_moeller@gmx.de>
Date: Tue, 23 Mar 2021 22:13:48 +0100
Message-id: <76862b60-6a78-ee4d-9e08-1e3b709bd806@gmx.de>
In-reply-to: <YFiCTMQ5hbOi2Pwm@debian>
References: <YFiCTMQ5hbOi2Pwm@debian>

Hi Nilesh,

On 22.03.21 at 12:41, Nilesh Patra wrote:
>
>> I'm mostly addressing you specifically here for the new "workflow based" packages we should start working on -- as you mentioned at the sprints.
>> Since freeze work should _mostly_ be done by now, we could focus on new packages :-)
Yeah!
>> Would you have any workflow package that you'd like help with?

In short: nextflow. It is a tricky one, though, and it blocks many
workflows.

Slightly longer, I wish to encourage everyone to find their own preferences:

* If you have a research group near you (preferably because you work at
a university) that is working on anything SARS-related, ask them what
they are doing, try to understand it, see what software they use, and
start a project with them. Mostly forget about Debian in the meantime,
or fix it as you need it.

* If anything among the spreadsheet's keywords interests you, then read
up on the biology of a few packages listed as "workflow packages"
(meaning packages that produce something the biologists would like to
put into the results section of their paper), look at the respective
documentation, see if it builds, and follow a tutorial if one exists.
We still need to learn how to make some noise about this so that
biologists find the tutorial for self-education, and/or find you as
someone who can help get this running on their data (or help find
someone who can).

* There are different kinds of packages that may be important for
Debian, and also for Debian's acceptance in the bioinformatics world:

A) housekeeping packages (I just made this name up as a pun on
housekeeping genes) that are simply expected to be available. I have
most likely marked these in red in the leftmost column of the
spreadsheet. It is the kind of package I go for when I am feeling a bit
down and want a quick success.

MEME (Others) - a classic
bbtools (Others and bulk RNA-seq) - we may already have part of
that in the distribution. I was, and still am, a bit confused: is this
redundant with bbmap?

B) the "columns". These represent the software biologists are likely to
need to get from raw data to a publication with nothing missing. My
priorities here are

virus tab:
First and foremost: artic fieldbioinformatics - this uses the
nanopore to tell which Ebola/SARS strain you have - this may be as close
to the pandemic as we can possibly get. Since I work at a University
Hospital, I think I am allowed to feel positive about finding someone
to field-test our fieldtest package once it is completed.
There is the original artic implementation and a reimplementation as a
nextflow workflow. Whichever we get to first, I tend to think.
Confusing? That is why we need the bio.tools folks - it is too much for
our tasks list (and for bio.tools, still :-) ).

Single-Cell RNA-seq - all of them, preferably
bulk RNA-seq - BioConductor, pigx-rnaseq - is mostly there
nanopore - it is the sequencing technology that is closest to us -
I actually own half of one, Jun has a complete one :) It is used in the
field to genotype viruses today, but it is too young to have a perfect
pipeline yet, I tend to think. And the device is used very
heterogeneously. Things get updated very frequently everywhere, so
this is more of a "let's see what is going to be used" kind of
situation for me at the moment.

Some tools block many columns from being completed. The workflow
engines deserve particular mention here, and among them nextflow seems
to be a beast to package. So, yes, Nilesh, please: getting nextflow out
of the way would be a big help.

A^B) the packages that have a direct application to virology/drug
development and are mostly singular applications - look at what
OpenPandemics' Forli lab and colleagues are giving us
https://forlilab.org/ . My picks are

AutoDock-CrankPep (Docking/Structures) since oligopeptides are a
common tool to fish for antibodies, so you want to have something to
model that.

and sometimes it is "community forming" and "technical curiosity" that
motivate me, as with
cmdock (Docking/Structures)
autodock-gpu (Docking/Structures)
which would be seen by all the BOINC people. But who would not go
through their website and dream a bit?

There are other sheets that are more like

anti-A: Packages that nobody expects yet. "Synthetic Biology" (the
next thing for a while already) or "Molecular Tumor Boards" (the next
thing for even longer, some 25 years since microarrays came around, and
now emerging). I think I put this up mostly to have a place to put
them, not really thinking that this needs to go into the distribution
asap.

And there are sheets that do not exist yet, ones I would like to care
about if days had just a few more hours, such as proteomics or mass
spec. We are completely blank on ontologies and how these could be
maintained - more Java, mostly. Feel free to add them.

>> And also two questions remain open:
>> * Do we have a tutorial explaining the spreadsheet, or do you think we could find an alternative to the spreadsheet -- a salsa wiki or so? (Mostly for free software documentation perspective)
To say it with Fawlty Towers: I did not start it, but I added "Immune
repertoire". The first "Info" page should be something close to a
tutorial. Please, everyone, improve on it. I find these columns superior
to our task list and also to what bio.tools/SciCrunch have come up with
so far - only OMICtools had the chance to outshine it in its day. Find
something better - I do not mind, quite the contrary, really. We
want packages, their dependencies and some biologically relevant
structuring. A dependency graph with tags may be an alternative.
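
A minimal sketch of what such a dependency graph with tags could look
like, in Python. The names `artic-workflow` and `rnaseq-workflow`, the
tags, and the dependency edges are assumptions made up for illustration;
none of them are taken from the spreadsheet:

```python
from collections import Counter

# Illustrative data only: tags and dependency edges are assumptions
# for this sketch, not the real state of the spreadsheet or archive.
PACKAGES = {
    "nextflow":        {"tags": {"workflow-engine"}, "deps": set()},
    "artic-workflow":  {"tags": {"virus", "nanopore"}, "deps": {"nextflow"}},
    "rnaseq-workflow": {"tags": {"bulk-rna-seq"}, "deps": {"nextflow"}},
    "meme":            {"tags": {"others"}, "deps": set()},
}
PACKAGED = set()  # hypothetical: names already in the distribution


def missing_deps(name, packages, packaged):
    """Return every transitive dependency of `name` that is not packaged."""
    missing, seen = set(), set()
    stack = list(packages[name]["deps"])
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep not in packaged:
            missing.add(dep)
        # Unknown dependencies simply contribute no further edges.
        stack.extend(packages.get(dep, {}).get("deps", ()))
    return missing


def top_blockers(packages, packaged):
    """Count how many packages each missing dependency blocks."""
    counts = Counter()
    for name in packages:
        counts.update(missing_deps(name, packages, packaged))
    return counts.most_common()


def by_tag(tag, packages):
    """List packages carrying a tag, roughly one spreadsheet column."""
    return sorted(n for n, p in packages.items() if tag in p["tags"])
```

With data like this, `top_blockers(PACKAGES, PACKAGED)` ranks the
missing dependencies by how many packages they hold up, and
`by_tag("virus", PACKAGES)` gives the equivalent of one column.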
>> * What do you think of the "package of the week plan" that Andreas proposed?

I admit to having forgotten about it. But how many people do we then
want to work on the same package? Andreas did a good job with the
videoconferences, so that we got to know each other. So I suggest we
keep the Excel sheet as a synchronisation tool - we write our names
somewhere in the package's line so we know who is active on it. And
whoever wants extra input then just says so.

scanpy would, I think, have been on my list from the last few months,
and I am very happy to see it addressed.

My next picks would be: for A) MEME, for B) pyomo, for A^B) autodock-gpu
- can we have three packages of the week?

> Hi Steffen,
>
> Apologies for ping, but any comment on this?

Thank you for the ping, really.

I need another few days on non-Debian issues, I am afraid.

Best,

Steffen
