
Re: Workflows and their dependencies - Jun's dep spreadsheet - status



Hi Andreas,

On 04.06.20 09:23, Andreas Tille wrote:
On Thu, Jun 04, 2020 at 01:49:33AM +0200, Steffen Möller wrote:
Jun created this table
https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit?usp=sharing
that lists a set of workflows and their dependencies.
I admit this kind of list is what I was always looking for.  It
provides me, as a non-user, with a great todo list; over the last
three weeks I have made my daily pick from it and filled the NEW queue
with what was missing in Debian.  I'm also trying to constantly update
our corresponding UDD query[1].  Please note that Debian packages
sometimes have different names (for example, HTSfilter is available in
Debian as r-bioc-htsfilter).
I could provide a script that parses the online spreadsheet directly and
crafts the UDD query [1] from it. Instead of the mere "yes" the Debian
column could present a binary package name.
This brings up the question:  The UDD query re-sorts this list
alphabetically (=different from the original document) and by
the Debian package names (=even more different from the original
document).  My question is:  Would it be more helpful if the query
preserved the original names and sorting?
(Remark: I previously asked some questions about this document here
but never got any answers.  It would help if you, namely Jun and
Steffen, who are actively working with this document, would answer,
since I'm doing this for **your** comfort.)

Frankly, I only work with the sheet.


I also noticed that some pretty generic software is listed there, for
instance gzip_reader, and this particular one for no good reason, since
we have libgzstream-dev and gzip_reader[2] just wraps some example
usage around it.  That's neither a sensible package nor anything I would
like to see on our todo list.  If it is really needed in some package,
it could be added as a patch or so.
Much agreed.
Some are trickier to
package than others, but if I read this right, then artic, scrnaseq and
smartseq2 just wait for nextflow and pigx-rnaseq waits for tests to work
:o/ Shovill just needs a package for itself.
Yes, we have some stuff in Salsa that needs more work, and some is not
even on Salsa since it was never on any packaging request list.  I keep
working down this (nicely and productively growing) list to get at
least everything into Salsa and to make our covid-19 task reflect the
list with valuable information.

I don't know if you have decided to have another Sprint next week already,
but whenever this will be, I think having a series of packages in salsa that
need to be finished (i.e., much of the Debian-specific ground work done)
will be appealing to participants.

But this also means that we need some extra communication to clarify
who is working on what, since too much then looks like "oh, someone is
working on this".


Some dependencies that we are missing are also not distributed with
Conda. A weird example is the pip package "capsule" as a dependency of
nextflow. Conda however distributes nextflow, so ... what are they doing?
As I said in the t-coffee example:  The build-time test had been broken
for years and we finally fixed it.  (Probably just the test was
broken, not t-coffee itself - but who knows?)  So I assume conda
does not run the provided test in the first place, and we should keep
up our effort to do serious testing at build time and in CI.

This was some very good news, indeed. I should have replied with some
praise.

Let me look at that capsule over the weekend.



I started to really like that spreadsheet - have also added bcbio to the
list of pending workflow engines. I don't really know where this
spreadsheet could go. It is useful for us now, but I wonder what kind of
questions it can help answer.
For me it's a very helpful todo list.  Translated into our UDD query
it's a pretty nice tool.  I just wonder how we can make the translation
of the list to the query a bit more reliable - hence my question
above, since maybe with matching lines and names it would be easier
to keep both in sync.

yes -> binary package name, or keep the yes if the name is the same.

For salsa I have put URLs behind it, so I suggest using salsa as an
"epoch"-like prefix to the binary name, like
salsa:vienna-rna
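
A minimal sketch of how a script could read that convention, assuming
the sheet is exported as CSV with columns named "Tool" and "Debian"
(both column names, the helper names and the sample rows are made up
for illustration and not taken from the real spreadsheet).  It keeps
the original row order and leaves out the step of crafting the UDD
query [1], since that depends on the query's format.

```python
import csv
import io


def parse_debian_cell(tool, cell):
    """Interpret one cell of the (assumed) "Debian" column.

    Returns (status, package) where status is "archive", "salsa" or
    "missing".  Convention from this thread: "yes" means the package is
    named like the tool, "salsa:<name>" means it only exists on Salsa,
    anything else is taken as the binary package name.
    """
    cell = (cell or "").strip()
    if not cell:
        return ("missing", None)
    if cell.lower() == "yes":
        # Debian package names are lowercase, so lowercase the tool name.
        return ("archive", tool.lower())
    if cell.startswith("salsa:"):
        return ("salsa", cell[len("salsa:"):].strip())
    return ("archive", cell)


def read_sheet(csv_text, tool_col="Tool", debian_col="Debian"):
    """Return (tool, status, package) tuples in the sheet's row order."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tool = (row.get(tool_col) or "").strip()
        if tool:
            status, package = parse_debian_cell(tool, row.get(debian_col))
            rows.append((tool, status, package))
    return rows


# Illustrative rows only; the real sheet may look different.
SAMPLE = """Tool,Debian
HTSfilter,r-bioc-htsfilter
t-coffee,yes
ViennaRNA,salsa:vienna-rna
Shovill,
"""

for tool, status, package in read_sheet(SAMPLE):
    print(tool, status, package)
```

Keeping the tool name next to the package name in each tuple would also
let the query show the original names and sorting, if that turns out to
be more helpful.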

Thank you!

Steffen




[1] https://salsa.debian.org/blends-team/med/-/blob/master/covid-19_doc/bio_covid-19_dependencies_query
[2] https://github.com/gatoravi/gzip_reader


