
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)



Dear,

On Tue, 5 May 2020 at 06:53, Andreas Tille <tille@debian.org> wrote:

> >  - Guppy is a moving target, and whichever version we would distribute
> >    in Stable is unlikely to satisfy the users a year later.
> >
> >  - Upgrades are not drop-in replacements for each other and a laboratory
> >    typically needs to install several versions side by side.
>
> I wonder how users of that software are dealing with this.

Personally, I use the GNU Guix package manager on top of Debian, with
custom channels, to install this non-free software.  It helps because
it is easier to travel through the history of the packages and because
``profiles`` allow installing several versions side by side.

The presentation "Seeing Debian through a Functional Lens" by Joey
Hess at DebConf14 helped me grasp the point of a ``functional package
manager``.

BTW, thank you for all the hard packaging work you are doing.  I
still use Debian (Med) packages for the software I care less about; my
motto is: if it is not planned to be in Debian, then it is not really
useful. ;-)


> >  - The conversion from raw to FASTQ is done by neural network algorithms
> >    for which we do not have access to the training data, and therefore
> >    the freedom to modify Guppy would be limited to the sugar around the
> >    core algorithms.
>
> That's a strong point actually.  However, we will face more and more
> problems of this nature.  Mo's attempt to write a deep learning policy
> might help here a bit.

Note that in the Guppy case -- because it is non-free and the
structure of the neural network is thus not known -- the question is
moot. :-)

However, I think the "problem" of Deep Learning is not new.  Probably
not the right place to discuss that.

1. Trying to state whether or not the weights are part of a freely
licensed application does appear relevant to me.  They are part of the
application, just as any icon image can be part of an application.
Because the application is free, the structure of the network is known
and so any other weights can be provided (yes, they will probably be
irrelevant).  The only question could be, IMHO, in which format the
weights are stored.

2. The weights are simply data resulting from one (big) processing
run.  This process may or may not be well described.  The tools used
may or may not be free.  It does not matter; the only point is the
license of such data.  For example, an aligner needs a reference
genome.  No one argues that all the data used to build this reference
-- notebooks, discussions for the consensus, etc. -- has to be
released under free licenses.  It is the same for annotations.
Another example is default values, e.g., the ones in scikit-learn;
they are based on a training data set that is not necessarily
available.  It happens very often that software uses data resulting
from the processing of other (training) data.  And the only concern
about user freedom is the license of the resulting data.

3. Access to the training data set is not about freedom but about
(reproducible) science.  Are the weights considered "scientific" if
the training data is not available?
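
To make point 1 above a bit more concrete, here is a toy sketch in
Python.  It is only an illustration under my own assumptions: the
function name, the JSON layout and the numbers are all made up, and it
has nothing to do with how Guppy or any real basecaller is written.
The idea is just that once the structure (the code) is known, the
weights are plain data in some format and can be replaced by any
other weights of the same shape.

```python
# Toy illustration only: a fixed, known network structure whose weights
# are plain data that can be swapped.  All names and values are
# hypothetical and not taken from any real basecaller.
import json
import math


def forward(weights, inputs):
    """Apply one dense layer followed by a sigmoid to the inputs."""
    outputs = []
    for row, bias in zip(weights["matrix"], weights["bias"]):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


# Weights as they could ship with an application, in a documented text format.
shipped = json.loads(
    '{"matrix": [[1.0, -1.0], [0.5, 0.5]], "bias": [0.0, 0.0]}'
)

# Any other weights of the same shape can be dropped in, even useless ones.
replacement = {"matrix": [[0.0, 0.0], [0.0, 0.0]], "bias": [0.0, 0.0]}

signal = [2.0, 2.0]
print(forward(shipped, signal))
print(forward(replacement, signal))
```

The code runs with either set of weights; only the usefulness of the
result changes.  That is why the storage format seems to me the only
open question for point 1.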

From my point of view, Mo Zhou's policy melds free software and
(real) Science, or say reproducibility.  There are bridges between the
two, and both are part of the same big picture.


> > In that sense, I think that if we want to distribute a basecaller in
> > Debian, we should better pick an alternative that is already free.  Some
> > of them are reported to perform as well as Guppy.  But which one to
> > pick, and how about long-term maintenance?
>
> Once I've started packaging deepbinner[1] which is stalled as long as we
> do not have python3-tensorflow.  But maybe that's on the horizon since
> bazel packaging sounded quite promising.

That sounds awesome!


> > Altogether, I think that we will best serve our users by making sure
> > that Free basecallers are easy to install on Debian, providing the
> > standard tools for downstream analysis (we are quite good at this), and
> > adding value by supporting bioinformatics workflow systems.
>
> That's exactly my opinion here.

Really cool!  That's why Debian rocks!


Thank you for all the work that helps a lot to get things done more easily.


Best regards,
simon

