Re: Re: Concerns to software freedom when packaging deep-learning based applications.
Hi Ian,
> Lumin writes ("Concerns to software freedom when packaging deep-learning based applications."):
> > 1. Is GPL-licensed pretrained neural network REALLY FREE? Is it really
> > DFSG-compatible?
>
> No. No.
>
> Things in Debian main should be buildable *from source* using Debian
> main. In the case of a pretrained neural network, the source code is
> the training data.
>
> In fact, they are probably not redistributable unless all the training
> data is supplied, since the GPL's definition of "source code" is the
> "preferred form for modification". For a pretrained neural network
> that is the training data.
It would be interesting to look at some real examples.
I emphasized the GPL because the project [1] mentioned in the original post
is released under the GPL. However, upstream does not clearly state the
license of the pretrained network that it distributes.
Moreover, according to [1]'s readme: "Recomputing the AlphaGo Zero weights
will take about 1700 years on commodity hardware". Well, I guess that
means a purely free software stack can never reproduce this work in practice.
Apart from that AlphaGo-related project, I actually see more pretrained
models released under MIT/BSD/Apache-2.0, or into the public domain. Here is
another example, involving ImageNet, a typical dataset in the computer
vision field:
https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/readme.md
Framework: Caffe(BSD-2-clause), repro code: (BSD-2-clause),
pretrained-network: (public domain), dataset: ImageNet (???) [2]
-> Software stack is free, but not the ImageNet dataset.
-> Debian cannot distribute related applications in the main section.
> > 2. How should we deal with pretrained neural networks when some of us
> > want to package similar things? Should it go contrib?
>
> If upstream claims it is GPL, and doesn't supply training data, I
> think it can't even go to contrib.
OK, then what if the model is released under a more permissive license
such as MIT/BSD, given that the model was trained on a non-free academic
dataset with a non-free software stack (replaceable by a CPU implementation,
but at a ridiculous time cost)? I think in this case that kind of
network can enter contrib.
> If upstream does not claim to be supplying source code, or they supply
> the training data, then I guess it can go to contrib.
The computer vision research community is eager to release its code
and models under permissive licenses such as BSD/MIT. Many related
conferences publish their papers without access restrictions. The biggest
problem lies in the large datasets.
> Note that the *use* of these kind of algorithms for many purposes is
> deeply troublesome. Depending on the application, it might cause
> legal complications such as difficulties complying with the European
> GDPR, illegal race/sex/etc. discrimination, and of course it comes
> with big ethical problems.
>
> :-/
Hmm... It appears to be quite incompatible with the DFSG, but that's
another complex story :-)
> Ian.
>
[1] https://github.com/gcp/leela-zero
[2] http://image-net.org/download-faq
I searched and found no explicit license declaration for ImageNet,
but according to the agreement text, the dataset may be non-free.