
Re: Re: Concerns to software freedom when packaging deep-learning based applications.



Hi Ian,

> Lumin writes ("Concerns to software freedom when packaging deep-learning based applications."):
> >  1. Is GPL-licensed pretrained neural network REALLY FREE? Is it really
> >     DFSG-compatible?
> 
> No.  No.
> 
> Things in Debian main should be buildable *from source* using Debian
> main.  In the case of a pretrained neural network, the source code is
> the training data.
> 
> In fact, they are probably not redistributable unless all the training
> data is supplied, since the GPL's definition of "source code" is the
> "preferred form for modification".  For a pretrained neural network
> that is the training data.

It would be interesting to look at some real examples.
I emphasized the GPL because the project [1] mentioned in the original post
is released under the GPL. However, upstream does not clearly state the
license of the pretrained network that it distributes.

Moreover, according to [1]'s README: "Recomputing the AlphaGo Zero weights
will take about 1700 years on commodity hardware". Well, I guess that
means a purely free software stack can never reproduce this work.

Apart from that AlphaGo-related project, I actually see more pretrained
models released under MIT/BSD/Apache2 licenses or into the public domain.
Here is another example, involving ImageNet, which is a typical dataset in
the computer vision field.

   https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/readme.md
   Framework: Caffe(BSD-2-clause), repro code: (BSD-2-clause),
   pretrained-network: (public domain), dataset: ImageNet (???) [2]
   -> Software stack is free, but not the ImageNet dataset.
   -> Debian cannot distribute related applications in the main section.


> >  2. How should we deal with pretrained neural networks when some of us
> >     want to package similar things? Should it go contrib?
> 
> If upstream claims it is GPL, and doesn't supply training data, I
> think it can't even go to contrib.

OK, then what if the model is released under a more permissive license
such as MIT/BSD, given that the model was trained on a non-free academic
dataset with a non-free software stack (replaceable by a CPU implementation,
but at a ridiculous time cost)? I think in this case that kind of
network can enter contrib.
 
> If upstream does not claim to be supplying source code, or they supply
> the training data, then I guess it can go to contrib.

The computer vision research community is eager to release its code
and models under permissive licenses such as BSD/MIT. Many related
conferences publish their papers without access restrictions. The biggest
problem lies in the large datasets.
 
> Note that the *use* of these kind of algorithms for many purposes is
> deeply troublesome.  Depending on the application, it might cause
> legal complications such as difficulties complying with the European
> GDPR, illegal race/sex/etc. discrimination, and of course it comes
> with big ethical problems.
> 
> :-/

Hmm... It appears to be quite incompatible with the DFSG, but that's another
complex story :-)

> Ian.
> 

[1] https://github.com/gcp/leela-zero
[2] http://image-net.org/download-faq
    I searched and found no explicit license declaration for ImageNet.
    But according to the agreement text, the dataset may be non-free.

