
Re: legal questions regarding machine learning models



On Wed, 27 May 2009 11:25:09 +0900 Mathieu Blondel wrote:

> On Wed, May 27, 2009 at 7:36 AM, Francesco Poli wrote:
> 
> > I think that in the case of machine learning models, source form is
> > even more clearly distinct from compiled object.
> > We can consider an artificial neural network, for instance (Mathieu,
> > correct me if it's a wrong example).
> > I am under the impression that basically nobody would change connection
> > weights by hand, in order to modify a neural network.
> 
> Yes the connection weights of an artificial neural network are a good
> example of the parameters I was talking about. In practice, nobody
> would change a connection weight by hand because it's impossible to
> predict the effect of this particular weight on the overall
> performance of the model. Training algorithms are mostly clever ways
> to find a good model without trying the infinity of parameter
> combinations.

Good, this confirms my supposition.
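
To make the distinction concrete, here is a minimal sketch in Python.
It is only my own toy illustration: the names (train, predict,
AND_DATA, OR_DATA), the data and the training options are all
hypothetical and are not taken from any real package. It uses a
single perceptron instead of a full neural network. The point is that
the weights are generated output, while the data and the options are
what one actually edits.

```python
# Toy illustration only: all names and data here are hypothetical.

def predict(weights, inputs):
    """Return 1 if bias + weighted sum of inputs is positive, else 0."""
    bias, w = weights[0], weights[1:]
    total = bias + sum(wi * xi for wi, xi in zip(w, inputs))
    return 1 if total > 0 else 0

def train(data, epochs=20, rate=0.5):
    """Perceptron learning rule, starting from zero weights.

    No randomness is involved, so the same data and options always
    produce the same weights.
    """
    weights = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        for inputs, target in data:
            error = target - predict(weights, inputs)
            weights[0] += rate * error          # bias update
            for i, xi in enumerate(inputs):
                weights[i + 1] += rate * error * xi
    return weights

# "Source": the training data plus the options passed to train().
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# "Object": the weights. Nobody edits these numbers by hand; to get a
# different model, one changes the data (or the options) and retrains.
and_weights = train(AND_DATA)
or_weights = train(OR_DATA)
```

Turning the AND model into an OR model is done by editing the data and
retraining, not by guessing new weights. That is why I see the data as
the preferred form for making modifications.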

> So in practice yes, a model would be barely useful for
> further work on the model without the original data. In that regard,
> the original data AND the program used to train the model (this
> includes the implementations and the options passed to the algorithm)
> can be seen as the only real source.

The program used to train the model is not necessarily part of the
source, IMHO.

The GNU GPL v3 states (in Section 1):

| However, it [the "Corresponding Source" for a work] does not include
| the work's System Libraries, or general-purpose tools or generally
| available free programs which are used unmodified in performing
| those activities [generate, install, and run the object, and modify
| the work] but which are not part of the work.

> 
> But yet again, I could pretend that I just happened to find the model
> parameters by hand.

Free Software is not about pretending you are a sort of oracle who can
guess magic numbers!
Otherwise, any source availability requirement would be moot: I could
always pretend I wrote the machine code by hand, but that won't be
true, in most cases.

> After all, a model is just a big set of numbers.

Machine code is just a long sequence of 0s and 1s...

[...]
> However, this is not good in the long
> term since that makes the model dependent on the person who holds the
> data.

Definitely.

[...]
> Is it forbidden for
> someone to release an image made with Photoshop as free?

You *can* create a DFSG-free image with Adobe Photoshop.

If the source form may be read and modified with DFSG-free tools (e.g.
The GIMP), then everything is OK and the image may be included in
Debian main.

If, on the other hand, the source form of the image may *only* be
manipulated with Photoshop and other non-free tools, then I think that
the image may still be DFSG-free, but belongs in the Debian contrib
archive, at best.

At least, this is how I understand it.

> 
> Regarding Debian packaging, I think it's a wise decision to rebuild
> the model whenever the data and the training program are free, the
> data is not too large and the computation not too long. Should
> objective criteria for what is too large and what is too long be
> decided, or should that be left to the DD? Then a remaining question is
> what to do with models for which we don't have the original data or
> the original training program?

My personal take on the matter is that, in order for a package to be
included in Debian main:

 * the package must comply with the DFSG

 * source must be distributed in the source package

 * tools needed to generate (or to use) the object must be DFSG-free
   and included in Debian main

This is how I interpret Policy 2.2.1:
http://www.debian.org/doc/debian-policy/ch-archive.html#s-main

However, it is my understanding that, in some cases (e.g. long
rebuilding times), it is acceptable to also ship pre-built
(architecture-independent) objects in the source package, *along with*
the corresponding source.  One should however be extremely careful in
doing this, since it makes it harder to check and be sure that Policy
2.2.1 requirements are satisfied.


I hope this clarifies my position.
As before, I should stress that everything I expressed above is my own
opinion.
Usual disclaimers: IANAL, TINLA, IANADD, TINASOTODP.


-- 
 New location for my website! Update your bookmarks!
 http://www.inventati.org/frx
..................................................... Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4
