Re: legal questions regarding machine learning models

On Wednesday 27 May 2009 at 01:17 +0900, Mathieu Blondel wrote:
> For efficient storage, the model may be stored in binary format but
> human-readable formats (such as XML) may also be used, thus allowing easy
> access to the parameters of the models.
> My first question is: is it possible to distribute the model under a
> free software license without distributing the original data that were
> used to train the model? Likewise, is it possible to package directly
> a model in Debian? Although it's very unlikely, I could pretend that I
> found the parameters of the models by hand. In that case, the
> parameters can be seen as "magical numbers" with no explanation
> whatsoever as to how I found them.

This looks very similar to distributing a picture which is a 2D
rendering of a 3D model without distributing the original model. This is
already accepted in the archive, and the reason is that a 2D picture is
its own source, and can serve as a base for modified versions this way.

The same reasoning applies to the model: as long as it is useful to tune
the parameters by hand to produce derived versions, there’s no reason
not to consider it as the source.
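
To make "tuning the parameters by hand" concrete, here is a minimal sketch in Python. It assumes a hypothetical XML layout with <param name="..." value="..."/> elements; the thread does not specify any real format, so the element names, the attribute names and the tune_param helper are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical model file: element and attribute names are assumptions,
# not a format described anywhere in this thread.
MODEL_XML = """<model>
  <param name="bias" value="0.10"/>
  <param name="weight_0" value="1.50"/>
</model>"""


def tune_param(xml_text, name, new_value):
    """Return the model XML with one named parameter set to new_value."""
    root = ET.fromstring(xml_text)
    for param in root.iter("param"):
        if param.get("name") == name:
            param.set("value", repr(new_value))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(name)


# A derived model made by editing a single parameter, without the training data.
derived = tune_param(MODEL_XML, "bias", 0.25)
```

If edits of this kind are how people would really modify the model, the parameter file is playing the role of source.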

Of course, the decision is up to the FTP masters, but I think this
should be accepted, for consistency with things already in the archive
that we cannot decently exclude.

> My second question is: Given the difficulty of proving which data were
> actually used to train a model, how can we prevent non-free software
> from using free data such as those of Voxforge?

A widely used technique is to cleverly hide some minor bugs in the data.
If a non-free model shows the same bugs, you can prove the data was used
illegally. Of course this only works if you manage to keep the bugs
secret.
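
As a rough illustration of the idea, here is a small Python sketch. It assumes the data is a mapping from an utterance ID to a transcription and that a model can be queried through a predict function. All the names (plant_canaries, canary_hit_rate, the sample utterances) are hypothetical and not taken from any real tool or from Voxforge.

```python
def plant_canaries(records, canaries):
    """Return a copy of the dataset with the deliberate errors mixed in."""
    marked = dict(records)
    marked.update(canaries)
    return marked


def canary_hit_rate(predict, canaries):
    """Fraction of canary inputs for which the model reproduces the planted error."""
    hits = sum(1 for key, wrong in canaries.items() if predict(key) == wrong)
    return hits / len(canaries)


# Toy data: the canaries are deliberate misspellings kept out of public notes.
clean = {"utt1": "hello world", "utt2": "good morning"}
canaries = {"utt3": "recieve the parcel", "utt4": "seperate rooms"}
released = plant_canaries(clean, canaries)

# Stand-in for a model that memorised the released data.
def suspect(key):
    return released.get(key, "")

# Stand-in for a model trained independently on correct transcriptions.
def independent(key):
    return {"utt3": "receive the parcel", "utt4": "separate rooms"}.get(key, "")

suspect_rate = canary_hit_rate(suspect, canaries)
independent_rate = canary_hit_rate(independent, canaries)
```

A high hit rate on many unlikely errors is strong evidence that the data was used, though how much weight it carries legally is a separate question.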

 .''`.      Josselin Mouette
: :' :
`. `'   “I recommend you to learn English in hope that you in
  `-     future understand things”  -- Jörg Schilling
