
Re: legal questions regarding machine learning models



On Wed, May 27, 2009 at 7:36 AM, Francesco Poli <frx@firenze.linux.it> wrote:

> I think that in the case of machine learning models, source form is
> even more clearly distinct from compiled object.
> We can consider an artificial neural network, for instance (Mathieu,
> correct me if it's a wrong example).
> I am under the impression that basically nobody would change connection
> weights by hand, in order to modify a neural network.

Yes, the connection weights of an artificial neural network are a good
example of the parameters I was talking about. In practice, nobody
would change a connection weight by hand because it's impossible to
predict the effect of that particular weight on the overall
performance of the model. Training algorithms are mostly clever ways
to find a good model without trying the infinite number of parameter
combinations. So in practice, yes, a model is barely useful for
further work without the original data. In that regard, the original
data AND the program used to train the model (this includes the
implementation and the options passed to the algorithm) can be seen
as the only real source.
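
To make the "data AND program AND options" point concrete, here is a
minimal sketch of a one-neuron model. It is purely illustrative: the
names train, data, epochs, lr and seed are made up for this example and
are not taken from HTK, Julius or any real toolkit. Assuming the
training procedure is deterministic once its options (including the
random seed) are fixed, the same data and options reproduce the same
parameters, while changing a single option gives a different set of
numbers.

```python
import random


def train(data, epochs=50, lr=0.1, seed=0):
    """Train a one-neuron classifier with the perceptron update rule.

    data is a list of (inputs, label) pairs; epochs, lr and seed are the
    "options passed to the algorithm" (hypothetical names for this sketch).
    """
    rng = random.Random(seed)  # the seed fixes the initial weights
    n_inputs = len(data[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in data:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # -1, 0 or +1
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Toy "original data": the logical AND of two inputs.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

model_a = train(data, epochs=50, lr=0.1, seed=0)
model_b = train(data, epochs=50, lr=0.1, seed=0)  # same data, same options
model_c = train(data, epochs=50, lr=0.1, seed=1)  # one option changed

print(model_a == model_b)  # identical parameters can be rebuilt
print(model_a == model_c)  # a different option yields different numbers
```

Someone who only receives the numbers in model_a has no practical way
to tell which data or options produced them, which is why the data and
the training program together play the role of source here.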

But then again, I could pretend that I just happened to find the model
parameters by hand. After all, a model is just a big set of numbers.
Who could tell what data I used to train my model? Due to the lack
of quality free data, it's quite tempting to use non-free data in
order to create free models. However, this is not good in the long
term, since it makes the model dependent on the person who holds the
data.

I mentioned Voxforge in my previous email. Their goal is to use their
free speech data to train models with HTK and use the models with
Julius. You can get the source code of HTK after registration on the
HTK website, but the license has severe restrictions, so HTK is not
free software. Julius is a free software speech recognition engine that can
use models trained with HTK. Note that HTK is pretty much THE speech
recognition framework in the speech recognition community. If you
consider that the ultimate source of a model is not only the data but
also the software used to train it, then Voxforge models built with
HTK can't be free, even though the data were free. Is it forbidden for
someone to release an image made with Photoshop as free?

Regarding Debian packaging, I think it's a wise decision to rebuild
the model whenever the data and the training program are free, the
data is not too large and the computation is not too long. Should
objective criteria for what is too large and what is too long be
decided, or should that be left to the DD? A remaining question, then,
is what to do with models for which we don't have the original data or
the original training program.

Thank you,
Mathieu

