Re: legal questions regarding machine learning models
Mathieu Blondel writes:
> My first question is : is it possible to distribute the model under a
> free software license without distributing the original data that were
> used to train the model? Likewise, is it possible to package directly
> a model in Debian?
The answer to your first question is easy: Yes. Many free software
licenses do not require the distribution of "source code" for any
generated data that is distributed.
Packaging it for Debian is more complicated, because the DFSG *does*
require the distribution of the source form for any software that is
part of Debian. If the input data used to generate the models is not
large, it can be included in a source package. On the other hand, if
the input data is one of those multi-gigabyte data sets that you
mention, the easiest solution might be to package just the model, and
put it in the non-free archive. In that case, if no small data set
can be used to generate a default model, the executable software
would depend on a model in non-free to be useful, which may imply
that it belongs in contrib rather than in main.
(Neither a lawyer nor a DD.)