[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Corresponding source for a machine-learning data set (was: Concerns to software freedom when packaging deep-learning based appications.)



Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

> […] In the case of a pretrained neural network, the source code is the
> training data.
>
> In fact, they are probably not redistributable unless all the training
> data is supplied, since the GPL's definition of "source code" is the
> "preferred form for modification". For a pretrained neural network
> that is the training data.

One hopeful sign is that people are addressing the need for the source
form of a machine-learning product. In this case, it's by offering a
version control system customised to machine-learning data.

    <URL:https://blog.dataversioncontrol.com/data-version-control-tutorial-9146715eda46>

Is that an approach we can recommend to upstream developers who publish
machine-learning data sets?

-- 
 \         “A child of five could understand this. Fetch me a child of |
  `\                                              five.” —Groucho Marx |
_o__)                                                                  |
Ben Finney


Reply to: