Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
Hi Andy,
Thanks for your comments.
On 2019-05-23 09:28, Andy Simpkins wrote:
> Your wording "The model /should/be reproducible with a fixed random seed." feels
> correct but wonder if guidance notes along the following lines should be added?
>
> *unless* we can reproduce the same results, from the same training data,
> you cannot classify as group 1, "Free Model", because verification that
> training has been carried out on the dataset explicitly licensed under a
> free software license can not be achieved. This should be treated as a
> severe bug and the entire suite should be classified as group 2,
> "ToxicCandy Model", until such time that verification is possible.
Hmm... that would actually be a bit harsh on upstream, and I think there
is still some misunderstanding. I've updated the document to make the
following points clear:
- "Numerically Reproducible" is the default reproducibility definition
  in this context:
  https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility
- A Free Model should be Numerically Reproducible, or at least a
  locally-trained model should reach performance (e.g. accuracy) similar
  to the original one. Similar results are acceptable; the bar of
  "identical" results is not always reachable.
- The datasets used for training a "ToxicCandy" model may be
  private/non-free, so not everybody can access them. (This case is more
  likely a result of problematic upstream licensing, but it sometimes
  happens.)
One got a free model from the internet. That little candy tasted sweet.
One wanted to make this candy at home with the provided recipe, but
surprisingly found out that non-free ingredients are inevitable.
  -- ToxicCandy
Is the updated document clearer?