Clarification regarding "complete training program" in ML policy draft

To: debian-ai@lists.debian.org
Subject: Clarification regarding "complete training program" in ML policy draft
From: Gard Spreemann <gspr@nonempty.org>
Date: Tue, 17 Aug 2021 12:52:48 +0200
Message-id: <87y290xrzz.fsf@nonempty.org>

Hi list,

Thanks to Mo and others for your work on the current Debian ML policy
draft. I have a question regarding the current definition of a "Free
Model" in that draft. It reads [1]:

> Free Model is a model satisfying ALL the following conditions:
> (1) FOSS-Licensed & DFSG-compliant;
> (2) trained from explicitly FOSS-licensed & DFSG-compliant datasets (e.g. for supervised or unsupervised learning) or simulators (e.g. for reinforcement learning), and the dataset is publicly available to anonymous users;
> (3) corresponding training program is present and complete;

What is the intent of the policy's condition 3 in the case of a bitrotted
training program?

Story: I recently made use of a pretrained model published along with a
paper describing a certain deep network. I would classify this model as
definitely Free under conditions 1 and 2. However, it fell severely
short on the 3rd condition: the authors of the original paper did
publish a complete training program… 5+ years ago, using frameworks (or
versions of frameworks) that today are only partially available or not
easily runnable.

In my particular case, I was able, with some work, to fix things up and
get their original code working reasonably well. The experience showed
me, however, that more severe cases of bitrot like this are likely to
appear as more models are published in the typical academic way of
dumping some code alongside the paper and never touching it again. I
have no objections at all to this kind of software development for
academic purposes, but I do wonder what our position is when the
pretrained model remains useful 5+ years later while the training
software, which *was* "present and complete", no longer is in any
reasonable sense. (Or, I guess: does firing up a 5+ year old VM to run
the stale code fall within a reasonable definition of "present and
complete"?)

Any thoughts?


[1] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/1c467714774ca7c6c47120da91fcb1fd14a160e1/ML-Policy.rst


 Best,
 Gard

