Hi list,

And thanks Mo and others for your work on the current Debian ML policy draft.

I have a question regarding the current definition of a "Free Model" in that draft. It reads [1]:

> Free Model is a model satisfying ALL the following conditions:
> (1) FOSS-Licensed & DFSG-compliant;
> (2) trained from explicitly FOSS-licensed & DFSG-compliant datasets (e.g. for supervised or unsupervised learning) or simulators (e.g. for reinforcement learning), and the dataset is publicly available to anonymous users;
> (3) corresponding training program is present and complete;

What is the intent of the policy's condition 3 in the case of a bitrotted training program?

Story: I recently made use of a pretrained model published along with a paper describing a certain deep network. I would classify this model as definitely Free under conditions 1 and 2. However, the 3rd point was severely lacking: the authors of the original paper did publish a complete training program… 5+ years ago, using frameworks (or versions of frameworks) that are only partially available, or not easily runnable, today.

In my particular case, I was able to fix things up and get their original code working in a reasonable way with some work. But the experience showed me that more severe cases of bitrot like this are likely to appear as more models are published in the typical academic way: dumping some code together with the paper and never touching it again.

I have no objections at all to this kind of software development for academic purposes. I do wonder, though, what our position is if the pretrained model remains useful 5+ years later while the training software *was* "present and complete" but no longer is in any reasonable way. (Or, I guess: does firing up a 5+ year old VM to run the stale code fall within a reasonable definition of "present and complete"?)

Any thoughts?

[1] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/1c467714774ca7c6c47120da91fcb1fd14a160e1/ML-Policy.rst

Best,
Gard