Re: [RFC] Counter-Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
I am away from my key (travelling...), though as soon as I can, I'll second this option. I agree it is much clearer than Mo's proposal.
Thomas Goirand (zigo)
On Apr 24, 2025 05:44, Thorsten Glaser <tg@debian.org> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA384
>
> Cover letter
> ============
>
> (Please do keep me in Cc, I’m not subscribed to the list.)
>
> Hi! I had not realised it’s going to GR with this, so I’ve drafted
> a counter proposal, based on the thread on debian-private around
> <93d2028888fce48ab4b4609d59f7a72c9edc916e.camel@debian.org> and
> earlier thoughts I’ve collected regarding this topic, such as on
> https://evolvis.org/~tg/cc.htm and the interpretation guidelines
> on https://mbsd.evolvis.org/MirOS-Licence.htm (this is a mirror on
> a more capable VM).
>
> I’m not sure how quickly I’ll need seconds, but I would also welcome
> input on this proposal (including from the l10n-en team as I’m not a
> native English speaker).
>
> I’m PGP-signing this with my DD key because, for the avoidance of
> doubt, if time is indeed short, I’m submitting this as a choice. If
> time isn’t short, I’m submitting it tentatively, with the option of
> working in feedback and updating it first.
>
>
> Counter-Proposal -- Interpretation of DFSG on (AI) Models
> =========================================================
>
> Please see the original proposal for background on this.
>
> The counter-proposal is as follows:
>
> The Debian project requires the same level of freedom for AI models
> as it does for other works entering the archive.
>
> Notably:
>
> 1. A model must be trained only from legally obtained and used works,
> honour all licences of the works used in training, and be licenced
> under a suitable licence itself that allows distribution, or it is
> not even acceptable for non-free. This includes an understanding
> that “generative AI” outputs are derivative works of their inputs
> (including training data and the prompt), insofar as these pass
> the threshold of originality; that is, generative AI acts similarly
> to lossy compression followed by decompression, or to a compiler.
>
> Any work resulting from generative use of a model can at most be
> as free as the model itself; e.g. programming with a model from
> contrib/non-free assisting prevents the result from entering main.
>
> The "/usr/share/doc/PACKAGE/copyright" file must include copyright
> notices from all training inputs as required by Policy for “any
> files which are compiled into the object code shipped in the binary
> package”, except for inputs already separately packaged (such as
> the training software, libraries, or inputs already available from
> packages such as word lists also used for spellchecking).
>
> Regarding availability of sources used for training, the normal
> rules of the non-free archive apply.
>
> 2. Models are not suitable for the non-free-firmware archive.
>
> 3. For a model to enter the contrib archive, it may at runtime require
> components from outside of Debian main, but the model itself must
> still comply with the DFSG, i.e. follow the requirements below for
> models entering main. If a model requires a component outside of
> main at build or training time, it is only admissible to non-free.
>
> 4. For a model to enter the main archive, all works used in training
> must additionally be available, auditable, and under DFSG-compliant
> licencing. All software used to do the training must be available
> in Debian main.
>
> If the training happens during package build, the sources must be
> present in Debian packages or in the model’s source packages; if
> not, they must still be available in the same way.
>
> This is the same rule as is used for other precompiled works in
> Debian packages that are not regenerated during build: it must
> be possible to regenerate them using only Debian tools; waiving
> the requirement to actually do the regenerating during package
> build is a nod to realistic build time and resource usage.
>
> 5. For a model to enter the main archive, the model training itself
> must *either* happen during package build (which, for models of
> a certain size, may need special infrastructure; the handling of
> this is outside of the scope of this resolution), *or* the model
> resulting from training must build in a sufficiently reproducible
> way that a separate rebuilding effort from the same source will
> result in the same trained model. (This includes using reproducible
> seeds for PRNGs used, etc.)
>
> For realistic achievability of this goal, the reproducibility
> requirement is relaxed to not require bitwise equality, as long
> as the resulting model is effectively identical. (As a comparison,
> for C programs this would be equivalent to allowing different
> linking order of the object files in the binary or embedded
> timestamps to differ, or a different encoding of the same opcodes
> (like 31 C0 vs. 33 C0 for i386 “xor eax,eax”), but no functional
> changes as determined by experts in the field.)
>
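
[Illustration of point 5, not part of the signed proposal.] A minimal
sketch of what "reproducible seeds" and "effectively identical" could
look like in practice. Everything here is an assumption made for
illustration: train() is a toy stand-in for a real training run,
effectively_identical() and its tolerances are hypothetical, and the
proposal itself leaves the judgement of functional equivalence to
experts in the field rather than to a fixed numeric threshold.

```python
import math
import random


def train(seed: int, steps: int = 1000) -> list[float]:
    """Toy stand-in for training: random updates from a seeded PRNG."""
    rng = random.Random(seed)  # private generator with a fixed seed
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        i = rng.randrange(len(weights))
        weights[i] += 0.01 * (rng.random() - 0.5)
    return weights


def effectively_identical(a, b, rel_tol=1e-9, abs_tol=1e-12) -> bool:
    """Compare two weight vectors within a tolerance, not bitwise.

    The tolerances are arbitrary illustrative values (an assumption).
    """
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


original = train(seed=42)
rebuild = train(seed=42)                     # separate run, same seed
changed = [original[0] + 1.0] + original[1:]  # a functional change

print(effectively_identical(original, rebuild))
print(effectively_identical(original, changed))
```

With the same seed, the rebuild matches the original; the deliberately
altered weights do not. A real check would more likely compare model
behaviour on evaluation inputs than raw weights.
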
> 6. For handling of any large packages resulting from this, the normal
> processes are followed (such as discussing in advance with the
> relevant teams, ensuring mirrors are not over-burdened, etc).
>
> The Debian project asks that training sources are not obtained
> unethically, and that the ecological impact of training and using
> AI models be considered.
>
> [End of proposal.]
>
> -----BEGIN PGP SIGNATURE-----
>
> iQIcBAEBCQAGBQJoCV23AAoJEHa1NLLpkAfgfQcP/jDN+p+rY0fPhQUZ/HpJadkJ
> BawiUYp+TMjsXowrXXy9Mp7FyrlWrj+zROfA1tup2+TkdlQSY8A62aWYS62y5z9y
> x5TxqwS3+xH6UmtchmX7alxy7u9vUrcsdUM9NKt1DZQANyqq8+pVTpMKauNNsXr+
> L8zq/37ludyjCf+c9pnJ066CUaLBBMQGWmfPO8c1mjYWNnACXgYuUH1cw8Sgzr5u
> vQrdURGfebrmTCQBbmCO5FOzQ3Q/uLjl5CocC8HWF0TBh7vcVtnYCkrvalECJpO5
> PlCMUZ0MApuEJ1UTUcj+5lDxdH02dcMdFd7v+OB7+E5Jr+MHDR0wWoVaScm9MYno
> Eip0sxbzVRqozeAH5bKKSaIQN+4KL/pVB2bYxwR4N5/W/9cxDsJmF/uoB1lZNtL8
> DOvLar3RmHNVbaXin/E3afhw5L3O7JeppTSCby9Unyow8hmRjfjhz//ApEbOrWfv
> CNH7sdM2mkEe0SXoxLyX7wfmZuWQ2SUZ4nwbj3vmHvM6jrVragCJxibQyVEIzuSQ
> 1FB0MsFa1TrYN4tnR7/q9AiskcHKiTwcdJh0LFCiLZ2F2d2sd4ne60qQTCpmjzzG
> WkhgeTOeLPCDgkHmC+oUEzGpQruKI/surQ9NSGWbFDyEPTGf9rVzMNlVRp0jJSob
> 2PclqIcmvlO8Krw+9klA
> =U1FJ
> -----END PGP SIGNATURE-----
>