
Re: [RFC] Counter-Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



I am travelling and away from my key, but I'll second this option as soon as I can. I agree it is much clearer than Mo's proposal.

Thomas Goirand (zigo)

On Apr 24, 2025 05:44, Thorsten Glaser <tg@debian.org> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA384
>
> Cover letter
> ============
>
> (Please do keep me in Cc, I’m not subscribed to the list.)
>
> Hi! I had not realised this was going to a GR, so I’ve drafted
> a counter proposal, based on the thread on debian-private around
> <93d2028888fce48ab4b4609d59f7a72c9edc916e.camel@debian.org> and
> earlier thoughts I’ve collected regarding this topic, such as on
> https://evolvis.org/~tg/cc.htm and the interpretation guidelines
> on https://mbsd.evolvis.org/MirOS-Licence.htm (this is a mirror on
> a more capable VM).
>
> I’m not sure how quickly I’ll need seconds, but I would also welcome
> input on this proposal (including from the l10n-en team as I’m not a
> native English speaker).
>
> I’m PGP-signing this with my DD key because, for the avoidance of
> doubt, I’m submitting this as a choice should time indeed be short.
> If time isn’t short, I’m submitting it tentatively, with the option
> of working in feedback and updating it first.
>
>
> Counter-Proposal -- Interpretation of DFSG on (AI) Models
> =========================================================
>
> Please see the original proposal for background on this.
>
> The counter-proposal is as follows:
>
> The Debian project requires the same level of freedom for AI models
> as it does for other works entering the archive.
>
> Notably:
>
> 1. A model must be trained only from legally obtained and used works,
>    honour all licences of the works used in training, and be licenced
>    under a suitable licence itself that allows distribution, or it is
>    not even acceptable for non-free. This includes an understanding
>    that “generative AI” outputs are derivative works of their inputs
>    (including training data and the prompt), insofar as these pass
>    the threshold of originality; that is, generative AI acts similarly
>    to lossy compression followed by decompression, or to a compiler.
>
>    Any work resulting from generative use of a model can at most be
>    as free as the model itself; e.g. programming with the assistance
>    of a model from contrib/non-free prevents the result from entering
>    main.
>
>    The "/usr/share/doc/PACKAGE/copyright" file must include copyright
>    notices from all training inputs as required by Policy for “any
>    files which are compiled into the object code shipped in the binary
>    package”, except for inputs already separately packaged (such as
>    the training software, libraries, or inputs already available from
>    packages such as word lists also used for spellchecking).
>
>    Regarding availability of sources used for training, the normal
>    rules of the non-free archive apply.
>
> 2. Models are not suitable for the non-free-firmware archive.
>
> 3. For a model to enter the contrib archive, it may at runtime require
>    components from outside of Debian main, but the model itself must
>    still comply with the DFSG, i.e. follow the requirements below for
>    models entering main. If a model requires a component outside of
>    main at build or training time, it is only admissible to non-free.
>
> 4. For a model to enter the main archive, all works used in training
>    must additionally be available, auditable, and under DFSG-compliant
>    licencing. All software used to do the training must be available
>    in Debian main.
>
>    If the training happens during package build, the sources must be
>    present in Debian packages or in the model’s source packages; if
>    not, they must still be available in the same way.
>
>    This is the same rule as is used for other precompiled works in
>    Debian packages that are not regenerated during build: they must
>    be able to be regenerated using only Debian tools; waiving the
>    requirement to actually do the regenerating during package build
>    is a nod to realistic build time and resource usage.
>
> 5. For a model to enter the main archive, the model training itself
>    must *either* happen during package build (which, for models of
>    a certain size, may need special infrastructure; the handling of
>    this is outside of the scope of this resolution), *or* the model
>    resulting from training must build in a sufficiently reproducible
>    way that a separate rebuilding effort from the same source will
>    result in the same trained model. (This includes using reproducible
>    seeds for PRNGs used, etc.)
>
>    For realistic achievability of this goal, the reproducibility
>    requirement is relaxed to not require bitwise equality, as long
>    as the resulting model is effectively identical. (As a comparison,
>    for C programs this would be equivalent to allowing different
>    linking order of the object files in the binary or embedded
>    timestamps to differ, or a different encoding of the same opcodes
>    (like 31 C0 vs. 33 C0 for i386 “xor eax,eax”), but no functional
>    changes as determined by experts in the field.)
>
> 6. For handling of any large packages resulting from this, the normal
>    processes are followed (such as discussing in advance with the
>    relevant teams, ensuring mirrors are not over-burdened, etc).
>
> The Debian project asks that training sources not be obtained
> unethically, and that the ecological impact of training and using
> AI models be considered.
>
> [End of proposal.]
>
> -----BEGIN PGP SIGNATURE-----
>
> iQIcBAEBCQAGBQJoCV23AAoJEHa1NLLpkAfgfQcP/jDN+p+rY0fPhQUZ/HpJadkJ
> BawiUYp+TMjsXowrXXy9Mp7FyrlWrj+zROfA1tup2+TkdlQSY8A62aWYS62y5z9y
> x5TxqwS3+xH6UmtchmX7alxy7u9vUrcsdUM9NKt1DZQANyqq8+pVTpMKauNNsXr+
> L8zq/37ludyjCf+c9pnJ066CUaLBBMQGWmfPO8c1mjYWNnACXgYuUH1cw8Sgzr5u
> vQrdURGfebrmTCQBbmCO5FOzQ3Q/uLjl5CocC8HWF0TBh7vcVtnYCkrvalECJpO5
> PlCMUZ0MApuEJ1UTUcj+5lDxdH02dcMdFd7v+OB7+E5Jr+MHDR0wWoVaScm9MYno
> Eip0sxbzVRqozeAH5bKKSaIQN+4KL/pVB2bYxwR4N5/W/9cxDsJmF/uoB1lZNtL8
> DOvLar3RmHNVbaXin/E3afhw5L3O7JeppTSCby9Unyow8hmRjfjhz//ApEbOrWfv
> CNH7sdM2mkEe0SXoxLyX7wfmZuWQ2SUZ4nwbj3vmHvM6jrVragCJxibQyVEIzuSQ
> 1FB0MsFa1TrYN4tnR7/q9AiskcHKiTwcdJh0LFCiLZ2F2d2sd4ne60qQTCpmjzzG
> WkhgeTOeLPCDgkHmC+oUEzGpQruKI/surQ9NSGWbFDyEPTGf9rVzMNlVRp0jJSob
> 2PclqIcmvlO8Krw+9klA
> =U1FJ
> -----END PGP SIGNATURE-----
>
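
As an illustration of the reproducibility requirement in item 5 of the quoted proposal, here is a minimal sketch in Python. It is not part of the proposal. The toy training loop, the function names (train_toy_model, effectively_identical) and the tolerances are all assumptions made up for this example; the proposal itself leaves the judgement of "effectively identical" to experts in the field and does not name any numeric threshold.

```python
import math
import random


def train_toy_model(seed, steps=50, lr=0.05):
    """Fit y = 2x + 1 by stochastic gradient descent (a stand-in for real training)."""
    rng = random.Random(seed)  # every random draw comes from this seeded PRNG
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        err = (w * x + b) - (2 * x + 1)
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def effectively_identical(model_a, model_b, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical check: same shape and every parameter within a tolerance.

    The tolerances are assumptions for this sketch, not values from the proposal.
    """
    return len(model_a) == len(model_b) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(model_a, model_b)
    )


# Two independent "builds" from the same source and the same seed.
original = train_toy_model(seed=42)
rebuilt = train_toy_model(seed=42)

# A rebuild that differs only by tiny numeric noise: not bitwise equal,
# but within the assumed tolerance.
perturbed = [p * (1 + 1e-12) for p in original]

# A build with a different seed, standing in for an unreproducible training run.
other_seed = train_toy_model(seed=7)

print("same seed:      ", effectively_identical(original, rebuilt))
print("tiny noise:     ", effectively_identical(original, perturbed))
print("different seed: ", effectively_identical(original, other_seed))
```

The point of the tolerance-based comparison is that a rebuild may differ in the last bits (the analogue of link order or timestamps in the C comparison) and still count as the same model, while a run with uncontrolled randomness does not.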

