Thanks for this proposal, Aigars.

How would you compare it with Sam's proposal? As I see it, the general
idea behind both proposals is quite similar, even though the wording is
different. The main difference in content that I can see is that you
focus on the notion of "data information", whereas Sam's proposal is
more general and focuses on the practicality of being able to make
modifications.

Assuming you both agree that the proposals go in the same general
direction, is there a possibility of merging them into a single one?

TIA,
Cheers

On Tue, May 06, 2025 at 08:55:45PM +0200, Aigars Mahinovs wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> ** Proposal Text **
>
> Choice 3: Training data for training of AI models is not to be
> considered "source code" in the context of DFSG. Instead the real
> source code in such a case is "Training Data Information" and the
> training data itself is an intermediate build artifact.
>
> AI models are compatible with DFSG only if they provide complete
> "Training Data Information". AI models whose reproduction from
> training data or from training data information is prohibitively
> expensive or is impractical are compatible with DFSG only if they
> provide ways to modify the AI model and create derivative works
> directly from the trained model.
>
> The meaning of "Data Information" is based on definitions and
> explanations from https://opensource.org/ai/open-source-ai-definition
>
> ** Rationale **
>
> The problem of collection and distribution of training data sets can
> be fully avoided by going another step back - seeing the "training
> data information" as the *actual* source code and the "training data"
> itself only being an intermediate build artifact.
> While this might not
> guarantee a fully reproducible rebuild of a model (even if that could
> be a possibility in some cases by identifying the exact version of the
> source data with use of hashes), it does a step better - it makes it
> possible (if enough resources are invested) to create a new version of
> the model with new and updated data. And it does not put the onus on
> Debian to redistribute this intermediate data.
>
> The definition maintains all guidelines of DFSG intact, but add two
> clarifications:
>
> 1) Training data is not source code. There is a difference in
> copyright case law between source code and training data. It is really
> clear that a compiled binary is a derived work of the source code.
> However, there is no direct copyright relationship between training
> data and an AI model or its outputs. That is why it should be
> considered separately. The specifics may need to be adjusted based on
> future court orders. For example, it might be necessary to include
> protections against AI regurgitation.
>
> 2) An AI model that is both prohibitively expensive to reproduce and
> is not easily modifiable does not (de-facto) satisfy the DFSG derived
> works requirement.
> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAEBCAAdFiEEFmwrqIlWRDzdY39G+mQ7ph0ievsFAmgaWqoACgkQ+mQ7ph0i
> evtd0Q/9FrxQqHQI94GBNQF3uA+BbcghJYq4ZSLxGdrhS6g2IQpZ3Vq+dMJ9WKrr
> 5Wbmct2u2vt2Mk36WFnuTQDkEv6Cx9QN/lMUfMhcnBVnt8hL1XjRCbQGCMqiUcRz
> /QFAGbhjuxwvLWPDAKs3AEWbv0nPTmacEzMVA7s8629ZnRq9sV9fzcnP0jqBBQq0
> lvaeDJBiKgpmM3b/ENeyKopmuRroCpqpG2OTghAsMSa7JHqfibgqamHmFkeDaOJt
> 5HveKmcm9AV2PwVP6UZHpyDciCCPFkZSpor1V+02qhEZBtHKNxGNgAYb/Edxnsxh
> 1W7MRQrwi8alPXeFKYLKNbD1ZP7WUDjvEXVJF1ucmir0599us+soPjN9VFNkr58F
> 5ugoubQN+rcz989tPbSnUst6wSPkDgRlkjtaF+uPn6LCIFuvCt3GH+OxJlmYG/K+
> 1C9Ea60WMkn38b6Yn9gW7WYq09hnP6kpPeXfmD68Ac0YxWKoj18FPD3WDwTc5/S5
> Fp+LpJ3vd1PpcYfacA0a+l7H0Vc5K4woRjzCU4KTVeYpBZSe4hRuOn3igFx6Z53E
> cUwjoZqnCLU7SoiDP9xXSBTF3UBM/iTcrW33gBE3ujKyv+p2z74eUvrn302ZFA9G
> JlDoRdmTHqLlNncEA04FdJ6+VBNY6GZKGXK5r0vDMnQ26MMHWdU=
> =imT+
> -----END PGP SIGNATURE-----
>
>
> --
> Best regards,
>     Aigars Mahinovs        mailto:aigarius@debian.org
>

-- 
Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack  _. ^ ._
Full professor of Computer Science              o     o   o     \/|V|\/
Télécom Paris, Polytechnic Institute of Paris     o     o o    </>   <\>
Co-founder & CSO Software Heritage            o o o     o       /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro                        '" V "'