I support and sponsor this proposal.

Best Regards,
Boyuan Yang

On 2025-04-19 19:56, M. Zhou wrote:
> ===============================================================================
> Brief Background, Definition, Scope, and Purpose of the Proposal
> ===============================================================================
>
> AI software grows more and more popular, becoming a notable part of the
> software ecosystem. This trend raises some new questions and challenges,
> especially in the interpretation of the Debian Free Software Guidelines (DFSG)
> on pre-trained AI models, urging the Debian Project to revisit its
> interpretation of the DFSG in the context of AI software and models.
>
> A pre-trained "AI model" is usually stored on disk in binary formats designed
> for numerical arrays, as a "model checkpoint" or "state dictionary", which is
> essentially a collection of matrices and vectors, holding the learned
> information from the training data or simulator. When a user makes use of such
> a file, it is usually loaded by an inference program, which performs numerical
> computations to produce outputs based on the learned information in the model.
> Please refer to the appendix for more background information about AI.
>
> This proposal focuses on one interpretation of the DFSG on a particular type of
> pre-trained AI model, one that (1) is released under DFSG-compliant free
> software licenses like MIT/Expat, Apache-2.0, etc., and satisfies at least one
> of (2) and (3) below -- (2) is trained on data or a simulator that is private,
> proprietary, or inaccessible to the public; (3) does not provide the original
> training program. To avoid creating new terminology, we will refer to this type
> of file as "AI models released under open source license without original
> training data or program" without any abbreviation.
> Such models are referred to as "Open Weights" in some circumstances
> (see: https://opensource.org/ai/open-weights).
>
> The purpose of this proposal is to reach a community consensus on how we should
> treat and handle the described type of AI models, which is an inevitable issue
> in the future. If necessary, I can work with the Debian Policy Team to
> incorporate the GR result into appropriate sections of the Debian Policy (e.g.,
> in Section 10 "Files").
>
> | Note: While nowadays people use "AI" to refer to LLMs, it is a very broad term
> | that covers much more than language models. AI models apart from language
> | models must be considered as well, such as computer vision models, audio
> | recognition models, etc.
>
> | Note: If condition (1) is not satisfied, the model is usually seen as
> | "non-free" in the context of the Debian community and no voting is needed.
> | In addition, if everything (including but not limited to the model itself,
> | training data, training program, and inference program) is released under
> | DFSG-compliant licenses, that again needs no voting.
>
> | Note: Traditional software parts, like a Python script or a C++ program, are
> | out of the scope of this proposal since that is a well-defined case. For
> | example, a deep learning framework or inference software written in
> | Python or C++, i.e., the program that runs the AI models, is out of the
> | scope of this proposal.
>
>
> ===============================================================================
> Proposal A: "AI models released under open source license without original
> training data or program" are not seen as DFSG-compliant.
> ===============================================================================
>
> The "AI models released under open source license without original training
> data or program", a particular type of file as explained above, are not seen
> as DFSG-compliant. Hence, they cannot be included in the "main" section of the
> Debian archive.
> This proposal does not specify whether the "non-free" section
> of the Debian archive can include those files.
>
>
> -------------------------------------------------------------------------------
> Appendix
> -------------------------------------------------------------------------------
>
> Inevitably there may be some terminology and/or background that is not
> well-known or well-understood by the general public. Please refer to the
> appendices for more information. If you cannot find relevant information to
> answer your question, please consult a human professional -- or an LLM.
>
> See appendix A for detailed rationale of this proposal.
> See appendix B for background and comments about current AI software.
> See appendix C for some related previous efforts and discussions.
> See appendix D for comments on potential implications of this proposal.
>
> [Appendix A] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixA.txt
> [Appendix B] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixB.txt
> [Appendix C] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixC.txt
> [Appendix D] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixD.txt
>
>
> Disclaimer
> ----------
>
> We acknowledge that releasing useful AI models under permissive licenses like
> MIT/Expat and Apache-2.0 is a generous act from the original authors given the
> huge costs involved, and it is a great contribution to the software ecosystem
> and to society. We sincerely respect the respective authors' work. On the other
> hand, the DFSG sets a pretty high standard on software that can be included in
> the Debian distribution, which means the GR may lead to some results that not
> everybody agrees with. Nevertheless, we appreciate your understanding of the
> mission of the Debian project -- to create a free operating system, where the
> "free" means "software freedom".