===============================================================================
Brief Background, Definition, Scope, and Purpose of the Proposal
===============================================================================

AI software is becoming increasingly popular and is now a notable part of the
software ecosystem. This trend raises new questions and challenges, especially
about how the Debian Free Software Guidelines (DFSG) apply to pre-trained AI
models, and it urges the Debian Project to revisit its interpretation of the
DFSG in the context of AI software and models.

A pre-trained "AI model" is usually stored on disk in a binary format designed
for numerical arrays, as a "model checkpoint" or "state dictionary". It is
essentially a collection of matrices and vectors that holds the information
learned from the training data or simulator. When a user makes use of such a
file, it is usually loaded by an inference program, which performs numerical
computations to produce outputs based on the learned information in the model.
Please refer to the appendix for more background information about AI.

This proposal focuses on one interpretation of the DFSG for a particular type
of pre-trained AI model, namely one that satisfies (1) and at least one of (2)
and (3) below:

  (1) it is released under a DFSG-compliant free software license such as
      MIT/Expat or Apache-2.0;
  (2) it is trained on data or a simulator that is private, proprietary, or
      inaccessible to the public;
  (3) its original training program is not provided.

To avoid creating new terminology, we will refer to this type of file as "AI
models released under open source license without original training data or
program", without any abbreviation. Such models are referred to as "Open
Weights" in some circumstances (see: https://opensource.org/ai/open-weights).

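To make the earlier description of a model checkpoint concrete, here is a
minimal Python sketch. It is only an illustration under stated assumptions: the
binary layout, the tensor names "weight" and "bias", and the functions
save_checkpoint, load_checkpoint, and infer are all hypothetical and do not
correspond to any real framework's file format. It is meant to show that such a
file holds nothing but named arrays of numbers, and that a separate inference
program turns those numbers into outputs.

```python
import io
import struct


def save_checkpoint(state, stream):
    """Write {name: (rows, cols, flat_values)} in a toy binary layout."""
    stream.write(struct.pack("<I", len(state)))
    for name, (rows, cols, values) in state.items():
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<I", len(encoded)))
        stream.write(encoded)
        stream.write(struct.pack("<II", rows, cols))
        stream.write(struct.pack(f"<{rows * cols}d", *values))


def load_checkpoint(stream):
    """Read the toy layout back into a dictionary of named arrays."""
    (count,) = struct.unpack("<I", stream.read(4))
    state = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<I", stream.read(4))
        name = stream.read(name_len).decode("utf-8")
        rows, cols = struct.unpack("<II", stream.read(8))
        size = rows * cols
        values = struct.unpack(f"<{size}d", stream.read(8 * size))
        state[name] = (rows, cols, list(values))
    return state


def infer(state, x):
    """Toy inference program: compute y = W x + b from the loaded arrays."""
    rows, cols, w = state["weight"]
    _, _, b = state["bias"]
    return [
        sum(w[r * cols + c] * x[c] for c in range(cols)) + b[r]
        for r in range(rows)
    ]


# Hypothetical "learned" parameters; a real model has far more of them.
checkpoint = {
    "weight": (2, 3, [0.5, -1.0, 2.0, 1.5, 0.0, -0.5]),
    "bias": (2, 1, [0.25, -0.5]),
}

buffer = io.BytesIO()  # stands in for a file on disk
save_checkpoint(checkpoint, buffer)
buffer.seek(0)
loaded = load_checkpoint(buffer)
output = infer(loaded, [1.0, 2.0, 3.0])
print(output)  # [4.75, -0.5]
```

In this toy case the stored file contains neither the training data nor the
training program, only the resulting numbers, which is the situation that
conditions (2) and (3) describe.
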
The purpose of this proposal is to reach a community consensus on how we should
treat and handle the described type of AI models, an issue we will inevitably
have to face. If necessary, I can work with the Debian Policy Team to
incorporate the GR result into appropriate sections of the Debian Policy (e.g.,
in Section 10 "Files").

| Note: Although people nowadays often use "AI" to mean LLMs, it is a very
|       broad term that covers much more than language models. Other kinds of
|       AI models, such as computer vision models and audio recognition
|       models, must be considered as well.

| Note: If condition (1) is not satisfied, the model is usually seen as
|       "non-free" in the context of the Debian community, and no vote is
|       needed. Likewise, if everything (including but not limited to the
|       model itself, the training data, the training program, and the
|       inference program) is released under DFSG-compliant licenses, no vote
|       is needed either.

| Note: Traditional software, such as a Python script or a C++ program, is
|       out of the scope of this proposal, since that case is already well
|       defined. For example, a deep learning framework or inference software
|       written in Python or C++, i.e., the program that runs the AI models,
|       is out of scope.

===============================================================================
Proposal A: "AI models released under open source license without original
training data or program" are not seen as DFSG-compliant.
===============================================================================

The "AI models released under open source license without original training
data or program", a particular type of file as explained above, are not seen as
DFSG-compliant. Hence, they cannot be included in the "main" section of the
Debian archive. This proposal does not specify whether the "non-free" section
of the Debian archive can include those files.

-------------------------------------------------------------------------------
Appendix
-------------------------------------------------------------------------------

Inevitably, some of the terminology and background may not be well known or
well understood by the general public. Please refer to the appendices for more
information. If you cannot find relevant information to answer your question,
please consult a human professional -- or an LLM.

See appendix A for detailed rationale of this proposal.
See appendix B for background and comments about current AI software.
See appendix C for some related previous efforts and discussions.
See appendix D for comments on potential implications of this proposal.

[Appendix A] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixA.txt
[Appendix B] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixB.txt
[Appendix C] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixC.txt
[Appendix D] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixD.txt

Disclaimer
----------

We acknowledge that releasing useful AI models under permissive licenses like
MIT/Expat and Apache-2.0 is a generous act by the original authors, given the
huge costs involved, and that it is a great contribution to the software
ecosystem and to society. We sincerely respect the respective authors' work.

On the other hand, the DFSG sets a pretty high standard for software that can
be included in the Debian distribution, which means the GR may lead to some
results that not everybody agrees with. Nevertheless, we appreciate your
understanding of the mission of the Debian project -- to create a free
operating system, where "free" means "software freedom".