Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



I second this proposal, cited below in full.

Regards

Carsten

"M. Zhou" <lumin@debian.org> writes:

> ===============================================================================
> Brief Background, Definition, Scope, and Purpose of the Proposal
> ===============================================================================
>
> AI software is growing more and more popular and has become a notable part of
> the software ecosystem. This trend raises new questions and challenges,
> especially about how the Debian Free Software Guidelines (DFSG) apply to
> pre-trained AI models, and it urges the Debian Project to revisit its
> interpretation of the DFSG in the context of AI software and models.
>
> A pre-trained "AI model" is usually stored on disk in binary formats designed
> for numerical arrays, as a "model checkpoint" or "state dictionary", which is
> essentially a collection of matrices and vectors, holding the learned
> information from the training data or simulator. When a user makes use of such
> a file, it is usually loaded by an inference program, which performs numerical
> computations to produce outputs based on the learned information in the model.
> Please refer to the appendix for more background information about AI.
>
> This proposal focuses on one interpretation of the DFSG on a particular type of
> pre-trained AI model that (1) is released under a DFSG-compliant free software
> license like MIT/Expat, Apache-2.0, etc., and satisfies at least one of (2) and
> (3) below -- (2) it is trained on data or a simulator that is private,
> proprietary, or inaccessible to the public; (3) it does not provide the
> original training program. To avoid creating new terminology, we will refer to
> this type of file as "AI models released under open source license without
> original training data or program" without any abbreviation. Such models are
> referred to as "Open Weights" in some circumstances
> (See: https://opensource.org/ai/open-weights).
>
> The purpose of this proposal is to reach a community consensus on how we should
> treat and handle the described type of AI models, which is an inevitable issue
> in the future. If necessary, I can work with the Debian Policy Team to
> incorporate the GR result into appropriate sections of the Debian Policy (e.g.,
> in Section 10 "Files").
>
> | Note: While nowadays people use "AI" to refer to LLMs, it is a very broad term
> |   that covers much more than language models. AI models apart from language
> |   models must be considered as well, such as computer vision models, audio
> |   recognition models, etc.
>
> | Note: If condition (1) is not satisfied, the model is usually seen as
> |   "non-free" in the context of the Debian community and no voting is needed.
> |   In addition, if everything (including but not limited to the model itself,
> |   training data, training program, and inference program) is released under
> |   DFSG-compliant licenses, that again needs no voting.
>
> | Note: Traditional software parts, like a Python script or a C++ program, are
> |   out of the scope of this proposal since that is a well-defined case. For
> |   example, a deep learning framework or inference software written in
> |   Python or C++, i.e., the program that runs the AI models, is out of the
> |   scope of this proposal.
>
>
> ===============================================================================
> Proposal A: "AI models released under open source license without original
>             training data or program" are not seen as DFSG-compliant.
> ===============================================================================
>
> The "AI models released under open source license without original training
> data or program", a particular type of files as explained above, are not seen
> as DFSG-compliant. Hence, they cannot be included in the "main" section of the
> Debian archive. This proposal does not specify whether the "non-free" section
> of the Debian archive can include those files.
>
>
> -------------------------------------------------------------------------------
> Appendix
> -------------------------------------------------------------------------------
>
> Inevitably, there may be some terminology and/or background that is not
> well-known or well-understood by the general public. Please refer to the
> appendices for more information. If you cannot find relevant information to
> answer your question, please consult a human professional -- or an LLM.
>
> See appendix A for detailed rationale of this proposal.
> See appendix B for background and comments about current AI software.
> See appendix C for some related previous efforts and discussions.
> See appendix D for comments on potential implications of this proposal.
>
> [Appendix A] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixA.txt
> [Appendix B] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixB.txt
> [Appendix C] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixC.txt
> [Appendix D] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixD.txt
>
>
> Disclaimer
> ----------
>
> We acknowledge that releasing useful AI models under permissive licenses like
> MIT/Expat and Apache-2.0 is a generous act from the original authors, given the
> huge costs involved, and it is a great contribution to the software ecosystem
> and to society. We sincerely respect the respective authors' work. On the other
> hand, the DFSG sets a pretty high standard for software that can be included in
> the Debian distribution, which means the GR may lead to some results that not
> everybody agrees with. Nevertheless, we appreciate your understanding of the
> mission of the Debian project -- to create a free operating system, where
> "free" means "software freedom".
