
Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



===============================================================================
Brief Background, Definition, Scope, and Purpose of the Proposal
===============================================================================

AI software is increasingly popular and has become a notable part of the
software ecosystem. This trend raises new questions and challenges, especially
regarding how the Debian Free Software Guidelines (DFSG) apply to pre-trained
AI models, and urges the Debian Project to revisit its interpretation of the
DFSG in the context of AI software and models.

A pre-trained "AI model" is usually stored on disk in a binary format designed
for numerical arrays, as a "model checkpoint" or "state dictionary". Such a file
is essentially a collection of matrices and vectors holding the information
learned from the training data or simulator. When a user makes use of such a
file, it is usually loaded by an inference program, which performs numerical
computations on the learned information in the model to produce outputs.
Please refer to the appendix for more background information about AI.
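To make the description above more concrete, here is a minimal Python sketch.
It assumes a hypothetical toy layout: a dictionary named state_dict mapping
parameter names to one matrix and one vector, serialized as little-endian
float64 bytes with the standard struct module, and a single linear layer
y = W x + b acting as the "inference program". Real checkpoint formats and
models are far more elaborate; all names and numbers here are illustrative
assumptions, not taken from any particular framework.

```python
import struct

# Hypothetical toy "state dictionary": parameter names mapped to a
# matrix (a list of rows) and a vector. The numbers are made up and
# merely stand in for values learned during training.
state_dict = {
    "linear.weight": [[0.5, -1.0, 0.0],
                      [1.5, 0.25, -0.5]],  # 2x3 matrix
    "linear.bias": [0.1, -0.2],             # length-2 vector
}


def pack_vector(vec):
    """Serialize a vector as little-endian float64 bytes (toy on-disk form)."""
    return struct.pack(f"<{len(vec)}d", *vec)


def unpack_vector(buf):
    """Inverse of pack_vector: 8 bytes per float64 value."""
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))


def infer(sd, x):
    """A minimal 'inference program': compute y = W x + b."""
    w, b = sd["linear.weight"], sd["linear.bias"]
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def predict(sd, x):
    """Return the index of the largest output, e.g. a class label."""
    y = infer(sd, x)
    return max(range(len(y)), key=y.__getitem__)


if __name__ == "__main__":
    blob = pack_vector(state_dict["linear.bias"])
    print(len(blob), unpack_vector(blob))  # 16 bytes, same two values
    print(infer(state_dict, [1.0, 2.0, 3.0]))
    print(predict(state_dict, [1.0, 2.0, 3.0]))
```

The point of the sketch is only that the model file holds numbers; they do
nothing by themselves until an inference program performs computations with
them.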

This proposal focuses on one interpretation of the DFSG for a particular type
of pre-trained AI model: one that (1) is released under a DFSG-compliant free
software license such as MIT/Expat or Apache-2.0, and additionally satisfies
(2), (3), or both: (2) it is trained on data or a simulator that is private,
proprietary, or otherwise inaccessible to the public; (3) its original training
program is not provided. To avoid coining new terminology, we refer to this
type of file in full as "AI models released under open source license without
original training data or program", without any abbreviation. Such models are
sometimes called "Open Weights" (See: https://opensource.org/ai/open-weights).

The purpose of this proposal is to reach a community consensus on how we should
treat and handle the described type of AI model, an issue the project will
inevitably face. If necessary, I can work with the Debian Policy Team to
incorporate the result of this General Resolution (GR) into appropriate
sections of the Debian Policy (e.g., Section 10 "Files").

| Note: While many people nowadays use "AI" to mean large language models
|   (LLMs), it is a very broad term that covers much more than language
|   models. AI models other than language models must be considered as well,
|   such as computer vision models, audio recognition models, etc.

| Note: If condition (1) is not satisfied, the model is usually seen as
|   "non-free" in the Debian community and no vote is needed. Likewise, if
|   everything (including but not limited to the model itself, training data,
|   training program, and inference program) is released under
|   DFSG-compliant licenses, no vote is needed either.

| Note: Traditional software components, like a Python script or a C++
|   program, are out of the scope of this proposal, since their treatment
|   under the DFSG is already well established. For example, a deep learning
|   framework or inference software written in Python or C++, i.e., the
|   program that runs the AI models, is out of the scope of this proposal.


===============================================================================
Proposal A: "AI models released under open source license without original
            training data or program" are not seen as DFSG-compliant.
===============================================================================

The "AI models released under open source license without original training
data or program", a particular type of file as explained above, are not seen
as DFSG-compliant. Hence, they cannot be included in the "main" section of the
Debian archive. This proposal does not specify whether the "non-free" section
of the Debian archive may include those files.


-------------------------------------------------------------------------------
Appendix
-------------------------------------------------------------------------------

Inevitably, some of the terminology and background may not be well known or
well understood by the general public. Please refer to the appendices for more
information. If you cannot find the information you need to answer your
question, please consult a human professional -- or an LLM.

See appendix A for detailed rationale of this proposal.
See appendix B for background and comments about current AI software.
See appendix C for some related previous efforts and discussions.
See appendix D for comments on potential implications of this proposal.

[Appendix A] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixA.txt
[Appendix B] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixB.txt
[Appendix C] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixC.txt
[Appendix D] https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixD.txt


Disclaimer
----------

We acknowledge that, given the huge costs involved, releasing useful AI models
under permissive licenses like MIT/Expat and Apache-2.0 is a generous act by
the original authors and a great contribution to the software ecosystem and to
society. We sincerely respect the authors' work. On the other hand, the DFSG
sets a high standard for software that can be included in the Debian
distribution, so this GR may lead to results that not everybody agrees with.
Nevertheless, we appreciate your understanding of the mission of the Debian
project: to create a free operating system, where "free" means "software
freedom".
