
Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



Hi Mo,

Please Cc me in replies as I am not subscribed.

I am aware that you have been working on this for quite a while and
have already collected extensive feedback. Thanks for taking the next
step and attempting to form project consensus.

On Sat, Apr 19, 2025 at 01:56:17PM -0400, M. Zhou wrote:
> ===============================================================================
> Proposal A: "AI models released under open source license without original
>             training data or program" are not seen as DFSG-compliant.
> ===============================================================================
> 
> The "AI models released under open source license without original training
> data or program", a particular type of files as explained above, are not seen
> as DFSG-compliant. Hence, they can not be included in the "main" section of the
> Debian archive. This proposal does not specify whether the "non-free" section
> of Debian archive can include those files.

Others have taken up some aspects already. Ian observed a bit of
vagueness and Simon also asked about how this would be applied to what
is in Debian. Ansgar and Russ identified possibly affected packages. I'd
appreciate answers to these before going to vote.

Maybe we can also approach this from a different angle. The main
approach here appears to be drawing a line using principles and turning
that into policy. How about also approaching it from practical effects?
When it comes to individual packages, many of us have an easier time
forming an opinion as to whether it should be included in Debian and
whether it should be included in Debian main. For some packages, we
disagree here, but for many we likely agree. The risk here is that we
may get lost in details.

We presently include trained networks without training data or program
for OCR, TTS, board games, and image recognition in main. For some of
those, it may be questionable whether they really should be in main,
but I guess that we mostly agree (with Thorsten being an exception
here) that including them in Debian is a good thing. I hope that we
find a way that enables us to upload more existing models to some
section of Debian.

My impression is that Mo's proposal attempts to clarify the DFSG into a
relatively literal interpretation that treats training as a
compilation step. Such an interpretation would result in us
re-evaluating existing components and would likely require us to move
some pieces from main to non-free. Practically speaking, accessibility
may no longer work unless non-free is enabled.

When we split off non-free-firmware from non-free, one of the big
reasons for doing it was that firmware would not typically run on the
primary CPU. To me, machine learning models are a bit similar.  Often
enough, the model architecture is DFSG-free software and it is merely
the model weights that lack "sources" in a strict DFSG interpretation.
The model weights influence the computation, but the choice of weights
typically does not allow execution of arbitrary instructions. Like
firmware, model weights are somewhat sandboxed. This kinda also applies
to non-free documentation packages or other kinds of data packages. To
me, this is a significant difference.  Even if the model may be
influenced in a hostile way (something we likely cannot check when
training data or program is unavailable), it typically cannot run
arbitrary code on our computers. I would appreciate it if there was a
way to tell general non-free apart from this more limited form (e.g.
using a separate archive section). Do others agree that such a
classification of non-free would be useful?

Helmut

