Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
On Fri, May 09, 2025 at 12:11:25PM +0200, Aigars Mahinovs wrote:
> On Thu, 8 May 2025 at 12:46, Wouter Verhelst <wouter@debian.org> wrote:
> >
> > On Tue, May 06, 2025 at 12:02:08AM +0200, Aigars Mahinovs wrote:
> > > The transformative criterion here is that the resulting work needs to be
> > > transformed in such a way that it adds value. And generating new texts
> > > from a LLM is pretty clearly a value-adding transformation compared to
> > > the original articles. Even more so than the already ruled-on Google
> > > Books case.
> >
> > OK, let me change it around a bit, because I don't think this discussion
> > is going in any direction that is relevant for Debian.
> >
> > The only way in which you can build a model is by taking loads and loads
> > of data, running some piece of software over it, and storing the result
> > somewhere.
> >
> > How can we do this legally, reproducibly, and openly if we do not have
> > the rights to redistribute the said "loads and loads of data"?
> >
> > The answer is, we can't.
>
> Sure we can. It is a technical problem, actually. As long as the data
> is still available, you can store and redistribute information about
> which data you gathered, from where and what it looked like - hashes of
> copyrighted content are not copyrighted ;)

The fact that we don't need to do something technically doesn't mean
it's not a good idea.

We don't *need to* distribute source packages to build software, but we
choose to anyway.

We don't *need to* distribute the latex, docbook, or libreoffice sources
for PDF documentation in our packages, but we choose to anyway.

In a similar vein, yes, you're right that technically we don't need to
distribute the input data for the models, but that doesn't make it a bad
idea.

I mean. Honestly. If you're going to use "the law" as an argument one
more time, I'm going to *scream*.

I shouldn't even have to explain this to you; "the law" has no bearing
on the difference between "main" and "non-free". Yes, the decision on
whether something can go into our non-free repository is purely and
simply "is it legal for us to put that there". If the answer to that
question is "yes", then it can, and if the answer is "no", then it
can't. It's as simple as that.

But for our main repository, the story is different. So even if "the
law" states that something is fine and legal to do, that doesn't mean we
*have* to state that it therefore automatically satisfies *our*
standards of "free software".

This is what I'm trying to say, and you're not going to convince me that
something can go into main because of any argument that is based on "the
law".

In my opinion, a model is not free if we don't have the rights to build
that model, and if we don't have the rights to redistribute everything
that is needed to build that model. Anything else fails DFSG1, DFSG2,
DFSG7 and DFSG8, and it *does not matter* whether copyrights attached to
those files transfer to the model or not.

--
w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}
I will have a Tin-Actinium-Potassium mixture, thanks.