
Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models

On Mon, 5 May 2025 at 01:05, Wouter Verhelst <wouter@debian.org> wrote:
On Sun, May 04, 2025 at 07:08:00PM +0200, Aigars Mahinovs wrote:
>    On Sun, 4 May 2025 at 17:30, Wouter Verhelst <w@uter.be> wrote:
>
>      >    Wikipedia definition is a layman's simplification.
>      It may be a simplification, but that in and of itself does not
>      make it incorrect.
>
>    I have specifically addressed this point with examples in my reply.
>    Copyright very clearly does not survive learning and then generation of
>    new solutions. In humans that is a given.

Indeed.

>    For software I would assume the equivalence, unless proven
>    differently.

This is not a fact; this is your opinion. You base the rest of your
argument on it, so I'll call it an axiom: something to accept in order
for the rest of the argument to hold.

The problem is, I disagree with your axiom.

To me, software and humans are two very different things. We know how
computers work; we can therefore reason about what the output of a
software program is going to be, based on the input you give it. Whether that
program is a compiler or a trainer program for a deep learning model is
just a detail in that context. One computer chip of a given model and
stepping is 100% equivalent to another, and so any process that runs on
one of these chips will produce the same output on another.
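
(A minimal sketch of that determinism claim, for illustration only;
nothing below is from this thread, and the toy "trainer" is made up for
the example. Given the same seed and the same data, repeated runs
produce bit-identical output:

import hashlib
import random

def train(seed, data):
    # Toy "trainer": fit y = w * x by gradient descent, seeded so the
    # result is a pure function of (seed, data).
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)          # deterministic for a given seed
    for _ in range(1000):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= 0.01 * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
runs = [train(42, data) for _ in range(3)]

# Same inputs, same algorithm: every run hashes to the same digest,
# on this chip or on any other conforming one.
assert len({hashlib.sha256(repr(w).encode()).hexdigest() for w in runs}) == 1

Real training pipelines need their RNGs and parallelism pinned down to
get this property in practice, but in principle the output is still a
function of the inputs.)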

The same is not true for human brains; we do not fully understand how
they work, we cannot predict what a given person will produce based on
the training that person has received, and therefore we cannot predict
how a given person is going to write a particular piece of software.

Theoretically we could predict both, if we knew all the inputs and all the algorithms; in practice we simply do not know that for most humans. Following this logic to the extreme would make all human-written software non-free, because we cannot reproduce the training inputs, and the model software itself has no known source code. /s

But we don't need to go that far, because fair use under copyright law does not require a transformation to be non-deterministic in order to be transformative. Nor does it require the transformation to be carried out by a human.

The simple, fully deterministic, and clearly software-only creation of thumbnails for search engines has been ruled fair use (Kelly v. Arriba Soft; Perfect 10 v. Amazon). So has the straightforward full-text indexing of books by Google Books, which serves snippets of those materials directly to end users (Authors Guild v. Google).

The transformativeness criterion here is that the resulting work must be transformed in a way that adds value. Generating new texts with an LLM is pretty clearly a value-adding transformation of the original articles, even more so than the already-ruled-on Google Books case.
--
Best regards,
    Aigars Mahinovs
