Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
- To: Aigars Mahinovs <aigarius@gmail.com>
- Cc: Russ Allbery <rra@debian.org>, Matthias Urlichs <matthias@urlichs.de>, debian-vote@lists.debian.org
- Subject: Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
- From: Wouter Verhelst <wouter@debian.org>
- Date: Sun, 4 May 2025 12:35:03 +0200
- Message-id: <aBdC1-OCYhVx3xl0@pc220518.home.grep.be>
- In-reply-to: <CABpYwDW-6rYAG5TtJvqr=v4E1aLNgwkAkeDXO+MNt56+AC0VpA@mail.gmail.com>
- References: <6a60f2f9e7e719aab39e5d21a623d8bac848b9ab.camel@debian.org> <aAfPA6IqfoDLnAhs@layer-acht.org> <40e7d297d72014365dad8be242a359c2b06ac7d3.camel@debian.org> <a351e052-ab6c-4f66-9f6c-0db8064e990c@urlichs.de> <CABpYwDUeRawmtUqjnQTYhZ5Kwt+82PFPUXZK2LN1O9GV8CSkOQ@mail.gmail.com> <87a580s0b5.fsf@hope.eyrie.org> <CABpYwDUBjmsaED7KRCscQCz9V4apZesYKeyJwpAq2UDcn6UKYQ@mail.gmail.com> <87wmb4qclf.fsf@hope.eyrie.org> <CABpYwDW-6rYAG5TtJvqr=v4E1aLNgwkAkeDXO+MNt56+AC0VpA@mail.gmail.com>
On Tue, Apr 29, 2025 at 03:17:52PM +0200, Aigars Mahinovs wrote:
> However, here we have a clear and fundamental change happening in the
> copyright law level - there is a legal break/firewall that is happening
> during training. The model *is* a derivative work of the source code of
> the training software, but is *not* a derivative work of the training
> data.
I would disagree with this statement. How is a model not a derivative
work of the training data? Wikipedia defines a derivative work as follows:

    In copyright law, a derivative work is an expressive creation that
    includes major copyrightable elements of a first, previously created
    original work (the underlying work). [1]

As models are often able to regurgitate copyrighted works (largely)
verbatim, that is, to me, a definition that applies to models.

[1] https://en.wikipedia.org/wiki/Derivative_work
> This means that we also have to consider what exactly is training
> data and how to deal with it, without automatically falling back to
> equating it with source code.
We have a very wide definition of "source code" in Debian. To us, source
code is not limited to software written in a common programming
language; instead, our definition also covers things such as SVG
files, LibreOffice documents, GIMP XCF files, etc. In this context, I
don't think that equating training data with source code is too wild a
thing to do.
--
w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}
I will have a Tin-Actinium-Potassium mixture, thanks.