Re: Brief update about software freedom and artificial intelligence

On Sun, 26 Feb 2023 at 21:47, Russ Allbery <rra@debian.org> wrote:

> I am not a lawyer, let alone a copyright lawyer, and have only an amateur
> Internet understanding of the nature of compilation copyrights (and they
> may well also vary by jurisdiction), but my understanding (possibly
> incorrect) of the law in the US is that holding copyright on a member of a
> collection does not give you any copyright ownership of the collection as
> a whole. To gain copyright ownership of the collection, you have to
> exercise some sort of creative control over the collection itself, such as
> by using human creativity to select its membership, choosing some elements
> and discarding others.

Or by creating a tool that does that for you, following your criteria,
and that helps with tasks such as applying labels. Labels (like every
other input structure) can be applied manually by humans, as in
hand-made art or sculpture, or by rules that humans define, etc.
Otherwise, using SQL to manage a database instead of editing every
field by hand would completely wipe out every right to the database
itself. A database created with SQL is a protectable work, and so is a
structured ML/AI input collection.

> The person distributing the collection has to
> comply with copyright law with respect to the material included that you
> hold a copyright on (either satisfying your license or following the rules
> of fair use), but if you're not involved in creating the collection, you
> don't get any separate rights over the collection itself and cannot assert
> a license on it.

A totally automatic procedure such as web crawling and web indexing
fits your example perfectly. However, the input collection that an
ML/AI training system needs is a protectable work, because the data
has to be structured, selected and properly labeled, even if these
activities are carried out by rules, as happens when using SQL for
databases. Thus, if this protectable collection is an extension of a
previously protected collection, then copyright law applies.
Statistics about word sequences, by contrast, can be the product of a
completely automatic process.

So, web indexing and statistics are built on input collections that
are *not* creative works, and these tools access every copyrighted
work under fair use as long as they respect the robots:no meta-tag
when it is applied to a copyrighted work. Training an ML/AI, instead,
is a completely different story, and its input collections are
protectable collections under copyright law. This was explained in
one of my first e-mails on this subject. Here:

- https://lists.debian.org/debian-project/2023/02/msg00020.html

Which, after all, is the reason why data scientists are crying about
not getting the "AI input collection" back, even when it is created
using their copyrighted works.

One day, an AI able to learn by itself without any human action, and
able to collect data on its own, will be here reading this e-mail, but
today is not that day, yet. :-)

Best regards, R-
