Re: Brief update about software freedom and artificial intelligence
"Roberto A. Foglietta" <email@example.com> writes:
> - then I decided to protect my projects repositories as database
> (collection) in addition to the standard way to protect the code with
> a well-known license
> - because of the copyright law about databases, if someone creates a
> larger database that contains my database or a part of it, then they
> have to comply with the license that I choose to protect my project as
> a database.
In the United States, this is only true if (a) the collection is
copyrightable (let's presume that's true in this case), and (b) their use
of your collection is not fair use. If their use of your collection is
fair use, then they do not have to comply with your license.
In other countries, I have no idea. Presumably there is a similar set of
rules under the same or different terms to allow such things as parodies,
but the boundaries may be different and I know very little about how those
rules have been applied to software outside of the US. My understanding
is that the Berne Convention doesn't standardize the rules around fair use
(under whatever name), so this can differ a lot by jurisdiction.
> You see, it is a very simple and straightforward concept. The only two
> ways to get off this are 1. make unlawful the database copyright law,
> 2. make a law for which the training input collection is not coverable
> by the copyright law. In both cases every employer can bring to their
> home a copy of a database or a copy of AI training inputs and share it
> with all the rest of the world. Moreover, the 1. includes the 2 while
> the 2. would seriously undermine the database copyright law because
> every database could be a training set for an AI/ML engine.
> Russ, do you agree? :-)
No. It's entirely possible that using databases as training sets for an
AI/ML engine is fair use under existing United States law and precedent as
long as that use is sufficiently transformative (the first factor of the
test, and I suspect the most important one here). The obvious example is
a search engine, which performs a similar transformation of clearly
copyrighted works into a new service with a different purpose, without the
explicit permission of the copyright holders.
This is the reason why people have focused so much on GitHub Copilot's
willingness to insert large blocks of code from other projects verbatim.
Reproducing code from other projects is less transformative and looks more
like simple copying, and therefore opens GitHub to a legal argument that
their AI model is not sufficiently transformative to be fair use.
Russ Allbery (firstname.lastname@example.org) <https://www.eyrie.org/~eagle/>