Re: Non-LLM example where we do not in practice use original training data
On Thu, 8 May 2025 at 20:49, Russ Allbery <rra@debian.org> wrote:
> But let me slide down the slippery slope a bit farther and present a case
> that I think is a natural extension of that position. Suppose that instead
> of training a Bayesian spam filter on a bunch of mail messages without
> explicit consent, someone instead gathered every email message that I had
> ever sent to a public mailing list and used them to train an LLM to
> impersonate me.
>
> I don't think someone should be allowed to do that without my consent.
It depends. We already have various laws that are relevant here,
written for the case where a human does this. Nothing about those
laws changes if software is used as an intermediate step to achieve
the same goal. There is still a human somewhere in the loop who
either instructed the LLM to do this or set up the infrastructure
for the LLM to do it automatically.
If someone uses such an LLM to commit fraud or impersonation ... it is
still fraud and impersonation and still illegal. It does not make the
software they used illegal as well. Just like downloading a movie with
a BitTorrent client does not make the BitTorrent client illegal.
If you are a known and important politician and the LLM is used to
produce a (clearly labeled) satire of your speech, then it is a fully
legal and protected use case.
Just because something can be done more cheaply or at scale with the
help of automation does not make the method of automation morally
wrong. See torrents, see the mass manufacturing techniques that allow
factories in China to make millions of knock-offs of well-known toys.
The whole copyleft movement grew from frustration with copyright law
restricting the freedom of users and developers to cooperate. It is
a hack of the copyright system that uses copyright *against*
copyright. The entire purpose of the legal framework around free
software is to *reduce* the power of copyright law in software.
Here we have a *monumental* movement in the development of both
software and the entire copyright landscape as a whole - a movement
that could, finally, permanently wound the corporate silos keeping the
lid on the boiling pot of human knowledge. We now have a legal tool
that could free all the knowledge that is currently locked behind
copyright walls and make it available for everyone to use freely and
automatically. And this movement has huge financial and social backing
as well. It has a real chance to succeed. And we are *opposing* it?
Why?
Let me end this with a quote: Copyright delenda est.
--
Best regards,
Aigars Mahinovs