Re: [RFCv3] Counter-Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
On Sat, 10 May 2025, Aigars Mahinovs wrote:
>An algorithm that only stores and produces an *average* value across a
>wide set of inputs can not be any kind of compression.
It’s not “just” an average: as has been shown, substantial amounts of
substantially unmodified “training data” can be extracted.
>It is data mining.
The copyright exception for text and data mining is only valid for
uses that extract trends and things like that, not for generative
use (and not for content with explicit opt-out, which those scrapers
ignored).
>then go up. If I run "wc" on a copyrighted work, the number of words
>in the document is *not* a derived work from the original document.
If you JPEG-compress a photo of the original document then uncompress
it, it *is*.
And, again, this has been shown to be possible for these models to a
substantial and significant degree; therefore we need to act as if the
output of such generation is derived from its inputs in the general case.
There will always be outputs which aren’t, and inputs which don’t
influence a subset of particular outputs, but the sum of its outputs
is mechanically derived from (most of) the sum of its inputs.
bye,
//mirabilos
--
/⁀\ The UTF-8 Ribbon
╲ ╱ Campaign against
╳ HTML eMail! Also,
╱ ╲ header encryption!