
Re: Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



On Mon, 28 Apr 2025 at 17:46, Stephan Verbücheln <verbuecheln@posteo.de> wrote:
> Is the change technical or legal/philosophical? You could call this
> a Turing test for copyright.
This is not a new issue at all. I remember that back in the day, in
order to legally reverse engineer a computer program, companies had to
set up two separate teams of developers (the so-called "clean room"
approach). One team reads the code and writes documentation; the second
team reads the documentation and writes the new code. It was crucial
that no member of the second team ever saw the original code, in order
to rule out any copyright issues.

But does that really settle it? If we consider the product of trained knowledge to be a derivative work of the training input, then the documentation produced by the first team would also be tainted by the copyright of the original code. That interpretation would therefore defeat the whole two-team process as well.

Many modern LLMs are also trained in stages: a very large model is trained on the source data, and then compact models are trained on the outputs of that first model. This is called model distillation.
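The core idea can be sketched in a few lines. This is a minimal, illustrative example (not any particular framework's API): the student model is trained to match the teacher's temperature-softened output distribution, typically by minimising a KL-divergence loss. All logit values and the temperature below are made-up numbers.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; a higher T yields a softer distribution
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one training example
teacher_logits = [4.0, 1.0, 0.2]
student_logits = [3.0, 1.5, 0.5]

T = 2.0  # assumed distillation temperature
teacher_soft = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)

# The student is trained to minimise this loss, i.e. to imitate the
# teacher's soft outputs rather than the original training data.
loss = kl_divergence(teacher_soft, student_soft)
print(round(loss, 4))
```

Note that the student never sees the original training corpus here, only the teacher's outputs, which is exactly what makes the copyright question analogous to the two-team situation above.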

And then there are other methods of getting new information into already-trained models at runtime, such as the RAG (retrieval-augmented generation) technique: the LLM itself may contain only fundamental knowledge and then reach out to load additional data sources relevant to the specific query. Think of an expert going online to check prices and availability of various products before advising you on your planned build. At that point the LLM+RAG combination is essentially just a smart web browser.
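A toy sketch of that retrieval step, with made-up documents and a deliberately naive keyword-overlap scorer (real RAG systems use vector embeddings, but the shape of the pipeline is the same): retrieve the most relevant passage, then prepend it to the prompt so the model answers from fresh external data rather than from its frozen training set.

```python
def score(query, doc):
    # Naive relevance: count shared lowercase word tokens
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, docs, k=1):
    # Return the k documents most relevant to the query
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

# Hypothetical external data source the model "reaches out" to at runtime
docs = [
    "The RX 9070 GPU currently costs 599 EUR and is in stock",
    "DDR5 memory prices dropped last month",
    "The new CPU socket requires a BIOS update",
]

query = "What does the RX 9070 GPU cost?"
context = retrieve(query, docs, k=1)

# The retrieved passage is prepended to the prompt; the LLM then answers
# using this runtime context, not knowledge baked in during training.
prompt = "Context: " + " | ".join(context) + "\nQuestion: " + query
print(prompt)
```

From a copyright standpoint this matters: the retrieved data never enters the model's weights at all, which is a rather different situation from training.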

(Sadly, I am *not* an expert on modern AI technologies)

--
Best regards,
    Aigars Mahinovs
