The simple fact that none of the LLMs have been sued out of existence by any copyright owner is de facto proof that it does not work that way in the eyes of the judicial system.
That may or may not be correct in the long run, IANAL and all that.
However, copyright is only one aspect of whether or not models should end up in main. Plain old reproducibility is important to us too.
If we can't include the training data, for obvious copyright
reasons, then the question of whether the resulting model itself is
copyright-clean doesn't matter.
-- -- regards -- -- Matthias Urlichs