On Mon, May 05, 2025 at 02:13:58PM -0700, Russ Allbery wrote:
> However, I am very leery about extending that exception to cases where
> people are intentionally creating that situation by deleting the input
> data on purpose.

I agree with you on this. I do wonder, however, where you would place the
case where the training data is available (possibly: publicly available),
and the model trainers would even want to distribute it, but cannot due to
unclear licensing terms. Would you say that it is a "less nasty" case than
one where training data is deleted on purpose, or would you consider it
just as bad?

FWIW, in terms of free software ethics, I consider non-open data to be
"less nasty" than non-free code. That's because with code we can take the
activist approach of simply rewriting it under a free software license
(provided enough development resources are available). With non-open data,
there are cases in which you cannot just recreate it and release it under
a free license, no matter how many resources you have.

The ability to exploit non-open data to serve the needs of free software
(as would be the case with DFSG-free models trained on non-DFSG-free data)
is something I hesitate to give up on.

Cheers

-- 
Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack   _. ^ ._
Full professor of Computer Science                       o o o   \/|V|\/
Télécom Paris, Polytechnic Institute of Paris            o o o   </> <\>
Co-founder & CSO Software Heritage                     o o o o   /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro                         '" V "'