On Fri, 2025-07-25 at 17:44 +0900, Charles Plessy wrote:
> Imagine that one day a Free software project wants to train an entirely Free
> LLM, that among others knows well about Debian, and for which the outputs are
> guaranteed to be free from copyright violations. If that would have a chance
> to happen, wouldn't it be better that our wiki's contents are under a more
> permissive license that does not require attribution?

- From what I know, the only way this could be achieved is by putting the
  content in the public domain or under an equivalent such as CC0.
- I don't think contributors want their work to be used without attribution.
- Virtually no content from other sources/sites could be copied to the wiki,
  because nothing is compatible with the public domain except other public
  domain content.

[ What follows are my thoughts / brain-dump. ]

For me, it's not that I'm against the use of wiki content as training data for
AI, but the lack of attribution is a problem, especially when AI companies
claim that such use of content without attribution is "fair use", then
advertise that the material produced by their algorithms is original work.

It *is* possible for an AI algorithm to cite sources and comply with
licensing. Such technology can be, and is being, developed in the wild. For
example, Google Gemini provides links to where it got its information; it's
not perfect, but it's a start.

If producing content and crediting the authors of that content is too
difficult to achieve, then how about only producing links that refer to the
content -- in effect, an improved search engine that hands you the answers to
your questions on a silver platter without reproducing the content outright.
(This could also address the issue of publishers & news sites losing revenue
due to AI summarizing and reproducing their content?)

I don't think people who have dedicated their own time to producing content on
the wiki should have their work used without attribution.

-- 
Maytham