Re: CC BY-SA and AI training?

To: Charles Plessy <plessy@debian.org>
Cc: debian-wiki@lists.debian.org
Subject: Re: CC BY-SA and AI training?
From: Maytham Alsudany <maytham@debian.org>
Date: Fri, 25 Jul 2025 17:49:14 +0800
Message-id: <82a23f926eeb5ac230ff0d60222c653c1c379e15.camel@debian.org>
In-reply-to: <aINEB9ZJPynS8pLt@kumo.plessy.net>
References: <aINEB9ZJPynS8pLt@kumo.plessy.net>

On Fri, 2025-07-25 at 17:44 +0900, Charles Plessy wrote:
> Imagine that one day a Free Software project wants to train an entirely Free
> LLM that, among other things, knows Debian well, and whose outputs are
> guaranteed to be free from copyright violations.  If that had a chance
> of happening, wouldn't it be better for our wiki's contents to be under a more
> permissive license that does not require attribution?

- From what I know, the only way to achieve this would be to put the
content in the public domain or under an equivalent such as CC0.
- I don't think contributors want their work to be used without
attribution.
- Virtually no content from other sources/sites could be copied to the
wiki, because nothing is compatible with the public domain except other
public domain content.

[ What follows are my thoughts / brain-dump. ]

For me, it's not that I'm against the use of wiki content as training
data for AI, but the lack of attribution is a problem, especially when
AI companies claim that such use of content without attribution is "fair
use", then advertise that the material produced by their algorithms is
original work.

It *is* possible for an AI algorithm to cite sources and comply with
licensing. Such technology can be, and is being, developed in the wild.
For example, Google Gemini provides links to where it got its
information; it's not perfect, but it's a start.

If producing content and crediting the authors of that content is too
difficult to achieve, then how about only producing links to the
content? That would be an improved search engine that hands you the
answers to your questions on a silver platter without reproducing the
content outright. (Could this also address the issue of publishers and
news sites losing revenue when AI summarizes and reproduces their
content?)

I don't think people who have dedicated their own time to producing
content on the wiki should have their work used without attribution.

--
Maytham

