Re: A policy on use of AI-generated content in Debian

To: debian-project@lists.debian.org
Subject: Re: A policy on use of AI-generated content in Debian
From: Tiago Bortoletto Vaz <tiago@debian.org>
Date: Fri, 3 May 2024 12:35:21 -0400
Message-id: <v47buqhqff5ryecf3y4uqbxnphgarhxa5jdtcvouvhpgtsc7xy@234jfqbyyx4e>
In-reply-to: <D102CZBNFZ3K.12IKY5O91NIRS@debian.org>
References: <3qxsesyoouxh2h6fodosnln4wsyl3tpmnbcu6pqzekqkz6k577@a2gos5jbaowf> <874jbgghvl.fsf@hope.eyrie.org> <hf3fe7mgdyzok7yjtyt6qaheor3bbzm3ffowk3vhxuge4fimbr@dtkn7ovo4nrr> <D102CZBNFZ3K.12IKY5O91NIRS@debian.org>

Hi Jose,

Thanks for your input, I have a few comments:

On Fri, May 03, 2024 at 11:02:47AM -0300, Jose-Luis Rivas wrote:
> On Thu May 2, 2024 at 9:21 PM -03, Tiago Bortoletto Vaz wrote:
> > Right, note that they acknowledged this policy is a working in progress. Not
> > perfect, but 'something needed to be done, quickly'. It's hard to find a
> > balance here, but I kind of share this sense of urgency.
> >
> > [...]
> >
> > This point resonates with problems we might be facing already, for instance
> > in the NM process and also in Debconf submissions (there's no point of going
> > into details here because so far we can't proof anything, and even if we could,
> > of course we wouldn't bring any of the involved to the public arena). So I'm
> > actually more concerned about LLM being mindlessly applied in our communication
> > processes (NM, bts, debconf, irc, planet, wiki, website, debian.net stuff, etc)
> > than one using some AI-assisted code in our infra, at least for now.
> >
> 
> Hi Tiago,
> 
> It seems you have more context than the rest which provides a sense of
> urgency for you, where others do not have this same information and
> can't share this sense of urgency.

Yes.

> If I were to assume based on the little context you shared, I would say
> there's someone doing a NM application using LLM, answering stuff with
> LLM and passing all their communications through LLMs.
> 
> In that case, there's even less point in making a policy about it, in my
> opinion. Since as you stated: you can't prove anything, and ultimately
> it would land in the hands of the people approving submissions or NMs to
> judge if the person is qualified or not. And you can't block
> communications from LLM generated content when you can't even prove it's
> LLM generated content. How to enforce it?

Hmm, I tend to disagree here. Proving by investigation isn't the only way to get
at the truth of the situation. We can simply ask the person whether they used
an LLM to generate their work (be it an answer to NM questions, a contribution
to the Debian website, or an email to this mailing list...). In that scenario,
having a policy, a position statement or even a gentle guideline would make a
huge difference in the ongoing exchange.

> And I doubt a statement would do much, as well. What would be
> communicated? "Communications produced by LLMs are troublesome"? I don't
> know if there's much substance to have a statement of that sort.

Just to set the scene a little on how I think about the issue: when I brought
up this discussion, I didn't have in mind someone evil attempting to use AI to
deliberately disrupt the project. We know already that policies or statements
are never sufficient to deal with people in this category. Rather, I see many
people (mostly younger contributors) who are coming to use LLMs in their daily
life in a quite mindless way -- which of course is not our business if they do
so in their private life. However, the issues that can arise using this kind of
technology without much consideration in a community like Debian are not
obvious to everyone, and I don't expect every Debian contributor to have a
sufficiently good understanding of the matter, or maturity, at the moment they
start contributing to the project. We can draw some analogy here in relation to
the CoC and the Diversity Statement. They might seem quite obvious to some, and
less so to others.

So far I've felt a certain resistance to adopting something as sharp as Gentoo
did (which I've already agreed with). However, I still have the feeling that a
position in the form of a statement or even a guideline could help us both
avoid and mitigate possible problems in the future.

Bests,

--
tvaz

Follow-Ups:
- Re: A policy on use of AI-generated content in Debian
  - From: Sam Hartman <hartmans@debian.org>

References:
- A policy on use of AI-generated content in Debian
  - From: Tiago Bortoletto Vaz <tiago@debian.org>
- Re: A policy on use of AI-generated content in Debian
  - From: Russ Allbery <rra@debian.org>
- Re: A policy on use of AI-generated content in Debian
  - From: Tiago Bortoletto Vaz <tiago@debian.org>
- Re: A policy on use of AI-generated content in Debian
  - From: "Jose-Luis Rivas" <ghostbar@debian.org>
