
Re: Non-LLM example where we do not in practice use original training data



Aigars Mahinovs <aigarius@gmail.com> writes:

> On Wed, 7 May 2025 at 02:56, Russ Allbery <rra@debian.org> wrote:
>
>>
>> I think if any of the options in the current GR except Aigars's (and maybe
>> Sam's?) passes, that would effectively be a change in our current policy,
>> even if the current policy is not precisely intentional.
>
>
> IMHO my option will also be a change in our current policy, but, instead of
> requiring the training data itself, my option would just require adding a
> documentation section describing how to create/gather and process data
> required to train such models *if* someone would want to reproduce them.

Would anyone else's failure to reproduce them be an RC bug?

Do the tools required for reproducing the model have to be in Debian
main, or are non-free or external proprietary tools okay?

Does the toolchain for LLM models support bit-by-bit reproducible outputs?

Is a Build-Depends on such an LLM model acceptable?  Then we could
eventually replace the source code for `sudo` in Debian with an LLM
prompt like "write me a secure replacement for sudo and output an
executable ELF binary for my host architecture".  In fact, with a bit
more irony, we could replace a lot of insecure source code this way.

I'm not convinced this approach leads to something desirable.  I fear it
means people will have yet another way to add proprietary content into
Debian, and that Debian will give up further on caring about user
freedom.  But this is already the case, so I feel at a loss as to how to
use this argument.

/Simon
