Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models

To: Matthias Urlichs <matthias@urlichs.de>
Cc: debian-vote@lists.debian.org
Subject: Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models
From: Russ Allbery <rra@debian.org>
Date: Sun, 27 Apr 2025 18:47:16 -0700
Message-id: <878qnlqcpn.fsf@hope.eyrie.org>
In-reply-to: <a1314e20-5796-4c3a-b8b7-e2fe8aa06f00@urlichs.de> (Matthias Urlichs's message of "Mon, 28 Apr 2025 02:49:14 +0200")
References: <6a60f2f9e7e719aab39e5d21a623d8bac848b9ab.camel@debian.org> <aAgCfqRJXK1-qukG@remnant.pseudorandom.co.uk> <87ecxjlwzp.fsf@hope.eyrie.org> <20250427211010.n3qbo7ggrjehbe76@upsilon.cc> <87v7qpqn3i.fsf@hope.eyrie.org> <a1314e20-5796-4c3a-b8b7-e2fe8aa06f00@urlichs.de>

Matthias Urlichs <matthias@urlichs.de> writes:

> The fact remains that our builders will be unable to reproduce the
> resulting network, for well-known practical reasons. Thus we
> mostly-have-to-trust the original publisher that their network has been
> built as documented (or even "documented" given the status of gnubg). In
> practice this is not a problem for a Backgammon engine, or even for
> Tesseract because any serious use case supports, if not requires, human
> verification of the result — but how sure can I be that a LLM intended
> for home automation doesn't contain an Open Sesame backdoor that unlocks
> my *home*'s back door?

Right, this is a known attack in the security literature with some
research behind it already. See, for example:

    https://arxiv.org/abs/2204.06974

We could get some protection by retraining the model from the base
training data and substituting our constructed model for the
upstream-provided one, but (a) that puts a lot more weight on our ability
to rebuild the model than verification alone would, and (b) I would not
assume it's impossible to hide backdoor construction in the training data
either, particularly if the training data is voluminous. See, for example:

    https://nisos.com/research/building-trustworthy-ai/

and that's just the first of many links that I found in a quick search.

Obviously there are a bunch of use cases for these things that will never
involve adversarial data, and not everything needs to be robust in order
to be included in Debian, but it's one of the things to be thinking about
when accepting even our current status quo position of including some ML
models in Debian main without being able to verify the model construction.

LLMs in particular are nascent technology with novel security flaws that
researchers are only starting to explore. I think the chances are high
that their security will get much, much worse before it gets better. It is
one of the many reasons why I am generally an LLM skeptic.

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>
