On Thu, May 15, 2025 at 11:36:22AM +0200, Aigars Mahinovs wrote:
> On Thu, 15 May 2025 at 10:06, Stefano Zacchiroli <zack@debian.org> wrote:
> > But I don't think it is disputable that the *most general* way of
> > modifying an ML model is achievable only starting from the full
> > training dataset and pipeline. There are simply things that you
> > cannot do starting from the trained model.
>
> This is not quite the point I was trying to make in this specific
> thread. I was pointing out the difference between the raw blob of
> training data and the pipeline that creates/gathers that raw blob of
> training data.
[..]
> But I do think that it should be perfectly fine to have an ingest
> pipeline that simply downloads
> "https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-18/warc.paths.gz"
> for example.

Oh, I see. Thanks for clarifying; I indeed did not get that this was
the main point you were raising in this sub-thread.

FWIW, I agree that "where is it hosted?" is a less important question
than the one of whether the full/pristine training dataset is
available, for our users, *somewhere* in the first place.

But note that if Debian accepts not hosting datasets on its own
infrastructure, then a number of practical issues arise, e.g., what do
we do with the package in main if/when the data disappears from the
external hosting place? (Yes, I know those datasets are hosted by
archives, whose mission is to preserve data in the long run, but even
archives can fail, might be forced to delete data, etc. As long as we
are not in control, anything goes.)

Cheers
--
Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack   _. ^ ._
Full professor of Computer Science               o o o          \/|V|\/
Télécom Paris, Polytechnic Institute of Paris    o o o          </> <\>
Co-founder & CSO Software Heritage               o o o o        /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro                        '" V "'