
Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models



On Wed, May 14, 2025 at 11:38:02PM +0200, Aigars Mahinovs wrote:
> You would *actually* technically, in reality, prefer digging through
> gigabytes of text files and do some kind of manual modifications in
> that sea of raw data? Modifications that are basically impossible to
> track in any kind of change tracker. That are excessively hard and
> time consuming to actually do and check. Instead of just adjusting
> input parameters on the ingest script? *That* is what I consider to be
> frankly very hard to believe.

Aigars, I'm sympathetic to your general stance in this debate, but I
think you push it too far, in the following sense.

It is undeniable that *some* modifications of a trained ML model are
possible starting directly from the model weights. I also personally
agree that, at least for big models, *most* modifications (counted in
terms of use cases and/or users actually doing them) will happen
starting from model weights via techniques like fine tuning.

But I don't think it is disputable that the *most general* way of
modifying an ML model is achievable only starting from the full training
dataset and pipeline. There are simply things that you cannot do
starting from the trained model. You are right that, for big models at
least, it will be impractical to do those changes, and that most actors
(including Debian) will not have the resources to do the re-training.
But that should not lead us to equate the scenario in which training
data is available with that in which it isn't. I think our debate in
Debian should be about where we put the bar of what is *required* to
be in main, without dismissing the fact that it *is* better to have
training data than not to have it.

Do you agree with the above?

Cheers
-- 
Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack  _. ^ ._
Full professor of Computer Science              o     o   o     \/|V|\/
Télécom Paris, Polytechnic Institute of Paris     o     o o    </>   <\>
Co-founder & CSO Software Heritage            o o o     o       /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro                        '" V "'

