On Wed, May 14, 2025 at 11:38:02PM +0200, Aigars Mahinovs wrote:
> You would *actually* technically, in reality, prefer digging through
> gigabytes of text files and do some kind of manual modifications in
> that sea of raw data? Modifications that are basically impossible to
> track in any kind of change tracker. That are excessively hard and
> time consuming to actually do and check. Instead of just adjusting
> input parameters on the ingest script? *That* is what I consider to be
> frankly very hard to believe.

Aigars, I'm sympathetic to your general stance in this debate, but I
think you push it too far, in the following sense.

It is undeniable that *some* modifications of trained ML models are
possible starting directly from the model weights. I also personally
agree that, at least for big models, *most* modifications (counted in
terms of use cases and/or users actually doing them) will happen
starting from model weights via techniques like fine-tuning.

But I don't think it is disputable that the *most general* way of
modifying an ML model is achievable only starting from the full
training dataset and pipeline. There are simply things that you cannot
do starting from the trained model.

You are right that, for big models at least, it will be impractical to
do those changes, and that most actors (including Debian) will not have
the resources to do the re-training. But that should not lead us to
equate the scenario in which training data is available to that in
which it isn't.

I think our debate in Debian should be about where we put the bar of
what is *required* to be in main, without dismissing the fact that it
*is* better to have training data than not having it.

Do you agree with the above?

Cheers
-- 
Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack  _. ^ ._
Full professor of Computer Science              o     o   o     \/|V|\/
Télécom Paris, Polytechnic Institute of Paris     o     o o    </>   <\>
Co-founder & CSO Software Heritage            o o o     o       /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro                        '" V "'