Re: Non-LLM example where we do not in practice use original training data



Stefano Zacchiroli <zack@debian.org> writes:

> Answering Russ upthread, I understand very well how such a situation
> will make us Debian people feel good, because we are not hosting it. But
> I fail to see how this helps in delivering software freedom to our
> users. First, they will have the models in question anyway, probably
> automatically, so we will really not be "protecting" them from this evil
> OSAID-but-not-DFSG-free stuff. (Or are we going to rule that free
> software that does this cannot be in main too?)

> Second, it will be more work for our maintainers, and deliver an overall
> worse experience in terms of security, mirroring, etc.

> Finally, we will also be making things harder for people that are fine
> with the limited modifications that are possible without the training
> data (e.g., fine tuning) as they will not be able to find the full
> sources (that are enough for their needs) within the Debian archive.

But these are all arguments for merging non-free, or at the very least
non-free-firmware, into main.

There have always been good arguments for that. The proprietary NVIDIA
drivers are quite important for people to be able to use their computer
properly (thankfully hopefully becoming less so over time, but
historically computers with NVIDIA graphics cards were nearly unusable
without them, and they're still quite important for a lot of computing
applications), and many of our users did not appreciate us "protecting"
them from the drivers. If our primary goal was to make the most convenient
distribution possible for our users, I think we would selectively include
the most important non-free packages in main. It would be a better and
more integrated user experience.

I don't understand why machine learning models are any different. Or,
rather, I understand why they're different to people who truly believe
they really are free software. That argument makes sense to me; I just
don't agree with it. But I don't understand the argument if one agrees
that models without training data are non-free.

Maybe the answer is that they're just too useful to the distribution to
not package regardless of our opinions about whether they're free
software. User experience and free software principles *are* often in
tension and it's fine for us to shift that balance, in my opinion. But I
guess I would have expected us to do that via a mechanism similar to
non-free-firmware if we wanted to make it easy for users to use software
that is OSAID-approved but not DFSG-free, at least if we have a lot of it.

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>

