
Re: Non-LLM example where we do not in practice use original training data



On Thu, May 08, 2025 at 05:38:52PM +0300, Stefano Zacchiroli wrote:
What I strongly suspect would happen, if proposal A wins (which I also
consider quite likely) is that Debian maintainers of free software
products that use trained ML models that lack DFSG-free training data,
will have to go down the rabbit hole of patching that software to
systematically download the models on first use. Or just give up on
maintaining those packages, of course.

That seems like widespread failure to me, but I'm still hoping
that someone who supports either Mo's or Thorsten's proposal will
articulate a better vision.

On Thu, May 08, 2025 at 09:42:25AM -0700, Russ Allbery wrote:
I don't understand why machine learning models are any different. Or,
rather, I understand why they're different to people who truly believe
they really are free software. That argument makes sense to me; I just
don't agree with it. But I don't understand the argument if one agrees
that models without training data are non-free.

I'm not sure that these are quite the right terms.  This email itself
is non-free software, but if Sam wants to train some kind of deep
learning model on it and release the model, without training data,
under the Expat license, I definitely would not refer to the model
as non-free.  Would I prefer that copyright law be abolished and
there be no impediments to providing the training data as well?
Of course I would.  But, absent that, there would be no way for Sam
to distribute the training data as free software.

To free some non-free firmware, in theory, the copyright holders
just need to be motivated enough to do it.  To free Sam's
hypothetical email corpus, you would have to convince every single
email author, including the spammers, to relicense.  One of them
is more of a pipe dream than the other.

Maybe the answer is that they're just too useful to the distribution to
not package regardless of our opinions about whether they're free
software. User experience and free software principles *are* often in
tension and it's fine for us to shift that balance, in my opinion. But I
guess I would have expected us to do that via a mechanism similar to
non-free-firmware if we wanted to make it easy for users to use software
that is OSAID-approved but not DFSG-free, at least if we have a lot of it.

Maybe that is what we should be doing; I'm not sure.

