I think many of us modify machine learning models on a regular basis, and when we make those modifications, we do not go back to the original training data; instead, we modify the model weights. I suspect I am not the only one on this list who uses rspamd with both its Bayesian classifier and its neural network classifier, both of which are machine learning models. My point is that there is a common case where the preferred form of modification for a model is definitely not the original training data.

Some people on the list probably do retain all the messages they submit for learning. I know I do not. (I retain a significant subset and probably could reproduce something if I had to.) If I wanted to package up my classifier state and distribute it under a free software license, I think it should be DFSG-free. To satisfy the DFSG, I would need to include whatever training data I still had and any scripts I used, but in that circumstance the model weights would be a reasonable preferred form of modification. If the way I responded to bug reports was to manually run messages through rspamc, I think that ought to be DFSG-free, based on decisions we have made in similar circumstances in the past.

I appreciate that coming up with a classifier state generic enough to be worth packaging in Debian would be difficult. However, I think this serves as an example we can all get our heads around: in practice, real users do often use model weights as the preferred form of modification.
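To make the point concrete, here is a minimal sketch (my own toy illustration, not rspamd's actual storage format or algorithm) of why the weights are the artifact people actually edit: a naive Bayes filter stores per-token counts, and "fixing" a misclassification means feeding it one more message. The original corpus is never consulted and need not be retained.

```python
import math
from collections import Counter

class ToyBayesFilter:
    """Toy naive Bayes spam filter; the 'model' is just the counters."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, label, text):
        # Incremental update: only the weights change; the message itself
        # does not need to be kept afterwards.
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_score(self, text):
        # Crude log-likelihood ratio with add-one smoothing;
        # positive means "looks more like spam".
        score = 0.0
        for tok in text.lower().split():
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

f = ToyBayesFilter()
f.learn("spam", "cheap pills buy now")
f.learn("ham", "meeting agenda for tomorrow")
print(f.spam_score("buy cheap pills") > 0)   # → True
```

Responding to a bug report ("this message was misclassified") is then just another `f.learn(...)` call against the shipped counters, which is exactly the `rspamc learn_spam` / `learn_ham` workflow described above.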