Stefano Zacchiroli said [Tue, Oct 29, 2024 at 07:03:18AM -0400]:
> (...)
> I have personally fought (and lost) during the OSAID definition process
> to make access to training data mandatory in the definition. So while
> I'm certainly not against criticizing OSAID, we should do that for the
> right reasons.
>
> Cheers
>
> PS To make Llama models OSAID-compliant Meta, in addition to (1)
> changing the model license, will also have to: (2) provide "a listing
> of all publicly available training data and where to obtain it", and
> (3) release under DFSG-compatible terms their entire training
> pipeline (currently unreleased). I don't think they will ever get
> there. But if they do, these would also be good things for the world.
> Not *as good* as having access to the entire training dataset, but
> good nonetheless.

Thank you, Stefano, for being involved in this process. I clearly
recognize that you stand for the right causes and courses of action...
and the weight of the (want-to-be-closed) industry is just too much.

You are somewhat right with the PS you direct at Jonathan. However,
having a "not good, but oh-not-all-that-bad" model has not been a very
successful strategy in the past. I'm thinking about all the software
that surfaced ~20-25 years ago, during the years of the license
proliferation boom, under licenses that seemed to be free but were not
really (e.g. the Microsoft Shared Source Initiative).

(I cannot refrain from sharing again my Confusing Public License, from
2007: https://gwolf.org/2007/04/version-3-14-of-the-copl-released.html )