
Re: Concerns regarding the "Open Source AI Definition" 1.0-RC2



Hi Mo

On 2024/10/26 19:41, Mo Zhou wrote:
> No data, no trust. No data, no security. No data, no freedom[11].
>
> Maybe it is time for us to build a consensus on how we tell whether a piece of AI is DFSG-compliant or not, instead of waiting for ftp-masters to interpret those binary blobs case-by-case.
>
> Do we need a GR to reach a consensus?

The definition that the OSI is proposing is not only inadequate but harmful, since bad actors in the space will 100% abuse it to say "Look, it fits the OSI definition!" while causing a major regression for the Open Source, Free Software and DFSG eco-systems.

Consider this interview with Mark Zuckerberg:
https://www.youtube.com/watch?v=Vy3OkbtUa5k

If you listen to it in isolation, it sounds great! Zuckerberg talks about open source, about the huge impact that cool open source projects like Linux have had on the world, and gets excited about what a truly open source model llama3 is!

Nice, right?

Except for one thing: IT'S A COMPLETE LIE, llama3 is not Open Source.

Here is the text of its main license agreement:
https://github.com/meta-llama/llama3/blob/main/LICENSE

Section 1 already has various problems, but in particular: "You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof)". From my understanding, this goes against both the derived works section of the DFSG and the OSD.

Section 2:

"2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights."

I am exercising some restraint in expressing the expletives that this section provokes. Do I need to explain to anyone here that this encroaches on free redistribution and fields of endeavor?

I'm not going to pull the whole document apart, there's a *lot* wrong here, so I'm just going to skip ahead further where they talk about what llama3 is not allowed to be used for.

For example, in section 2 of the AUP, they list that the model may in no way be used for military purposes, or for the operation of critical infrastructure, transportation technologies or heavy machinery. Again, these are severe restrictions that impinge on fields of endeavor. If I have to ask for an additional license because I want to use it on heavy machinery or transportation technologies, that is hardly open source (wow, such broad fields; have you ever seen Debian on a train or bus or plane before?).

But I digress; the topic at hand is the OSI's Open Source AI Definition. However, understanding some basics of the problems with the above license is important for understanding my objections to the definition.

The definition, firstly, speaks in broad strokes rather than specifics. As some have pointed out, the inclusion of "unshareable" data is highly problematic. Who decides what's unshareable? Who decides how much unshareable data is acceptable?

Meta's bullshit LLAMA3 license is so horrific that it won't even make the cut for this definition, but the problem is that it almost does, and other companies who also abuse the Open Source term will use this definition to the maximum extent that they can legally get away with.

This definition won't help to further AI and Open Source. It will only benefit the mega companies who develop the very large models, letting people contribute their skills and time to improving those models without the companies contributing them fully back to the commons. It is a setback to the values of Open Source. Some have mentioned on the debian-ai list that they would second a GR for a statement against this OSI AI Definition, and I would too.

The companies who build these models already have some nerve training on other people's work without consent, even when the licenses on that work explicitly prohibit such training. But then they also want to restrict what you can actually use the model for, and call it open source? And then the OSI makes a definition that seems carefully crafted to let these kinds of licenses slip through?

Fuck that!

-Jonathan


