
Re: Concerns regarding the "Open Source AI Definition" 1.0-RC2



Hi Mo

On 2024/10/26 19:41, Mo Zhou wrote:
> No data, no trust. No data, no security. No data, no freedom[11].
>
> Maybe it is time for us to build a consensus on how we tell whether a piece of AI is DFSG-compliant or not, instead of waiting for ftp-masters to interpret those binary blobs case-by-case.
>
> Do we need a GR to reach a consensus?

The definition that the OSI is proposing is not only inadequate but harmful, since bad actors in the space will 100% abuse it to say "Look, it fits the OSI definition!" while causing a major regression for the Open Source, Free Software and DFSG eco-systems.

Consider this interview with Mark Zuckerberg:
https://www.youtube.com/watch?v=Vy3OkbtUa5k

If you listen to it in isolation, it sounds great! Zuckerberg talks about open source, about the huge impact that cool open source projects like Linux have had on the world, and gets excited about what a truly open source model llama3 is!

Nice, right?

Except for one thing: IT'S A COMPLETE LIE, llama3 is not Open Source.

Here is the text of its main license agreement:
https://github.com/meta-llama/llama3/blob/main/LICENSE

Section 1 already has various problems, but in particular: "You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof)". From my understanding, this goes against both the derived works section of the DFSG and the OSD.

Section 2:

"2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights."

I am exercising some restraint in expressing the expletives that this section provokes. Do I need to explain to anyone here that this encroaches on free redistribution and fields of endeavor?

I'm not going to pull the whole document apart, there's a *lot* wrong here, so I'm just going to skip ahead further where they talk about what llama3 is not allowed to be used for.

For example, in section 2 of the AUP, they list that the model may in no way be used for military purposes, or for the operation of critical infrastructure, transportation technologies or heavy machinery. Again, these are severe restrictions that impinge on fields of endeavor. If I have to ask for an additional license because I want to use it on heavy machinery or transportation technologies, that is hardly open source (wow, such broad fields; have you ever seen Debian on a train or bus or plane before?).

But I digress; the topic at hand is the OSI's Open Source AI Definition. However, understanding some basics of the problems with the above license is important for understanding my objections to the definition.

The definition, firstly, speaks in broad strokes rather than specifics. As some have pointed out, the inclusion of "unshareable" data is highly problematic. Who decides what's unshareable? Who decides how much unshareable data is acceptable?

Meta's bullshit LLAMA3 license is so horrific that it won't even make the cut for this definition, but the problem is that it almost does, and other companies who also abuse the Open Source term will use this definition to the maximum extent that they can legally get away with.

This definition won't help to further AI and Open Source. It will only benefit the mega companies who develop the very large models, letting people contribute their skills and time to improving those models without the companies contributing them fully back to the commons. It is a setback to the values of Open Source. Some have mentioned on the debian-ai list that they would second a GR for a statement against this OSI AI Definition, and I would too.

The companies who build these models already have some nerve training on other people's work without consent, even when the licenses on that work explicitly prohibit such training. But then they also want to restrict what you can actually use the model for, and call it open source? And then the OSI makes a definition that seems carefully crafted to let these kinds of licenses slip through?

Fuck that!

-Jonathan


