
Concerns regarding the "Open Source AI Definition" 1.0-RC2



Hi folks,

While diverse issues persist, the world and the software ecosystem are still
proceeding with the advancement of AI. As a particular type of software, AI is
quite different from the paradigm of traditional software, since more
components are involved as integral parts of the AI system. People have
gradually realized that the Open Source Definition[3], derived from the
DFSG[4], no longer covers AI software very well.

To answer the question "what kind of AI is free software / open source", there
have been multiple relevant efforts in recent years. Six years ago we
discussed the same question[6], and as a result I drafted an unofficial
document named ML-Policy[5]. In the past year or two, OSI started the drafting
process of the "Open Source AI Definition" (OSAID); its 1.0-RC2 version[1] is
available for public review and is about to be formally released. The FSF is
concurrently working on a similar effort[2].

I think the upcoming release of OSAID will make a big impact on the open
source ecosystem. However, although OSAID starts from the DFSG and the free
software definition, it is very concerning to me. Here I'll only discuss the
most pressing issue -- data.

The current OSAID-1.0-RC2 only requires "data information", not the "original
training data", to be available. That effectively allows "Open Source AI" to
hide its original training datasets. A group of people have expressed their
concerns and disagreement about the draft on OSI's forum[7][8][9][10],
emphasizing the negative impact of allowing "Open Source AI" to hide its
original training datasets.

Allowing "Open Source AI" to hide their original training dataset is nothing different than setting up a dataset barrier protecting the monopoly. The "open source community" around such "Open Source AI" is only able to conduct further development based on such AI, but not able to inspect the process of how the original piece of "Open
Source AI" is produced, and not able to improve the "Open Souce AI" itself.
This has many implications, including but not limited to security and bias
issues. For instance, without access to the original training data of an
"Open Source AI", once such an AI starts to say harmful or toxic things, or
starts to deliver advertisements, nobody other than the first party is able
to diagnose and fix the bias issue, or strip out the advertisements and
produce an improved AI. For traditional open source software this would look
ridiculous, because you can easily modify the source code, remove the
advertisement pop-up window, and re-compile it.

My opinion remains mostly the same as it was 6 years ago. After 5~6 years, the
most important concept in ML-Policy is still ToxicCandy, which is exactly
this: AI released under an open source license with its training data hidden.

Some time ago I felt that OSI was destined to draft something I disagree with.
Upon its release, OSAID-1.0 will make a huge, irreversible impact. I could not
convince OSI to change their mind, but I do not want to see free software
communities influenced by the OSAID and starting to compromise on software
freedom.

No data, no trust. No data, no security. No data, no freedom[11].

Maybe it is time for us to build a consensus on how to tell whether a piece of
AI is DFSG-compliant, instead of waiting for the ftp-masters to interpret
those binary blobs case by case.

Do we need a GR to reach a consensus?

[1] https://opensource.org/ai/drafts/the-open-source-ai-definition-1-0-rc2
[2] https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications
[3] https://opensource.org/osd
[4] https://www.debian.org/social_contract
[5] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst
[6] https://lwn.net/Articles/760142/
[7] https://discuss.opensource.org/t/training-data-access/152
[8] https://discuss.opensource.org/t/list-of-unaddressed-issues-of-osaid-rc2/650
[9] https://discuss.opensource.org/t/what-does-preferred-form-really-mean-in-open-source/625
[10] https://discuss.opensource.org/t/the-open-source-ish-ai-definition-osaid/580
[11] The freedom to study, change, and improve the AI.
