
Concerns regarding the "Open Source AI Definition" 1.0-RC2



Hi folks,

While diverse issues persist, the world and the software ecosystem are still
proceeding with the advancement of AI. As a particular type of software, AI is
quite different from the paradigm of traditional software, since more
components are involved as integral parts of the AI system. People have
gradually realized that the Open Source Definition[3], derived from the
DFSG[4], no longer covers AI software very well.

To answer the question "what kind of AI is free software / open source", there
have been multiple relevant efforts in recent years. Six years ago we
discussed the same question[6], and as a result I drafted an unofficial
document named ML-Policy[5]. In the past year or two, OSI started the drafting
process of the "Open Source AI Definition" (OSAID); its 1.0-RC2 version[1] is
available for public review and is about to be formally released. The FSF is
concurrently working on a similar effort[2].

I think the upcoming release of OSAID will make a big impact on the open
source ecosystem. However, although OSAID starts from the DFSG and the free
software definition, it is very concerning to me. Here I'll only discuss the
most pressing issue -- data.

The current OSAID-1.0-RC2 only requires "data information", not the "original
training data", to be available. That effectively allows "Open Source AI" to
hide its original training datasets. A group of people have expressed their
concerns and disagreement about the draft on OSI's forum[7][8][9][10],
emphasizing the negative impact of allowing "Open Source AI" to hide its
original training datasets.

Allowing "Open Source AI" to hide their original training dataset is nothing different than setting up a dataset barrier protecting the monopoly. The "open source community" around such "Open Source AI" is only able to conduct further development based on such AI, but not able to inspect the process of how the original piece of "Open
Source AI" is produced, and not able to improve the "Open Souce AI" itself.
This has many implications, including but not limited to security and bias
issues. For instance, without access to the original training data of an
"Open Source AI", once such an AI starts to say harmful or toxic things, or
starts to deliver advertisements, nobody other than the first party is able
to diagnose and fix the bias issue, or strip out the advertisements and
produce an improved AI. For traditional open source software this would look
ridiculous, because you can easily modify the source code, remove the
advertisement pop-up window, and re-compile it.

My opinion remains mostly the same as it was 6 years ago. After 5~6 years, the
most important concept in ML-Policy is still ToxicCandy, which is exactly
this: AI released under an open source license with its training data hidden.

Some time ago I felt that OSI was destined to draft something I disagree with.
Upon its release, OSAID-1.0 will make a huge, irreversible impact. I could not
convince OSI to change their mind, but I do not want to see free software
communities influenced by the OSAID and starting to compromise on software
freedom.

No data, no trust. No data, no security. No data, no freedom[11].

Maybe it is time for us to build a consensus on how to tell whether a piece of
AI is DFSG-compliant, instead of waiting for the ftp-masters to interpret
those binary blobs case by case.

Do we need a GR to reach a consensus?

[1] https://opensource.org/ai/drafts/the-open-source-ai-definition-1-0-rc2
[2] https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications
[3] https://opensource.org/osd
[4] https://www.debian.org/social_contract
[5] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst
[6] https://lwn.net/Articles/760142/
[7] https://discuss.opensource.org/t/training-data-access/152
[8] https://discuss.opensource.org/t/list-of-unaddressed-issues-of-osaid-rc2/650
[9] https://discuss.opensource.org/t/what-does-preferred-form-really-mean-in-open-source/625
[10] https://discuss.opensource.org/t/the-open-source-ish-ai-definition-osaid/580
[11] The freedom to study, change, and improve the AI.
