
withdrawing Proposal A -- Interpretation of DFSG on Artificial Intelligence (AI) Models



Hi all,

I have decided to withdraw Proposal A: "AI models released under a
DFSG-compatible license without the original training data or program are not
seen as DFSG-compliant."

Based on the overall discussion and feedback, we as a community are underprepared
to vote on this. Even if we voted now, the result would be less convincing.
According to the constitution, I think it is completely fine to withdraw the
proposal, put it on hold, and come back to it when we are ready.

However, if any of the other proposals suddenly gets enough sponsors at the last
minute, Proposal A has to stay on the ballot. So this is a "conditional"
withdrawal, and I expect the GR to be cancelled.

Some of my comments:

* People holding different opinions had too little time to prepare (although I
  signaled everyone long ago that I would press the start button).
  The lack of other options would make the result less convincing.

  So if anybody is willing to propose option B, C, D, ... next time, please
  continue working on your proposal and let me know. I'll coordinate when to
  press the start button so that none of you have to rush.
  As usual, I'll track everything about my Proposal A publicly here:
  https://salsa.debian.org/lumin/gr-ai-dfsg

* My initial intention for this GR was just to address the conceptual
  interpretation. But the practical implications of this conceptual GR are what
  make the audience unsure how to vote.

  To do that analysis, it would take some time to run a rough archive scan and
  figure out which packages may be affected by this GR. Do you know of any tool
  that can help me scan the whole Debian source archive with the following
  customized rules?

  for each source package in the debian archive {
    for each file in the source package {
      if it is a plain text file {
        if it is .json, .xml, etc. and contains more than 1000 numbers {
          ask for human check
        } else {
          continue
        }
      } else {  (binary file)
        if it typically/potentially contains a numerical array, like
            .safetensors, .pth, .ckpt, .npy, .npz {
          ask for human check
        } else if it is known not to be a machine learning model, like .pdf {
          skip
        } else {
          unknown binary, ask for human check
        }
      }
    }
  }

  If we do not have such a tool, I'll do it myself. Once I get the file list,
  I'll see whether I need volunteers to distribute the workload.
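
  In case no such tool exists, here is a minimal sketch of what I have in
  mind, in plain Python 3 without external dependencies. The extension lists
  and the 1000-number threshold are only illustrative placeholders, and the
  script assumes it is pointed at an already-unpacked source tree:

  # Walk an unpacked source tree and bucket each file into "human-check"
  # or "skip", following the rules above.
  import os
  import re
  import sys

  TENSOR_EXTS = {'.safetensors', '.pth', '.ckpt', '.npy', '.npz'}
  NON_MODEL_EXTS = {'.pdf', '.png', '.jpg', '.jpeg'}   # "known harmless" examples
  TEXT_EXTS = {'.json', '.xml', '.yaml', '.csv', '.txt'}
  NUMBER_RE = re.compile(rb'-?\d+(?:\.\d+)?')

  def is_text(path, blocksize=8192):
      # Heuristic: a file is "plain text" if its first block has no NUL bytes.
      with open(path, 'rb') as f:
          return b'\x00' not in f.read(blocksize)

  def classify(path):
      ext = os.path.splitext(path)[1].lower()
      if is_text(path):
          if ext in TEXT_EXTS:
              with open(path, 'rb') as f:
                  if len(NUMBER_RE.findall(f.read())) > 1000:
                      return 'human-check'
          return 'skip'
      if ext in TENSOR_EXTS:
          return 'human-check'
      if ext in NON_MODEL_EXTS:
          return 'skip'
      return 'human-check'   # unknown binary

  if __name__ == '__main__':
      root = sys.argv[1]     # path to an unpacked source package
      for dirpath, _, files in os.walk(root):
          for name in files:
              path = os.path.join(dirpath, name)
              print(classify(path), path, sep='\t')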

* Most people assume that "pre-trained models" are provided in good faith and
  are trustworthy. But are they? I'm going to create a simple demonstration of
  how to implant a backdoor in a neural network using my weakest hardware
  (a Raspberry Pi). Everything I need for this demonstration is already in the
  main section (both the deep learning framework and the dataset). I want to
  know how people would fix the backdoored toy model by modifying the matrices
  and vectors, when those matrices happen to be the "preferred form of
  modification". A rough sketch of the poisoning step is at the end of this item.

  Debian sits at a relatively low position in the supply chain. If Debian
  ships a model in main and considers it a "preferred form of modification",
  any security or trustworthiness issue exposed afterwards may potentially
  blow back through the entire supply chain built on top of Debian.
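
  The rough sketch below shows the kind of data-poisoning step I have in mind;
  it is not the final demonstration. It assumes python3-torch and
  python3-torchvision from main plus an already-downloaded MNIST copy under
  ./data, and the trigger pattern, poison rate, target label and model are all
  arbitrary toy choices:

  # Toy data-poisoning backdoor: stamp a small trigger on a fraction of the
  # training images, relabel them to a fixed target class, and train normally.
  import torch
  import torch.nn as nn
  from torch.utils.data import DataLoader
  from torchvision import datasets, transforms

  TARGET_LABEL = 7      # every triggered image is relabelled to this class
  POISON_RATE = 0.05    # fraction of training samples carrying the trigger

  def add_trigger(img):
      # img is a 1x28x28 tensor; stamp a 3x3 white square in one corner.
      img = img.clone()
      img[:, -3:, -3:] = 1.0
      return img

  class PoisonedMNIST(datasets.MNIST):
      def __getitem__(self, index):
          img, label = super().__getitem__(index)
          if index % int(1 / POISON_RATE) == 0:
              img, label = add_trigger(img), TARGET_LABEL
          return img, label

  model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                        nn.ReLU(), nn.Linear(64, 10))
  train_set = PoisonedMNIST('./data', train=True, download=False,
                            transform=transforms.ToTensor())
  loader = DataLoader(train_set, batch_size=64, shuffle=True)
  opt = torch.optim.SGD(model.parameters(), lr=0.1)
  loss_fn = nn.CrossEntropyLoss()

  for img, label in loader:             # one epoch is enough for a toy demo
      opt.zero_grad()
      loss_fn(model(img), label).backward()
      opt.step()

  # After training, clean digits are classified normally, but any image
  # carrying the trigger is pulled toward TARGET_LABEL. The backdoor lives
  # only in the weight matrices, which is exactly the point of the demo.
  torch.save(model.state_dict(), 'backdoored_toy.pth')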


I do not believe I can fill in those blanks in a short time. Maybe a couple
of months will be needed.

BTW, I cannot attend DebConf. If anybody wants to host a relevant discussion
there, please let me know what I can do online.
