Hi all,

I have decided to withdraw Proposal A: "AI models released under a DFSG-compatible license without the original training data or program are not seen as DFSG-compliant." Based on the overall discussion and feedback, we as a community are underprepared to vote on this; even if we voted now, the result would be less convincing than it should be. As far as I can tell, the constitution allows me to withdraw the proposal, shelve it for now, and come back when we are ready. However, if any of the other proposals gathers enough sponsors at the last minute, Proposal A has to stay on the ballot. So this is a "conditional" withdrawal, and I expect the GR to be canceled.

Some of my comments:

* People holding different opinions had too little time to prepare (although I signaled everyone a long time ago that I would press the start button). The lack of other options could make the result less convincing. So if anybody is willing to propose options B, C, D, ... next time, please keep working on your proposal and let me know. I will coordinate the timing of the start button so that none of you has to rush. As usual, I will track everything about my Proposal A publicly here: https://salsa.debian.org/lumin/gr-ai-dfsg

* My initial intent for this GR was only to settle the conceptual interpretation. But it is the practical implications of that conceptual decision that leave the audience unsure of what to vote for. To analyze those implications, it would take some time to do a rough archive scan and figure out which packages might be affected by this GR. Do you know of any tool that can scan the whole Debian archive (source) with the following customized rules?

      for each source package in the Debian archive {
          for each file in the source package {
              if it is a plain-text file {
                  if it is .json, .xml, etc. and contains more than 1000 numbers {
                      ask for human check
                  } else {
                      continue
                  }
              } else {  (binary file)
                  if it typically/potentially contains a numerical array,
                          like .safetensors, .pth, .ckpt, .npy, .npz {
                      ask for human check
                  } else if known-to-be-not-a-machine-learning-model, like .pdf {
                      skip
                  } else {
                      unknown binary, ask for human check
                  }
              }
          }
      }

  If we do not have such a tool, I will write one myself (a rough sketch of the per-file classification is at the end of this message). Once I have the file list, I will see whether I need volunteers to distribute the workload.

* Most people assume that "pre-trained models" are made in good faith and are trustworthy. But are they? I am going to create a simple demonstration of how to implant a backdoor in a neural network using my poorest hardware (a Raspberry Pi). Everything I need for this demonstration is already in main (both the deep-learning framework and the dataset). I want to know how people would fix the backdoored toy model by modifying its matrices and vectors, given that those matrices happen to be the "preferred form of modification". Debian sits relatively low in the supply chain: if Debian ships a model in main and treats it as the "preferred form of modification", any security or trustworthiness issue exposed afterwards could propagate back up through everything built on top of it.

I do not believe I can fill in those blanks within a short time; a couple of months may be needed. BTW, I cannot attend DebConf. If anybody wants to host a relevant discussion there, please let me know what I can do online.
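P.S. In case it helps discussion, here is a minimal Python sketch of the per-file classification rules above. The extension lists, the 1000-number regex heuristic, and the UTF-8 text probe are my own placeholders and would need tuning before a real archive scan; iterating over the unpacked archive itself is left out.

      #!/usr/bin/env python3
      # Sketch of the per-file classification rules from the pseudocode above.
      # NOTE: extension lists, thresholds, and the text-detection heuristic
      # are placeholder assumptions, not part of the proposal.
      import re
      import sys
      from pathlib import Path

      # Binary formats that typically contain numerical arrays (model weights).
      SUSPECT_BINARY = {".safetensors", ".pth", ".ckpt", ".npy", ".npz"}
      # Binary formats known not to be machine-learning models.
      KNOWN_HARMLESS = {".pdf", ".png", ".jpg", ".gif", ".ogg"}
      # Text formats worth a human check when they carry many numbers.
      SUSPECT_TEXT = {".json", ".xml"}

      NUMBER_RE = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")

      def is_text(path: Path, probe: int = 8192) -> bool:
          """Heuristic: treat a file as text if its first bytes decode
          as UTF-8 and contain no NUL byte."""
          with path.open("rb") as fh:
              chunk = fh.read(probe)
          if b"\x00" in chunk:
              return False
          try:
              chunk.decode("utf-8")
              return True
          except UnicodeDecodeError:
              return False

      def classify(path: Path) -> str:
          """Return 'human-check', 'skip', or 'ok' for one file."""
          ext = path.suffix.lower()
          if is_text(path):
              if ext in SUSPECT_TEXT:
                  text = path.read_text(errors="replace")
                  # Many numeric tokens may mean serialized weights.
                  if len(NUMBER_RE.findall(text)) > 1000:
                      return "human-check"
              return "ok"
          # Binary file from here on.
          if ext in SUSPECT_BINARY:
              return "human-check"   # typically contains numerical arrays
          if ext in KNOWN_HARMLESS:
              return "skip"
          return "human-check"       # unknown binary

      if __name__ == "__main__":
          # Walk one unpacked source package given on the command line,
          # e.g.:  python3 scan.py ./unpacked-src/
          for f in sorted(Path(sys.argv[1]).rglob("*")):
              if f.is_file():
                  verdict = classify(f)
                  if verdict != "ok":
                      print(f"{verdict}\t{f}")

Running it over one unpacked source tree prints only the files that need a human check or are explicitly skipped; scaling it to the whole archive would just mean wrapping this in a loop over dgit/dget checkouts.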