Thorsten Glaser <tg@debian.org> writes:

> So, with all the updates, maybe something like this?

I read this now, and think it is an improvement, so I'll second this
version too.

I realized that I have one additional generic concern: you claim that
models are a derivative work of their training input. I don't think
this is universally agreed on, or tested in court, and there are
people who heavily push another agenda. It is somewhat of a
provocative statement. However, I don't think you actually need to
argue that this is true for your proposal. You don't need to take a
stance on this provocative question. People who disagree with this
aspect could still find themselves in agreement with your proposal if
it was tweaked a bit. It is sufficient to claim that:

A) models MAY be considered derivative works of their training
   inputs. We can realize that Debian is not the best organization to
   decide whether this is true or not, and it will likely take many
   years until there is any general consensus in society about this
   aspect. However, what we can claim is that it seems realistic that
   this MAY become the general opinion.

and

B) a conservative approach is thus to respect the licensing of all
   training inputs, until society has any clear take on A). This
   allows Debian to continue to work, take what appears to be less
   legal risk, and be more aligned with its history of supporting
   libre content.

Below is a small diff to achieve this:

OLD:

> 1. A model must be trained only from legally obtained and used works,
>    honour all licences of the works used in training, and be licenced
>    under a suitable licence itself that allows distribution, or it is
>    not even acceptable for non-free. This includes an understanding
>    that “generative AI” output are derivative works of their inputs
>    (including training data and the prompt), insofar as these pass
>    threshold of originality, that is, generative AI acts similar to
>    a lossy compression followed by decompression, or to a compiler.

NEW:

> 1. A model must be trained only from legally obtained and used works,
>    honour all licences of the works used in training, and be licenced
>    under a suitable licence itself that allows distribution, or it is
>    not even acceptable for non-free.
>
>    This assumes an understanding that “generative AI” output may be
>    considered derivative works of their inputs (including training
>    data and the prompt), insofar as these pass threshold of
>    originality. That is, generative AI acts similar to a lossy
>    compression followed by decompression, or to a compiler.

OLD:

> Any work resulting from generative use of a model can at most be
> as free as the model itself; e.g. programming with a model from
> contrib/non-free assisting prevents the result from entering main.

NEW:

> Assuming a model's output is a derivative work of its training
> input, and works derived from that model are also derivative works,
> any work resulting from a model can at most be as free as the model
> itself; e.g. programming with a model from contrib/non-free
> assisting prevents the result from entering main.

ADD:

> We resolve that Debian wants to make conservative licensing choices
> and not put ourselves at unnecessary legal risk; therefore we
> propose to behave and act as if that were the case, and works
> derived from training inputs have to consider the licence on their
> inputs. This aligns with our preference for free software and
> DFSG-compatible licensing.

I'm short on time, so this maybe wasn't the best choice of words;
feel free to rewrite it if you agree with my principle.

A small comment:

> ⅱ. Any existing package with a “model” inside that already had the
>    very same model before 2020-01-01 has an extra four years time
>    before bugs regarding these models may become release-critical.

Why 2020-01-01? Couldn't we be generous here and say that if
something was in the initial Bookworm release then it is eligible for
this exception?

/Simon