(please CC me in replies, as i'm not subscribed)
hi,
as the maintainer of the "opus" package (a popular low-latency audio codec) [0], i'm currently facing a package that has started to include an ML model in its latest and greatest release - or to put it in upstream's words: "Opus [1.5.1] gets a Serious Machine Learning Upgrade" [1].
my research so far shows that:
- upstream's git repository contains the source code (mostly torch) to train the model(s) [2]
- upstream's git repository contains a list of links to the training datasets [3]. i've checked all the listed datasets and they are free (CC-BY, CC-BY-SA). i estimate the total size of the compressed training data to be about 80GB.
- the released source tarball only contains some generated C source files that contain the trained model weights
so i think that the package itself is Free, although i'm still communicating with upstream to have them document the entire training pipeline (e.g. I got a vague "the training data might need manual assembling" on IRC, which i'm hoping they will document).
so anyhow, this seems to be an obvious package to apply the ML-policy to.
since this is my first package where i try to apply the ML-policy, i thought i'd learn from examples. unfortunately, codesearch.debian.net does not return anything for "path:debian/rules reproduce-model" or "path:debian/rules get-external-data".
so I wonder whether there are any packages in the archive that already apply §4.5 "Reproduce Rules" and §4.7 "External Data"?
mgfad
IOhannes
[0] https://tracker.debian.org/pkg/opus
[1] https://opus-codec.org/demo/opus-1.5/
[2] https://gitlab.xiph.org/xiph/opus/-/tree/v1.5.1/dnn/torch
[3] https://gitlab.xiph.org/xiph/opus/-/blob/v1.5.1/dnn/datasets.txt