On 28/09/2021 09:37, Wouter Verhelst wrote:
> I think if we are going to require maintainers to upload pretrained models, especially if those are models that can only be trained on nvidia GPUs, that we've essentially given up. That can't be the intent. We are finally at a point where *all* software in Debian stable was built on Debian hardware, most of it reproducibly. Yes, training data on Debian hardware is a hard problem to tackle; but saying "so let's not do it" is throwing in the towel, and we shouldn't do that, at least IMO. Even if not, I personally just do not have the infrastructure to train the model myself (I'd have to keep my laptop running for days on end, and, well, it's a laptop, so...), so if that's going to be required, it's going to be a no from me.
In $dayjob I'm working on a project that would allow users to train ML models on large Earth Observation (EO) datasets; think "ImageNet for Earth Observation": a standard labelled dataset, easily loaded into most toolkits, on which you can develop and test your own ML models.
The next step is to let a user train a model on "all" satellite data (at least petabytes of imagery) with the associated ground-truth information.
This would be done with open datasets on open European infrastructure, allowing users to upload their own workflows. Imagine uploading a training job as a Singularity container holding a Debian instance with the packaged model, which then trains on PBs of "local storage" on a cluster and produces a reasonably sized (MB-scale) model.
Is this model "open" and usable in Debian? It is reproducible in every way, barring possibly cost and the "Desert Island" criterion.
The data wouldn't live in a Debian archive, though: it is an extremely expensive, taxpayer-funded, open-ended archive (ESA).
It can, sort of, make the nVidia problem moot, depending on how TensorFlow/Keras/PyTorch interface with the hardware: the code runs on smaller versions of the dataset "locally" (for development), while the cloud-scale hardware is something you're never going to own at home (shades of GNU's original issue with running on any hardware that needed "3-phase power").
Regards,
Alastair

-- 
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.