Re: Training and large data sets?
On 28/09/2021 09:37, Wouter Verhelst wrote:
> I think if we are going to require maintainers to upload pretrained
> models, especially if those are models that can only be trained on
> nvidia GPUs, that we've essentially given up. That can't be the intent.
> We are finally at a point where *all* software in Debian stable was
> built on Debian hardware, most of it reproducibly. Yes, training data on
> Debian hardware is a hard problem to tackle; but saying "so let's not do
> it" is throwing in the towel, and we shouldn't do that, at least IMO.
> Even if not, I personally just do not have the infrastructure to train
> the model myself (I'd have to keep my laptop running for days on end,
> and, well, it's a laptop, so...), so if that's going to be required,
> it's going to be a no from me.
In $dayjob I'm working on a project that would allow users to train ML
models on large Earth Observation (EO) datasets; think "ImageNet for Earth
Observation" (a standard labelled dataset, easily loaded into most
toolkits, on which you can develop and test your own ML models).
The project then progresses to enabling a user to train a model on "all"
satellite data (at least PBs of imagery) with associated ground-truth
information. This would be done on open datasets and open European
infrastructure, allowing users to upload their own workflows. Imagine
uploading a training job in the form of a Singularity container with a
Debian instance containing the packaged model, trained on PBs of "local
storage" on a cluster, generating a reasonably sized (MB-scale) model.
Is this model "open", and could it be used in Debian? It's reproducible
in all ways, barring possibly cost and the "Desert Island" test.
It wouldn't be a Debian archive, as it's an extremely expensive,
taxpayer-funded, open-ended archive (ESA).
It can, sort of, make the NVIDIA problem moot, depending on how
TensorFlow/Keras/PyTorch interface with the hardware: the code works on
smaller versions of the dataset "locally" (for development), and the
cloud-scale hardware is something you're not going to own at home
(shades of GNU's original issues with running on any hardware needing
Alastair McKinstry, <email@example.com>, <firstname.lastname@example.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.