
Re: plan of deep learning team for next stable release?



Hi Christian,

On Sat, Nov 28, 2020 at 12:19:32PM +0100, Christian Kastner wrote:
> On 11/24/20 1:38 PM, Mo Zhou wrote:
> > I've just uploaded pytorch 1.7.0-2 to unstable, which should compile on
> > at least amd64, arm64, and ppc64el. No non-CPU hardware
> > acceleration is enabled yet.
> 
> official non-CPU support is probably going to be a larger issue, but I
> think the sooner we address this, the better.

Agreed. Now that the cpu-only version of pytorch has landed in the
archive, the -cuda version should not be hard to prepare.

> So, how would support for accelerated packages look? One would need
> software/configuration support in the package itself,

For the cuda version of pytorch, one important and mandatory piece of
(non-free) software is still missing from Debian: cuDNN.

> and one would need
> hardware to test it during build and/or autopkgtest.

In terms of CUDA, at least hardware is not a problem for me, as I use
many NVIDIA GPUs in my daily research work.
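
If GPU-backed test machines ever become available, wiring them into
autopkgtest could be fairly light-weight. A rough sketch (the test
name, the python3-torch-cuda package name, and the restrictions are
only my assumptions; none of this exists in the packaging yet):

  # debian/tests/control (hypothetical)
  Tests: cuda-smoke
  Depends: python3-torch-cuda
  Restrictions: isolation-machine, skippable

  # debian/tests/cuda-smoke (hypothetical)
  #!/bin/sh
  set -e
  # Exit 77 ("skip") on machines without a usable NVIDIA GPU.
  python3 -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 77)'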
 
> Our buildds currently don't have the necessary hardware. I believe we
> should strive to change that. The funds are there (as recently addressed
> by the DPL) and accelerated computing is developing into a key area that
> Debian cannot miss out on, in my opinion.

We could make good use of our funds, but I'm not sure whether running
non-free software on non-free hardware for our specific purpose would
be accepted by the community. Maybe I can carefully raise a question
on -devel once we have reached some kind of consensus.
 
> With regards to software/configuration, I think the way to go would be
> to request new build profiles (one per flavor), so that B-Ds only get
> installed where necessary, and packages only get built where possible.

It sounds like changes in apt/dpkg would be required. I'd suggest
looking at the solution used for caffe; see src:caffe and
src:caffe-cuda (removed from unstable).
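
To make the build-profile idea concrete, here is a minimal sketch of a
profile-restricted build-dependency, assuming a pkg.pytorch.cuda
profile (the profile and package names are placeholders, not settled
decisions):

  # debian/control (hypothetical excerpt)
  Source: pytorch
  Build-Depends: debhelper-compat (= 13),
                 cmake,
                 nvidia-cuda-toolkit <pkg.pytorch.cuda>
  # a cuDNN build-dependency would also go here once it is packaged

  # building the cuda flavour explicitly:
  $ dpkg-buildpackage --build-profiles=pkg.pytorch.cuda

Whether that already works with current apt/dpkg or really needs
changes there is something I have not verified.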
 
> Ideally, we'd also have some porter boxes. This sounds like going far by
> Debian standards, but we are talking about only a few hundred Euros per
> box, and as the DPL said, we should make more use of our funds.

An official porterbox running so much non-free stuff does not sound
sensible.

> The elephant in the room is, of course, CUDA. It's non-free so that will
> irk a lot of people, but it's also the de facto standard, and I don't
> see what alternative we have. People needing accelerated computing today
> will rather leave behind Debian than CUDA.

This is indeed a good question to raise again. Without CUDA
acceleration I will not be able to use the pytorch packages compiled
for Debian in my own research work, and the use case for the cpu-only
version is undoubtedly limited.
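
For reference, checking whether an installed pytorch build can really
use the GPU is a one-liner; with the current cpu-only package this
should simply print False:

  $ python3 -c 'import torch; print(torch.cuda.is_available())'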

> Thoughts?

The core question is whether we are willing to compromise on the CUDA
software stack in order to introduce something more useful and
practical.

