
Re: plan of deep learning team for next stable release?



Hi Mo,

I'd like to propose a more direct version that takes better account of
current realities.

-- Background --

Science-related fields frequently require substantial amounts of raw
number-crunching. GPUs have emerged as popular devices for this purpose.
Using their massive parallelism (thousands of cores), they often vastly
outperform general-purpose CPUs on linear algebra and many other forms
of numeric computation. They are also ubiquitous.

However, our infrastructure currently has no support for these devices.
Consequently, the growing number of packages with upstream support for
hardware acceleration have this support disabled.

-- Imagined Solution --

We propose adding Nvidia CUDA-capable GPU devices to a buildd and a
porterbox (possibly existing machines, but presumably new ones).

These machines can be used for/by:
 1) individual developers who do not have access to specific hardware,
    to build, test, and diagnose software. (porterbox)
 2) maintainers who have enabled OpenCL in their packages. (OpenCL)
 3) multimedia developers (encoders and decoders, and maybe also rendering?)
 4) testing nvptx offloading (GCC)
 5) testing CUDA-related software such as pytorch (CUDA)
 6) training neural networks on our own hardware. We may even share the
    computing resources with other FOSS organizations that want to train
    neural networks. (training)

-- Rationale for supporting CUDA --

Nvidia's CUDA hardware/software ecosystem, although not free, has become
the de facto standard in this area.

AMD is working on ROCm, a free alternative, but the ROCm maintainers
have concluded that both hardware and software are still too far behind.

The fact of the matter is that tasks exist, for example deep learning,
for which non-accelerated computing is simply not an alternative.
Efforts being expended on getting software for these tasks into Debian
are becoming moot. Not offering CUDA will not stop CUDA; it only means
that instead of

   $ sudo apt-get install python3-torch

users will do

   $ pip install torch

or possibly even consider switching to a distribution with better CUDA
support. Both of these alternatives are to the clear detriment of
Debian, for maintainers and users alike.

We look forward to eventually having a free alternative through ROCm,
but we don't foresee this happening soon.



Regarding this part:

On 12/1/20 5:26 AM, Mo Zhou wrote:
> An experimental d/control field "X-Requires-Hwspec: <spec>", where <spec>
> could be one of {cpu, cuda, rocm, sycl, opencl, ...}.
> The default=cpu, specifier "cuda" can be fulfilled by nvidia GPUs, "rocm"
> by AMD GPUs, "sycl" by Intel's discrete/integrated GPUs and other GPUs,
> and "opencl" by any nvidia/amd/intel GPU.
>  -> in this way packages with specific marks can be distributed to
>     specific buildds, and tested using our standard workflow.

That's an implementation detail I'd leave out for now. Let's focus on
the policy problem (general yes/no) first, and tackle the technical
problem once that hurdle is cleared.

Best,
Christian

