
Re: plan of deep learning team for next stable release?



Hello,

On 01.12.20 11:02, Christian Kastner wrote:
> Hi Mo,
>
> I'd like to propose a more direct version, taking more into account the
> current realities.
>
> -- Background --
>
> Science-related fields frequently require substantial amounts of raw
> number-crunching. GPUs have emerged as popular devices for this purpose.
> Using their massive parallelism (thousands of cores), they often vastly
> outperform general-purpose CPUs on linear algebra and many other forms
> of numeric computation. They are also ubiquitous.
>
> However, our infrastructure currently has no support for these devices.
> Consequently, the growing number of packages with upstream support for
> hardware acceleration currently have this support disabled.
>
> -- Imagined Solution --
>
> We propose adding Nvidia CUDA-capable GPU devices to a buildd and a
> porterbox (either existing machines or, more likely, new ones).
>
> These machines can be used for/by:
>  1) individual developers who do not have access to specific hardware
>     to build, test, and diagnose software. (porterbox)
>  2) maintainers who enabled OpenCL in the packages. (OpenCL)
>  3) multimedia developers (encoders and decoders, and maybe also rendering?)
>  4) testing nvptx offloading (GCC)
>  5) testing CUDA-related software such as pytorch (CUDA)
>  6) training neural networks on our own hardware. We may even share the
>     computing resources with other FOSS organizations who want to train
>     neural networks. (training)
>
> -- Rationale for supporting CUDA --
>
> Nvidia's CUDA hardware/software ecosystem, although not free, has become
> the de-facto standard in this area.
>
> AMD is working on ROCm, a free alternative, but the ROCm maintainers
> have concluded that both hardware and software are still too far behind.
>
> The fact of the matter is that tasks exist, for example deep learning,
> for which non-accelerated computing is simply not an alternative.
> Efforts being expended on getting software for these tasks into Debian
> are becoming moot. Not offering CUDA will not stop CUDA, it only means
> that instead of
>
>    $ sudo apt-get install python3-torch
>
> users will do
>
>    $ pip install torch
>
> or possibly even consider switching to a distribution with better CUDA
> support, and both of these alternatives are to the clear detriment of
> Debian, for both maintainers and users.
>
> We look forward to eventually having a free alternative through ROCm,
> but we don't foresee this happening soon.
>
>
>
> Regarding this part:
>
> On 12/1/20 5:26 AM, Mo Zhou wrote:
>> An experimental d/control field "X-Requires-Hwspec: <spec>", where <spec>
>> could be one of {cpu, cuda, rocm, sycl, opencl, ...}.
>> The default=cpu, specifier "cuda" can be fulfilled by nvidia GPUs, "rocm"
>> by AMD GPUs, "sycl" by Intel's discrete/integrated GPUs and other GPUs,
>> and "opencl" by any nvidia/amd/intel GPU.
>>  -> in this way packages with specific marks can be distributed to
>>     specific buildds, and tested using our standard workflow.
> That's an implementation detail I'd leave out for now. Let's focus on
> the policy problem (general yes/no) first, and the technical problem
> once that hurdle is cleared.
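
For reference only, the quoted "X-Requires-Hwspec" idea could be sketched
roughly as below. This is a minimal illustration, not an existing tool: the
helper names, the buildd names and the vendor table are all hypothetical, and
nothing in dpkg, sbuild or wanna-build implements this field today. Only the
field name, the specifiers and the default of "cpu" come from Mo's mail.

```python
# Hypothetical sketch: no Debian tool implements this field or these helpers.
HWSPEC_FIELD = "X-Requires-Hwspec"

# GPU vendors able to fulfil each specifier, following the quoted proposal.
# The proposal also lets "sycl" run on "other GPUs"; only Intel is modelled.
FULFILLED_BY = {
    "cpu": None,  # None means any buildd will do
    "cuda": {"nvidia"},
    "rocm": {"amd"},
    "sycl": {"intel"},
    "opencl": {"nvidia", "amd", "intel"},
}


def parse_hwspec(control_paragraph):
    """Return the hardware specifier of a d/control paragraph (default: cpu)."""
    for line in control_paragraph.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == HWSPEC_FIELD.lower():
            return value.strip().lower()
    return "cpu"


def eligible_buildds(spec, buildds):
    """List the buildds that can build a package marked with `spec`.

    `buildds` maps a buildd name to its GPU vendor, or None for no GPU.
    """
    if spec not in FULFILLED_BY:
        raise ValueError("unknown hardware specifier: " + spec)
    vendors = FULFILLED_BY[spec]
    if vendors is None:
        return sorted(buildds)
    return sorted(name for name, gpu in buildds.items() if gpu in vendors)
```

A package whose source paragraph carries "X-Requires-Hwspec: cuda" would then
only be handed to buildds registered with an Nvidia GPU, while unmarked
packages keep going to any machine.
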

I agree with everything said here.

The way to go is via our non-free autobuilders, isn't it?
https://www.debian.org/doc/manuals/developers-reference/pkgs.html#non-free-buildd

I could probably come up with a buildd machine, should this help the
discussion.

Since many DDs are using the CUDA drivers already, I think we have
already decided to trust them.

Kind regards,

Steffen


