Re: plan of deep learning team for next stable release?

To: Christian Kastner <ckk@debian.org>
Cc: debian-ai@lists.debian.org
Subject: Re: plan of deep learning team for next stable release?
From: Mo Zhou <lumin@debian.org>
Date: Tue, 1 Dec 2020 04:26:01 +0000
Message-id: <X8XF2VmaV4eLdBqs@Macadamia>
In-reply-to: <6a3fd1e9-4086-85e7-cddd-a8db47e548ac@debian.org>
References: <X7z+3Ef4Fr3OtMXb@Macadamia> <e0b76665-bc0f-3f2b-e3ab-9362f1ce1932@debian.org> <X8O31IHC8V7rlW7l@Macadamia> <6a3fd1e9-4086-85e7-cddd-a8db47e548ac@debian.org>

Hello guys,

So let's briefly summarize the problem we confront:

-- Background --

Hardware acceleration is essential for productivity in many science-related
fields, as it can greatly reduce the time spent on intensive calculations.
Of all existing solutions, NVIDIA/CUDA, although dominating the market, is
not well supported by our infrastructure. Given that such user demand is
unavoidable even for our fellow developers, maybe it's time to rethink our
support for these solutions.

-- Imagined Solution --

Extend our infrastructure (buildd or alike, instead of xxx.debian.net)
with a host machine equipped with NVIDIA GPUs. This machine can be used
for/by:

 1) individual developers who do not have access to specific hardware
    to build, test, and diagnose software. (porterbox)
 2) maintainers who enabled OpenCL in their packages. (OpenCL)
 3) multimedia developers (encoders and decoders, and maybe also rendering?)
 4) testing nvptx offloading (GCC)
 5) testing CUDA-related software such as pytorch (CUDA)
 6) training neural networks on our own hardware. We may even share the
    computing resources with other FOSS organizations who want to train
    neural networks. (training)

An experimental d/control field "X-Requires-Hwspec: <spec>", where <spec>
could be one of {cpu, cuda, rocm, sycl, opencl, ...}.
The default is "cpu". The specifier "cuda" can be fulfilled by NVIDIA GPUs,
"rocm" by AMD GPUs, "sycl" by Intel's discrete/integrated GPUs and other
GPUs, and "opencl" by any NVIDIA/AMD/Intel GPU.
 -> in this way packages with the specified marks can be distributed to
    specific buildds, and tested using our standard workflow.

And a similar experimental "Restrictions" keyword in d/tests/control.
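
To make the dispatch idea concrete, here is a rough sketch of how a
scheduler might read the proposed "X-Requires-Hwspec" field and pick
matching buildds. Everything in it is an assumption for illustration: the
field is only a proposal, the function names and the buildd names are made
up, and the "other GPUs" part of "sycl" is left out. It is not an existing
Debian tool.

```python
# Hypothetical sketch only: the field, the helpers, and the buildd names
# below are illustrative assumptions, not part of any existing Debian tool.

DEFAULT_SPEC = "cpu"

# Which GPU vendors can fulfil each specifier, following the proposal text.
FULFILLED_BY = {
    "cuda": {"nvidia"},
    "rocm": {"amd"},
    "sycl": {"intel"},  # the proposal also mentions "other GPUs"; omitted here
    "opencl": {"nvidia", "amd", "intel"},
}


def parse_hwspec(stanza: str) -> str:
    """Return the X-Requires-Hwspec value of a control stanza, or the default."""
    for line in stanza.splitlines():
        if line[:1].isspace():
            continue  # continuation line of a previous field
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-requires-hwspec":
            return value.strip().lower() or DEFAULT_SPEC
    return DEFAULT_SPEC


def eligible_buildds(spec: str, buildds: dict) -> list:
    """List the buildds (name -> set of GPU vendors) able to fulfil spec."""
    if spec == DEFAULT_SPEC:
        return sorted(buildds)  # any machine can do a plain CPU build
    if spec not in FULFILLED_BY:
        raise ValueError(f"unknown hardware spec: {spec!r}")
    wanted = FULFILLED_BY[spec]
    return sorted(name for name, gpus in buildds.items() if gpus & wanted)


if __name__ == "__main__":
    stanza = "Source: pytorch-cuda\nX-Requires-Hwspec: cuda\n"
    pool = {
        "buildd-a": set(),
        "buildd-b": {"nvidia"},
        "buildd-c": {"amd", "intel"},
    }
    print(eligible_buildds(parse_hwspec(stanza), pool))
```

A package without the field falls back to "cpu" and can go to any buildd,
so existing packages would not need to change.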

-- Core Problem --

1. Are we willing to accept such a host running non-free[1] software
(e.g. nvidia-driver, pytorch-cuda, ffmpeg + nvidia codecs, opencv + opencl)
as one of our own machines?

[1] As an example, NVIDIA cuDNN's licence:
https://docs.nvidia.com/deeplearning/cudnn/sla/index.html
