Re: Rethink about who we (Debian) are in rapid dev cycle of deep learning

To: Paul Wise <pabs@debian.org>, debian-project@lists.debian.org
Cc: debian-ai@lists.debian.org
Subject: Re: Rethink about who we (Debian) are in rapid dev cycle of deep learning
From: "M. Zhou" <lumin@debian.org>
Date: Sat, 15 Jan 2022 21:32:04 -0500
Message-id: <9e177ab6d619a1291f7297ed5d93146e065d2f09.camel@debian.org>
In-reply-to: <9e1f7af5a3e73ec70cd77d6a44027ed4f2b4ed0f.camel@debian.org>
References: <8bc4a2fdb2a0619d1d82149889c8b30938b54ad6.camel@debian.org> <9e1f7af5a3e73ec70cd77d6a44027ed4f2b4ed0f.camel@debian.org>

Hi Paul,

Thanks for the additional questions.

On Sun, 2022-01-16 at 09:43 +0800, Paul Wise wrote:
> On Wed, 2022-01-12 at 19:00 -0500, M. Zhou wrote:
> 
> > I've had some discussions with several fellow developers on
> > suggesting Debian to buy some GPUs to extend its infrastructures
> > for better GPU support.
> 
> Was there a plan for what to use these GPUs for?

No specific plan, but I can list some possible uses if we had one.

Assuming it's an nvidia GPU, we can use it for

 1. building and testing cuda-related computational software,
    such as tensorflow-cuda, pytorch-cuda, magma, etc.
    (this demand is confirmed by us, the Debian deep learning team)

 2. building and testing some multimedia tools, such as ffmpeg
    (when linked against nvidia's library, the resulting ffmpeg
    binary is not redistributable).
    (this demand is not confirmed with the multimedia team)

 3. building and testing GPU acceleration for software such as
    blender.
    (not confirmed with the maintainer)

 4. transcoding our videos (e.g., our debconf videos.)
    (not confirmed with debconf team)

 5. training neural networks?
    (such demand should be quite rare, given my viewpoint
     in the original post)

The problem is that nvidia-driver is non-free, and it is
unavoidable for any upper-layer application. The open source
driver nouveau cannot do any of the above.

Assuming it's an AMD GPU, we can use it for

 1. building and testing ROCm (AMD's open source counterpart
    to CUDA). It looks like the amdgpu driver in the kernel
    is enough to drive ROCm without requiring a non-free blob.
    (I'm not sure whether firmware is still required)
    (people on debian-ai@l.d.o have recently been working on packaging)

 2. building and testing deep learning frameworks that have
    added ROCm support, such as pytorch.

 3. building and testing any software with OpenCL support, such
    as opencv, so we don't have to do everything with pocl.

 4. and 5. same as nvidia's 4 and 5.

Assuming it's an Intel GPU,

 I simply don't know. Let's wait and see the news.
 Intel is putting effort into SYCL (an abstraction layer originally
 built on top of OpenCL); Intel's implementation is called DPC++
 upstream. Intel has not yet merged its SYCL support into LLVM
 upstream.

> Were they needed for driver/other package building/testing?

A non-free driver is required for nvidia GPUs. Unfortunately, for
industry users (especially machine learning users) the nvidia GPU
is the most widely supported and mature option.

The kernel already has the driver for AMD GPUs. I'm just not sure
whether firmware is required to run ROCm, OpenCL, etc.
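
One rough way to look into the firmware question: modinfo can print the
firmware files that a kernel module declares. This is only a sketch. The
helper name list_fw is made up for illustration, and a declared file is
not necessarily loaded for every card or needed for every feature.

```shell
# list_fw is a hypothetical helper, not an existing tool.
# It prints the firmware files a kernel module declares (MODULE_FIRMWARE),
# or a short note if the module (or modinfo itself) is not available.
list_fw() {
    modinfo -F firmware "$1" 2>/dev/null || echo "module $1 not found"
}

# Show the first few firmware files declared by the amdgpu driver.
list_fw amdgpu | head
```

The list only shows what the driver may request, so it would still need
checking against real hardware.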

> Were they to be used for libre model training?

Once we finish packaging the deep learning frameworks with the
specific hardware support, we can do so, provided that we have
the corresponding "libre" data.

> 
> 
> This isn't any different to most modern hardware devices, which either
> have non-free blobs embedded in them or have non-free blobs uploaded to
> them or both. Even worse, server hardware often requires proprietary
> software running in userspace to manage parts of the server. The modern
> hardware industry does not produce hardware that allows Debian to avoid
> dealing with these blobs in some way. GPUs aren't any different here
> IMO. Things may change with RISC-V, OpenBMC and other efforts though.

I still remember the microcode example from the last discussion,
and it's true. But proprietary server software is unavoidable if
the server is to be fully functional, while a GPU is not: an infra
server can be fully functional without a GPU.

> 
> 
> > Based on my interpretation, it means Debian might step aside from the
> > world of AI applications to fully exercise software freedom. It's a
> > pity but Debian's major role in the whole thing is a solid system.
> 
> I think we should simply follow our social contract and guidelines as
> usual. Package useful things, but place them in contrib or non-free as
> appropriate depending on the situation. Advocate for the release of
> libre training data, retraining from scratch, license changes etc.

Yes, recalling our initial motivation and principles is a very good
idea when facing complicated issues. I fully agree.

> PS: I note that we already have Toxic Candy models in Debian main.
> For example the rnnoise model was trained from proprietary data
> but is available in Debian source packages:
> 
> $ apt-file search -I dsc rnnoise
> 

Well... right. I've seen related bug reports. Thanks!
