
Rethinking who we (Debian) are in the rapid development cycle of deep learning



Keywords: GPU computing support, AI applications & ML-Policy.

Deep learning is a new area. Our past discussions have already shown that it
raises many new questions for Debian. For example, new AI applications may even
challenge the definition of free software. In this article I share my latest
thoughts on related topics across multiple domains, revisit some of my past
forecasts, and offer some relevant development advice.

Note that this whole article conveys only my personal opinion and does not
represent any official position of the Debian Project.

# Debian's GPU computing support -- how much should we do? ####################

The recent success of deep learning partly depends on the development of GPUs,
which can compute matrix multiplications hundreds of times faster than CPUs.
GPU computation is therefore very valuable, and intuitively, supporting it as
well as we can on the Debian side is valuable too. Due to software license
issues with a certain vendor, I have long been seeking the boundary -- how much
should we do to support a certain type of GPU computation? Now I have finally
figured out my own answer.

Debian is merely a _downstream_ in terms of providing GPU support to end users.
As long as the upstream is willing to give us a chance (legally) and is easy to
cooperate with, we can support that hardware. Otherwise a dead end is soon
reached, unsurprisingly.

I've had discussions with several fellow developers about suggesting that
Debian buy some GPUs to extend its infrastructure for better GPU support.
The plan to put those ideas forward to a larger audience inside Debian has
been postponed indefinitely, because we know the requirement for a non-free
driver (there is no free alternative) would be a big problem.

Although my initial thought was to make Debian useful in more areas, such as
GPU computing, I finally realized that by accepting new non-free blobs as an
organization, we further erode our core value as stated on our homepage --
"a complete free operating system".

My conclusion is: "Users with special demands can take care of themselves,
as we are unable to go far on our own." In terms of GPU computing, Debian
provides a great system as a foundation for development and applications.

Of course, deep learning frameworks are regular software we are already
familiar enough with. Their GPU support simply depends on whether the necessary
drivers and libraries are maintained in Debian.
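To illustrate, a framework's GPU path is typically probed at run time rather than fixed at packaging time; a minimal sketch (assuming a PyTorch-style API, which may or may not be installed on a given system) might look like:

```python
# Sketch: choose a compute device at run time. Falls back to the CPU when
# either the framework or a usable GPU driver stack is absent -- which is
# why the distribution only needs to ship working drivers and libraries.
try:
    import torch  # assumption: the PyTorch framework is installed
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # framework missing entirely: CPU-only fallback
print(f"computing on: {device}")
```

Whether `device` ends up as `"cuda"` depends entirely on the driver and library stack beneath the framework, not on the framework package itself.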

# AI Applications & ML-Policy #################################################

I predict that the ML-Policy [1] will serve as a warning about potential issues
rather than as practical packaging guidance, because there are (and will
continue to be) long-standing issues that are hard to overcome and that make
our packages not really useful without external components. Throughout the
ML-Policy, I think the most valuable warning is the definition of the
"ToxicCandy Model", which identifies a software-freedom trap for developers
interested in AI software.

Cool and useful stuff keeps emerging -- e.g., Facial Authentication for Linux:
  https://github.com/boltgolt/howdy
It depends on some pre-trained models (license: CC0-1.0):
  https://github.com/davisking/dlib-models
People may still remember the past discussions on ML-Policy. If we treat
pre-trained models as something like a picture or a song, they may enter our
main archive. But when we try to exercise software freedom, things go wrong.
For example, we can study a painting or a song and analyze it to learn
something, but this does not work for pre-trained models: without the training
data there is not much way to study, learn from, or reproduce them. Per the
definition in ML-Policy, the model mentioned above is a ToxicCandy model.

Based on my interpretation, this means Debian might have to step aside from the
world of AI applications in order to fully exercise software freedom. It's a
pity, but Debian's major role in the whole picture is providing a solid system.

Workarounds to address that pity are possible -- for example, the past "Debian
User Package Repository" idea: distribute only the package building scripts to
end users, so they can build the corresponding packages locally. In this way
the license and software freedom issues are bypassed, because the user has
decided to accept the potential issues.
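A minimal sketch of what such a flow might look like, assuming a hypothetical
repository URL and using Howdy as the example package (the sketch only prints
the steps; nothing is actually fetched or built here):

```shell
# Hypothetical "user package repository" flow: only the packaging scripts
# are distributed; the user builds and installs the binary package locally,
# thereby accepting any non-free components at build time.
steps='git clone https://example.org/dupr/howdy-packaging.git
cd howdy-packaging
dpkg-buildpackage -us -uc        # build the .deb from the shipped scripts
sudo apt install ../howdy_*.deb  # the user opts in to the result'
printf '%s\n' "$steps"
```

The key property is that no pre-built binary or model ever passes through
Debian's archive; everything license-sensitive is assembled on the user's own
machine.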

On the other hand, I'd advise people who want to package interesting AI
applications to carefully evaluate whether the application is mature enough --
and never to package a pure academic research project. This is largely because
our development cycle is much slower than the revolution cycle of the deep
learning field: something better may appear before the package even clears our
NEW queue...

As for AI applications that require considerable computing power (GPUs), the
answer is already rather clear.

[1] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst

# Concluding Remarks ##########################################################

We maintain and provide a free operating system, and we value software freedom.
My contribution here is to share my understanding of the boundary between what
we can and cannot do with respect to an interesting new area. At least I
learned a lot while thinking about this, and gained a deeper understanding of
"what Debian is".

Debian is wonderful because it is one of the few places on earth where people
will shout when software freedom is potentially infringed. Indeed, Debian must
hold its own unique place in the impression of every long-term member of the
project.

Thank you for the excellent system, fellow developers.

