
Re: Rethink about who we (Debian) are in rapid dev cycle of deep learning



On Fri, 2022-01-14 at 15:35 +0100, Gard Spreemann wrote:
> 
> I understand how you reach these conclusions, both from the POV of
> hardware driver non-freedom and from the POV of the toxic candy
> problem of trained models. And while I agree with your conclusions,
> I do worry about the prospect of the lines blurring.

Indeed. But I eventually concluded that "lazy evaluation" of this
problem is the most realistic strategy for distribution developers,
so I'm not worried about it. See the reasons below.

> 
> It's not unreasonable to expect that AI models become standard
> components of certain classes of software relatively soon. No matter
[...]
> What do we do if/when an image compression scheme involving a deep
> learning model becomes popular? What do we do if/when every new FOSS
> game ships with an RL agent that takes 80 GPU-weeks of training to
> reproduce (and upstream supports nvidia only)? When every new text
> editor comes with an autocompleter based on some generative model
> that upstream trained on an unclearly licensed scraping of a
> gazillion webpages?
> 

Indeed. Deep learning has been demonstrated to be effective in video
compression as well. However, research projects do not enter Debian;
only implementations of industrial standards enter our archive. Only
when a standard like a hypothetical H.267 really introduces deep
learning as part of its core algorithm should we worry about the
blurred borderline. Even if that eventually happens, upstreams such
as VideoLAN and FFmpeg will have to think about the GPL
interpretation before we do. There is already a historical example
from FFmpeg, where pre-trained convolution kernels (shipped in a
header file) are excluded from the GPL source code. And I bet even
the ISO standards group will have to think about the potential
license/legal issues before introducing such a thing.
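
To make this concrete, here is a minimal sketch of what pre-trained
weights baked into a C header could look like. The file name, array
name, and values below are hypothetical placeholders and are not
taken from FFmpeg's actual sources:

  /* hypothetical_kernel.h -- illustrative placeholder, not FFmpeg code.
   * The weights of this small 3x3 convolution kernel are imagined to be
   * the output of a training run, committed to the source tree as plain
   * C data rather than as something a human wrote or meaningfully edits.
   */
  #ifndef HYPOTHETICAL_KERNEL_H
  #define HYPOTHETICAL_KERNEL_H

  static const float hypothetical_kernel_3x3[3][3] = {
      { 0.0625f, 0.1250f, 0.0625f },
      { 0.1250f, 0.2500f, 0.1250f },
      { 0.0625f, 0.1250f, 0.0625f },
  };

  #endif /* HYPOTHETICAL_KERNEL_H */

The crux, presumably, is that such an array is generated data rather
than the preferred form for modification in the GPL sense.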

An RL agent that takes 80 GPU-weeks to train is also highly likely
to require a powerful GPU for inference when we play such a game. I
play lots of games, and which open source game has reached that
level of GPU demand? Before that becomes true for free software
games, it will first show up in commercial titles, decades ahead of
free software games.

Generative models for code completion are already a widely known
problem, e.g. GitHub's Copilot. They are fancy and useful, but even
before we really think about the blurred borderline, we have already
seen how controversial they are.

Let's step back a little. If everything you describe comes true,
there will be some way for end users to install those tools onto
their systems. A relevant example is vscode. It is a prevalent
editor, favored by a large user base across all systems. Its absence
from the official repository does not stop upstream from
distributing their own .deb packages, and I understand how tricky it
would be to package it in our archive. I believe the same thing will
happen for fancy new AI tools (e.g., the face authentication for
Linux tool already has its own .deb package).

Let me quote a fellow developer: "In Debian we should stop chasing
rabbits." To me, "lazy evaluation" of these problems seems to be the
best strategy. Given Debian's role in this ecosystem, trying to
settle such serious issues before our upstreams do is destined to
make negligible technical progress.

When we really have to carry out that "lazy evaluation", we will not
be unprepared, since the community is already aware of the
precautions and warnings.

