
Recalling Key Points of the Previous Attempt (was: Re: Regarding the new "Debian User Repository")

Hi fellow devs,

As the previous attempt at a Debian user repository has been mentioned,
I'm interested in recalling some of its key points and the bits I've
learned, with some updates.

# how the initial attempt started

In terms of Debian-related work, I mainly focus on scientific computing
and deep learning packages. These packages are very
performance-sensitive. For example, the performance of BLAS / LAPACK
significantly impacts the performance of the whole inverse dependency
tree, including important packages such as Numpy, Octave, etc. [8]
Namely, a fast BLAS / LAPACK benefits the whole tree wherever there are
mathematical operations on numerical arrays.

However, tensorflow is an exception, as it is built upon eigen3 (a
header-only linear algebra library in C++) instead of BLAS/LAPACK.
Eigen3 does not support dynamic code branch selection based on CPU
capability, and hence performs very poorly if compiled against Debian's
default AMD64 ISA baseline.

There are a few methods to bump the ISA baseline for a debian package
for the official archive: (1) patch the code with gcc's fmv feature;
(2) use the "hardware capabilities" feature of ld.so(8); (3) let the
user modify debian/rules and rebuild package locally; (4) directly
bump the ISA baseline for the whole archive; (5) Gentoo-style
partial Debian distribution.

For (1), it's impossible to patch tensorflow (millions of lines of
code). Solution (2) will result in very bulky binary packages. Solution
(3) is somewhat convincing to me, since I think a serious user who
needs performance should learn to build optimized software (e.g.
-march=native). Solution (4) was implemented as SIMDebian (deprecated).
Solution (5) was implemented as the previous attempt at a Debian User
Repository (deprecated).
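
For solution (3), here is a minimal sketch of how a user could rebuild
a package with a bumped ISA baseline, without even editing
debian/rules, since dpkg-buildflags honors the DEB_*_APPEND environment
variables. This assumes an amd64 host with dpkg-dev installed;
"somepkg" is a placeholder:

```shell
# Inject -march=native into the build flags for a local rebuild.
export DEB_CFLAGS_APPEND="-march=native"
export DEB_CXXFLAGS_APPEND="-march=native"

# Verify that the flag is picked up by the default build flags.
command -v dpkg-buildflags >/dev/null 2>&1 \
  && dpkg-buildflags --get CFLAGS \
  || true

# Then fetch the source and rebuild, e.g.:
#   apt-get source somepkg && cd somepkg-*/
#   dpkg-buildpackage -us -uc -b
```

Note that this only takes effect for packages whose d/rules actually
consults dpkg-buildflags (most dh-based packages do).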

I shall briefly review solutions (4) and (5) in the following sections.

[8] If you are concerned about the BLAS/LAPACK performance on Debian:

# regarding SIMDebian

The core idea is a partial Debian archive (binary packages with a
bumped ISA baseline) containing some selected packages which manifest a
performance gain [9]. My implementation modifies the default buildflags
of dpkg [10], and hence is able to inject "-march=xxx" into the package
building process without modifying the source of a debian package.
Thus we can automatically rebuild an optimized partial debian archive.
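
As a hedged illustration (SIMDebian's actual implementation modified
dpkg's default buildflags [10]; a similar host-wide effect can be had
through dpkg's documented configuration), the default build flags on a
build host can be extended via /etc/dpkg/buildflags.conf:

```
# /etc/dpkg/buildflags.conf -- read by dpkg-buildflags(1)
# Append a bumped ISA baseline to every package built on this host.
APPEND CFLAGS -march=x86-64-v3
APPEND CXXFLAGS -march=x86-64-v3
```

Pick a -march value your compiler supports; -march=x86-64-v3 requires
GCC >= 11.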

My knowledge of which packages could benefit from this is rather
limited; as far as I know, only Eigen3 reverse dependencies do. We may
borrow some experience from the Gentoo community, but I eventually
stopped pursuing this idea, because there are still some other issues
that it cannot deal with.

[9] https://github.com/SIMDebian/SIMDebian (discontinued)

# regarding previous attempt on DUR

What if we just distribute packaging scripts and let users build the
packages on their localhost, like Gentoo? In this way, inserting flags
such as -march=native in d/rules can be sensible in many cases.
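
If one does modify d/rules directly, the flag insertion could look like
this (a sketch; DEB_CFLAGS_MAINT_APPEND is the maintainer-side hook
honored by dpkg-buildflags):

```make
#!/usr/bin/make -f
# debian/rules fragment: opt the local rebuild into native optimization
export DEB_CFLAGS_MAINT_APPEND = -march=native
export DEB_CXXFLAGS_MAINT_APPEND = -march=native

%:
	dh $@
```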

The problematic license of Nvidia's non-free blob (e.g., cuDNN, which
is inevitable for most deep learning users) can be bypassed if it is
downloaded and locally packaged on the user's host. Apart from deep
learning, the non-free build of ffmpeg is useful for multimedia users,
but can only be built locally from source due to its license.

The license issue of ToxicCandy [12] deep learning models can be
bypassed if the user wants to take their own risks and install some
fancy AI models. Things can be more complicated if we combine an
automatic code generator [13] and the GPL license. Anyway, a
Gentoo-style source-based user repository allows the legal issues to be
offloaded to the end users who are willing to accept them [14].

[11] https://github.com/dupr/duprkit/blob/master/doc/motivation.md
[13] https://copilot.github.com/
[14] I accept Nvidia's license as a personal user for deep learning.
     But I tend to refuse Nvidia's license when working for Debian ...
     Bypassing the license issue is the only viable way I can think of
     if we want to integrate more fancy AI stuff into the system in the
     future.

# summary and further discussion

SIMDebian tries to bump the ISA baseline and create a binary partial
archive. The previous attempt at DUR tries to only distribute packaging
scripts to the end user and let the users build packages locally,
simultaneously achieving package optimization and legal issue
offloading.

I find my proposed concept, the "ToxicCandy Model", rather interesting.
Identifying such models can really prevent something tricky from
sneaking into our archive. We shall see how it works in the future.

Dealing with deep learning for Debian is basically a headache -- we need
performance, but we are also concerned about problematic licenses ...
I'm cc'ing debian-ai as this mail is highly relevant.

I'm open to further discussion if you are inspired, or have some new
ideas.

On 2021-07-02 17:16, Stephan Lachnit wrote:
> Why do I think this is relevant for Debian?
> This was not the first attempt of building a "DUR" [7], and at least I
> [7] https://lists.debian.org/debian-devel/2019/04/msg00064.html
