Bits from /me: Difficulties in Deep Learning Framework Packaging

To: debian-devel@lists.debian.org
Subject: Bits from /me: Difficulties in Deep Learning Framework Packaging
From: Mo Zhou <lumin@debian.org>
Date: Tue, 16 Apr 2019 11:12:44 +0000
Message-id: <20190416111239.GA24345@Asuna>

Hi people,

This message is neither good news nor a request for help. I'm writing
to share some of my thoughts on Deep Learning Framework packaging,
after re-evaluating the status of TensorFlow's latest build system.
My conclusions are drawn from failures rather than successes.
That said, they should be helpful to future maintainers who'd like to
maintain similar packages[1]. You will probably also find some of my
original motivations for DUPR[2] and SIMDebian[6] among these points.

In Debian's context, maintainers have to face three obstacles:

1. License. Unfortunately the de facto dominant performance library is
cuDNN[3]. I'd say no serious user[4] would use a D-L framework without
cuDNN or TPU[5] acceleration. Maintaining a bunch of contrib or non-free
stuff is not a good experience in Debian. Packaging for cuDNN is available
under Salsa:nvidia-team, but the plan to upload it was aborted
because its license looks too scary.

2. ISA Baseline. If you remember SIMDebian[6], or some of my motivations
for DUPR[2], it is easy to understand how the absence of SIMD
code hurts critical computational performance. People have offered
helpful suggestions on this point, including ld.so[7] tricks and some
gcc features which allow run-time code selection according to CPU
capability[8]. The ld.so trick would bloat the resulting .deb packages,
but it's the most applicable solution. In contrast, patching a million
lines of TensorFlow code to enable the "function attributes" feature
is probably impossible for a volunteer.

3. Build system. Look at the build systems of TensorFlow and PyTorch[10].
They are volatile due to the fast pace of development. Specifically,
TensorFlow's build system "bazel" is very hard to package for Debian,
and a fair amount of patching work is still required to prevent
bazel from downloading ~3.0GiB of ???[9] before building TensorFlow.
PyTorch's setup.py+cmake+shell build system ... requires some patching
work too.
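
To make the gcc "function attributes" option from point 2 concrete, here
is a minimal sketch. It is not taken from TensorFlow: the saxpy kernel
and the MULTIVERSION macro are hypothetical names chosen for
illustration, and the guard assumes GCC on x86-64 GNU/Linux, where
target_clones can rely on ifunc. On other platforms the macro expands to
nothing and a single baseline version is built.

```c
#include <stddef.h>

/*
 * Assumption: GCC on x86-64 GNU/Linux, where target_clones is resolved
 * through ifunc at load time. Anywhere else, build one plain version.
 */
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__) \
    && defined(__gnu_linux__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION
#endif

/*
 * Hypothetical kernel: y[i] += a * x[i].
 * With MULTIVERSION the compiler emits one clone per listed target plus
 * a resolver that picks the best clone for the running CPU.
 */
MULTIVERSION
void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers simply call saxpy(a, x, y, n) as usual. The catch is that the
attribute has to be attached function by function, which is why doing
this across a code base the size of TensorFlow is so much work.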

So I recommend that any future contributor who is about to deal with deep
learning packages carefully assess the three aspects above. To some
extent I envy some other distros such as Arch and Gentoo, since they
have already made great progress in this field.

Some time ago (maybe several months?) in the Debian Science team I said
I was aborting D-L framework related development. Today Paul Liu poked me
and asked me about the status of src:tensorflow (in experimental). I spent
several hours re-evaluating the situation, and finally decided to fully
give up and write the above points, because I'm not willing to take on
the workload any more. At the same time, I filed Orphan[11] bugs against
tensorflow and several of its dependencies, except for src:nsync which
contains a neat set of cmake files. I plan to convert those Orphan bugs
into RM bugs after a year if no one touches them.

I do research with neural networks and I use these frameworks
frequently. Anaconda and pip are already good enough for me. So DUPR[2]
is the best choice for me if I'd like some .deb packages.

This time I'm really giving up all related efforts[12], and shall never
touch them again. I don't feel it is a pity, even if these points seem to
be tightly connected to some of my Debian activities. Apart from that, I'm
still willing to offer personal opinions about related packaging
work, machine learning datasets, pretrained neural networks, etc.

Well, this result looks bad. Let's hope for a sunrise.

Best,
Mo

[1] Please take extra care in computational performance.
[2] https://github.com/dupr/duprkit
[3] (non-free) https://developer.nvidia.com/cudnn
[4] Business groups, researchers.
[5] Google's computation acceleration hardware.
[6] https://github.com/SIMDebian/SIMDebian
[7] man ld.so -> search for "hardware capabilities"
[8] info gcc "Function Attributes";
    See Guillem's recent reply to "SIMDebian: ..." (d-devel@l.d.o)
[9] I don't know what they are. They are more than build-deps.
[10] They are the top-2 frameworks.
[11] What a relief.
[12] My ongoing work on intel-mkl / BLAS / LAPACK is unrelated.
     I still have strong interest in many other aspects of Debian development.
