science-optimisation (Was: About the sense of removing -march=native)

To: debian-science@lists.debian.org
Subject: science-optimisation (Was: About the sense of removing -march=native)
From: Andreas Tille <andreas@an3as.eu>
Date: Thu, 9 Feb 2017 11:59:00 +0100
Message-id: <20170209105900.GD26996@an3as.eu>
In-reply-to: <1486621428.19778.12.camel@gmail.com>
References: <20170208171016.GC21362@an3as.eu> <1486621428.19778.12.camel@gmail.com>

Hi lumin,

On Thu, Feb 09, 2017 at 06:23:48AM +0000, lumin wrote:
> 
> > So do you think we are doing a bad service to our users by stripping
> > -march=native?  Could you please provide some numbers?
> 
> No, we are not doing badly. Nobody is wrong. We cannot gain compatibility
> and performance at the same time. I don't remember the exact numbers from
> those experiments, conducted 4 months ago. Here are the rough figures
> for my Torch-based program:
> 
>  (1) generic OpenBLAS:
>      i7-6900K: only ~1 experiment process at a time.
>      E5-2687Wv4: ~1
>  (2) -march=native OpenBLAS:
>      i7-6900K: only ~2 experiment processes at a time.
>      E5-2687Wv4: ~2
>  (3) -march=native OpenBLAS + proper OMP_NUM_THREADS:
>      i7-6900K: ~6 experiment processes at a time.
>      E5-2687Wv4: ~8
>  (4) generic OpenBLAS + proper OMP_NUM_THREADS: not tested.
> 
> So the tuned OpenBLAS is >= 6x "faster" for us, in the sense that we can
> run at least six times as many experiments at once. Sorry for the
> ambiguous word "faster".
> 
> I wrote the -march=native example to illustrate that some
> users need to compile specific software themselves and don't
> really need a .deb package.

Thanks for those numbers.

> > I wonder whether we could invent some mechanism that rebuilds a
> > package in postinst and installs the result on the machine instead of a
> > pre-built binary.  Or we could provide some toolset which enables
> > scientists to download a set of source packages, build them after
> > re-activating -march=native and move the results into a local repository
> > which just needs to be added to sources.list.
> > 
> > Do you consider these feasible ideas?
> 
> I once thought of a mechanism somewhat similar to DKMS that would let
> users build their own locally optimized packages, but I changed my mind
> and didn't bring it up on the mailing list.

Well, DKMS might actually be an even better tool for the proposal in my
other mail.
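The local-repository half of that second idea can be sketched in a few lines of bash. This is a sketch under assumptions, not an existing Debian tool: publish_local_repo is a hypothetical name, the .deb files are assumed to be built already, dpkg-scanpackages (from dpkg-dev) must be installed, and the repository is unsigned, so it is marked [trusted=yes] and only makes sense on a single local machine.

```shell
#!/bin/bash
# publish_local_repo REPO_DIR DEB...  (hypothetical helper, not a Debian tool)
# Copies locally built .deb files into REPO_DIR, regenerates a flat
# "Packages" index with dpkg-scanpackages (package dpkg-dev) and prints
# the sources.list line that would point apt at the directory.
publish_local_repo() {
    local repo=${1:?usage: publish_local_repo REPO_DIR DEB...}; shift
    (( $# )) || { echo "publish_local_repo: no .deb files given" >&2; return 1; }
    mkdir -p -- "$repo"
    cp -- "$@" "$repo"/
    repo=$(cd -- "$repo" && pwd)   # file: URIs need an absolute path
    # Flat repository layout: the index sits next to the .deb files.
    (cd -- "$repo" && dpkg-scanpackages . /dev/null > Packages)
    # [trusted=yes] because the repository is unsigned; local use only.
    echo "deb [trusted=yes] file:$repo ./"
}
```

Usage would be something like `publish_local_repo ~/local-debs ./*.deb`, then appending the printed line to a file under /etc/apt/sources.list.d/ and running `sudo apt update`. Re-running the function after each rebuild regenerates the index.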

> At that time I was a little sad, since my packaging work involves
> disabling SIMD support in the build system for the best compatibility.
> Upstream developers worked day and night optimizing their algorithms with
> SIMD, and what I do is disable their optimizations. Some examples:
> 
>  * I disabled nearly all SIMD instruction sets for lua-torch-torch7.
>    See: https://lists.debian.org/debian-mentors/2016/10/msg00231.html
>  * Some TensorFlow dependency libraries recommend or need SIMD, e.g.
>    .
>     + https://github.com/google/highwayhash (ITP/packaged locally,
>       requires SIMD for specific algorithm)
>     + https://github.com/google/farmhash  (in experimental, SIMD disabled)
>     + https://github.com/google/gemmlowp (ITP+RFS, recommends SIMD)
>    .
>    I don't know how slow the final TensorFlow package will be without SIMD.
> 
> However, I changed my mind because (1) performance tuning is not as
> simple as appending the -march=native flag: people who need the best
> performance ought to know how to tune software performance themselves,
> and how to design and write performance-friendly algorithms;
> (2) we don't need such a mechanism for just a handful of packages.

I'm pretty sure that the last bit of performance needs manual
interaction.  But don't you agree that we can do something for those
users who would be happy with something better than what we provide
now?  Let me make a wild guess:  60% of ATLAS users do not realise that
it can be optimised and simply use what they get from the package.  (I
think I'm quite on the safe side with 60%, since users probably install
an application that uses ATLAS and do not care about the underlying
libs.)  I have no idea what percentage of those users realise at some
point that Debian is way slower than Arch Linux (or a similar
distribution) and leave Debian.
 
> (1) -march=native is not the only thing to do
> =============================================
> 
> A -march=native rebuild is not all it takes to tune performance.
> Here is a reproducible, OpenBLAS-specific example:
> 
> $ sudo apt install caffe-cpu/unstable
> $ # make sure the default BLAS is openblas
> $ apt source caffe; cd caffe*
> $ OMP_NUM_THREADS=4 caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5
> 
> I0209 04:42:37.625675 16845 caffe.cpp:421] Average Forward-Backward: 1425.4 ms.
> I0209 04:42:37.625704 16845 caffe.cpp:423] Total Time: 7127 ms.
> 
> $ OMP_NUM_THREADS=2 caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5
> 
> I0209 04:43:33.661006 16871 caffe.cpp:421] Average Forward-Backward: 918.6 ms.
> I0209 04:43:33.661020 16871 caffe.cpp:423] Total Time: 4593 ms.
> 
> My CPU is a 2-core, 4-thread Intel i5-2520M; on it, 2 threads beat 4 by
> about 36% in forward-backward time. The best OMP_NUM_THREADS depends
> on the host CPU and the program, and is specific to OpenBLAS.
> 
> That is beyond a package maintainer's business.
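Finding that value can at least be automated per machine and per program. A rough bash sketch, assuming the wall-clock time of one run is a fair proxy (noisy benchmarks would need several repetitions per setting); omp_sweep is a hypothetical helper, not part of OpenBLAS or caffe:

```shell
#!/bin/bash
# omp_sweep MAX CMD [ARGS...]  (hypothetical helper, not an existing tool)
# Runs CMD once per OMP_NUM_THREADS value from 1 to MAX, prints
# "<threads> <seconds>" for each run, then the fastest setting.
# Only a rough guide: one run per setting, no warm-up, no repetitions.
omp_sweep() {
    local max=${1:?usage: omp_sweep MAX CMD [ARGS...]}; shift
    local n t results=""
    for ((n = 1; n <= max; n++)); do
        # The bash `time` keyword prints wall-clock seconds (%R) on the
        # shell's stderr; the group's 2>&1 captures it, while the
        # command's own output is discarded.
        t=$( { TIMEFORMAT=%R; time OMP_NUM_THREADS=$n "$@" >/dev/null 2>&1; } 2>&1 )
        echo "$n $t"
        results+="$n $t"$'\n'
    done
    # Sort numerically on the time column; the first line is the fastest.
    printf '%s' "$results" | sort -k2,2n | head -n 1 |
        awk '{ print "best OMP_NUM_THREADS=" $1 }'
}
```

For the caffe benchmark above this would be `omp_sweep 4 caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5`.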
> 
> (2) No need to invent a mechanism for -march=native rebuilds
> ============================================================
> 
> I think that, in general,
>   1) `export DEB_CFLAGS_MAINT_APPEND=-march=native` and
>   2) `dpkg-buildpackage -us -uc` (run inside the unpacked source tree)
> are enough for a user to do a -march=native rebuild.
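Wrapped into a single function together with fetching the source and its build dependencies, those two steps might look like this. It is a sketch assuming deb-src entries are enabled in sources.list, that the argument is a source package name, and that debian/rules obtains its flags through dpkg-buildflags (otherwise the *_MAINT_APPEND variables are ignored); native_rebuild is a hypothetical name. Binaries built this way may not run on CPUs other than the build host.

```shell
#!/bin/bash
# native_rebuild SRCPKG  (hypothetical helper, not an existing tool)
# Fetches a Debian source package, installs its build dependencies,
# appends -march=native to the maintainer build flags and builds
# unsigned packages. The body is a subshell ( ... ), so the cd and the
# exported variables do not leak into the calling shell.
native_rebuild() (
    set -e
    local pkg=${1:?usage: native_rebuild SRCPKG}
    apt-get source "$pkg"
    sudo apt-get build-dep -y "$pkg"
    # apt-get source unpacks into <source>-<upstream version>/
    local dirs=("$pkg"-*/)
    cd "${dirs[0]}"
    # Read by dpkg-buildflags for C, C++ and Fortran compiler flags.
    export DEB_CFLAGS_MAINT_APPEND=-march=native
    export DEB_CXXFLAGS_MAINT_APPEND=-march=native
    export DEB_FFLAGS_MAINT_APPEND=-march=native
    dpkg-buildpackage -us -uc
)
```

Calling `native_rebuild <source-package>` leaves the rebuilt .deb files in the current directory, next to the unpacked source tree.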
> 
> Moreover, like ATLAS, OpenBLAS has a custom build target too:
> https://tracker.debian.org/media/packages/o/openblas/rules-0.2.19-2
> 
> Users can use Linuxbrew if they don't like apt/dpkg.
> 
> My conclusion
> =============
> 
> Our priority is to provide researchers with a solid and painless operating
> system, rather than with the best-performing packages. They will tune the
> software performance themselves when they need to. Sometimes they just
> switch to Arch Linux (many AUR packages are built locally). Debian's
> package dependency tree is more solid than Arch Linux's, and Debian's
> software installation process is less painful[1] than Gentoo's.

Thanks for these considerations, but I have the ambition that we can
do better than we currently do. :-)
 
> Thank you, Andreas, for bringing up this topic. I have wanted to write
> this down for a long time.

So that's a nice fit. ;-)

Kind regards

     Andreas. 

-- 
http://fam-tille.de
