
Re: About the sense of removing -march=native (Was: Is theano worth saving?)



> So do you think we are doing a bad service to our users by striping
> -march=native?  Could you please provide some numbers?

No, we are not doing users a disservice. Nobody is wrong. We cannot have
compatibility and performance at the same time. I don't remember the exact
numbers from the experiments I ran 4 months ago. Here is the approximate
data from my Torch-based program:

 (1) generic OpenBLAS:
     i7-6900K can run only ~1 experiment process at a time.
     E5-2687Wv4: ~1
 (2) -march=native OpenBLAS:
     i7-6900K can run only ~2 experiment processes at a time.
     E5-2687Wv4: ~2
 (3) -march=native OpenBLAS + proper OMP_NUM_THREADS:
     i7-6900K can run ~6 experiment processes at a time.
     E5-2687Wv4: ~8
 (4) generic OpenBLAS + proper OMP_NUM_THREADS: not tested.

So the tuned OpenBLAS is >= 6x "faster" for us...
Sorry for the ambiguous word "faster": here it means how many experiment
processes a machine can run at the same time.

I wrote the -march=native example to illustrate that some users need
to compile specific software themselves and don't really need a .deb
package.

> I wonder whether we could invent some mechanism that is rebuilding a
> package in postinst and installs the result on the machine instead of a
> pre-build binary.  Or we could provide some toolset which enables
> scientists to download a set of source packages and build these after
> re-activating -march=native and move the results in a local repository
> which just needs to be added to sources.list.
> 
> Do you consider this as feasible ideas?

I once thought of a mechanism somewhat similar to DKMS that would let
users build locally optimized packages, but I changed my mind and never
put the idea forward on the mailing list.

At that time I was a little sad, since my packaging work involves
disabling SIMD support in the build system for the best compatibility.
Upstream authors worked day and night optimizing their algorithms with
SIMD, and what I do is disable their optimizations. Some examples:

 * I disabled nearly all SIMD instruction sets for lua-torch-torch7.
   See: https://lists.debian.org/debian-mentors/2016/10/msg00231.html
 * Some TensorFlow dependency libraries recommend or require SIMD, e.g.
   .
    + https://github.com/google/highwayhash (ITP/packaged locally,
      requires SIMD for specific algorithm)
    + https://github.com/google/farmhash  (in experimental, SIMD disabled)
    + https://github.com/google/gemmlowp (ITP+RFS, recommends SIMD)
   .
   I don't know how slow the final TensorFlow package will be without SIMD.

However, I changed my mind because (1) performance tuning is not as
simple as appending the -march=native flag. People who need the best
performance ought to know how to tune software performance themselves,
and how to design and write performance-friendly algorithms.
(2) We don't need a mechanism for merely a handful of packages.


(1) -march=native is not the only thing to do
=============================================

A -march=native rebuild is not everything that needs to be done to tune
performance. Here is a reproducible OpenBLAS-specific example:

$ sudo apt install caffe-cpu/unstable
$ # make sure the default BLAS is openblas
$ apt source caffe; cd caffe*
$ OMP_NUM_THREADS=4 caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5

I0209 04:42:37.625675 16845 caffe.cpp:421] Average Forward-Backward: 1425.4 ms.
I0209 04:42:37.625704 16845 caffe.cpp:423] Total Time: 7127 ms.

$ OMP_NUM_THREADS=2 caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5

I0209 04:43:33.661006 16871 caffe.cpp:421] Average Forward-Backward: 918.6 ms.
I0209 04:43:33.661020 16871 caffe.cpp:423] Total Time: 4593 ms.

My CPU is a 2-core, 4-thread Intel i5-2520M. In this run, 2 threads
(918.6 ms) beat 4 threads (1425.4 ms). The best OMP_NUM_THREADS value depends
on the host CPU and the program, and this tuning is specific to OpenBLAS.

That is beyond the package maintainer's business.
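
For users who want to repeat this kind of comparison on their own machine,
a rough sketch of a thread-count sweep follows. sweep_threads is a
hypothetical helper name, not part of OpenBLAS or Caffe, and the timing
relies on GNU date supporting %N (assumed here, as on Debian). It measures
wall-clock time of the whole command, not Caffe's own Forward-Backward figure.

```shell
# Sketch: run one command under several OMP_NUM_THREADS values and
# print "<threads> <elapsed ms>" for each. sweep_threads is a
# hypothetical helper, not something shipped by OpenBLAS or Caffe.
sweep_threads() {
    local cmd="$1"; shift
    local n start end
    for n in "$@"; do
        start=$(date +%s%N)      # nanoseconds since the epoch (GNU date)
        # Set the variable for this one run only; discard program output.
        OMP_NUM_THREADS="$n" sh -c "$cmd" >/dev/null 2>&1
        end=$(date +%s%N)
        echo "$n $(( (end - start) / 1000000 ))"
    done
}

# Possible usage, from inside the unpacked caffe source tree:
#   sweep_threads "caffe time -model models/bvlc_alexnet/deploy.prototxt -iterations 5" 1 2 4
```

Picking the row with the smallest elapsed time gives a starting point only;
the result can change with the workload and with other processes running.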

(2) No need to invent a mechanism for -march=native rebuilds
============================================================

I think generally
  1) `export DEB_CFLAGS_MAINT_APPEND=-march=native` and
  2) `dpkg-buildpackage -us -uc`
are enough for a user to do a -march=native rebuild.
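
Put together, the two steps might look like the sketch below. native_rebuild,
run and DRY_RUN are hypothetical names for illustration only. It assumes
deb-src entries are enabled in sources.list, that the package honours
dpkg-buildflags, and that exactly one unpacked source directory matches the
package name. C++ code would likely also want DEB_CXXFLAGS_MAINT_APPEND.

```shell
# Sketch of a local -march=native rebuild. native_rebuild, run and
# DRY_RUN are hypothetical names, not part of dpkg or apt.
run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

native_rebuild() {
    pkg="$1"
    run sudo apt-get build-dep -y "$pkg"   # install build dependencies
    run apt-get source "$pkg"              # fetch and unpack the source
    if [ -n "$DRY_RUN" ]; then
        echo "cd $pkg-*/ && DEB_CFLAGS_MAINT_APPEND=-march=native dpkg-buildpackage -us -uc"
    else
        # Assumes a single directory matches, e.g. openblas-<version>/.
        ( cd "$pkg"-*/ &&
          DEB_CFLAGS_MAINT_APPEND=-march=native dpkg-buildpackage -us -uc )
    fi
}

# Preview the commands without running anything:
#   DRY_RUN=1 native_rebuild openblas
```

The resulting .deb files land in the parent directory and can be installed
with dpkg -i; they are only suitable for CPUs like the one they were built on.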

Moreover, similar to Atlas, OpenBLAS has a custom build target too:
https://tracker.debian.org/media/packages/o/openblas/rules-0.2.19-2

Users can use Linuxbrew if they don't like apt/dpkg.

My conclusion
=============

Our priority is to provide researchers with a solid and painless operating
system, rather than packages with the best possible performance. They will
tune software performance themselves when they need to do so.
Sometimes they just switch to ArchLinux (many AUR packages are built locally).
Debian's package dependency tree is more solid than that of ArchLinux,
and Debian's software installation process is less painful[1] than that of Gentoo.

Thank you, Andreas, for bringing up this topic. I have wanted to write
this down for a long time.

[1] Gentoo is painful when the user needs a program immediately.

