
Re: CPU specific/optimized Debian builds ?

On Wed, 29 May 2002, Ulrich Eckhardt wrote:
> using as a benchmark compressing my cvsroot

Hmmph, everyone seems to be posting integer benchmarks when I would
actually expect the major benefit to come from improved scheduling
and use of MMX-type instructions in apps bound by floating-point
performance.

To illustrate, here are the results of at least three runs each of

    (unset DISPLAY;echo 'set samples 1000000;set size ratio -1;set xrange [-100:100];plot x*x/(1+x+x*x)'|time ./gnuplot)

with the following configurations on one of my systems (dual 650MHz
Pentium III):

stock Debian gnuplot 3.7.2-4 (-O2):
  5.66user 1.94system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  5.77user 1.77system 0:07.53elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  5.69user 1.88system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

compiled with gcc-3.1 and -O2:
  5.68user 1.88system 0:07.55elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  5.85user 1.71system 0:07.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  5.82user 1.75system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

compiled with gcc-3.1 and -O2 -march=pentium3:
  4.62user 1.85system 0:06.46elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.59user 1.89system 0:06.47elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.64user 1.81system 0:06.44elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

compiled with gcc-3.1 and -O2 -march=pentium3 -fomit-frame-pointer:
  4.41user 1.84system 0:06.24elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.39user 1.86system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  4.52user 1.73system 0:06.23elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

compiled with gcc-3.1 and all of the above + -mfpmath=sse:
  4.23user 1.86system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.23user 1.87system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.22user 1.88system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.23user 1.87system 0:06.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  4.42user 1.67system 0:06.09elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  4.13user 1.94system 0:06.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

As can be seen, the CPU-specific -march=pentium3 flag helps noticeably
in this simple benchmark, and the other optimizations each help a
little.  (Since gnuplot mostly uses double precision internally, it is
possible -mfpmath=sse would have greater impact when compiled for and
run on an SSE2 system.  I have certainly seen it have a _tremendous_
impact on one of my own programs on that PIII box when executing a
tight, single-precision loop, which, of course, is uncommon in most
applications.)

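To put rough numbers on the timings above, here is a minimal sketch that
averages the runs for each configuration and reports how much elapsed
time each one saves relative to the stock package.  The figures are
copied from the `time` output above; the configuration labels and the
function name are only illustrative.

```python
from statistics import mean

# (user, system, elapsed) in seconds, copied from the `time` output above.
# The dictionary keys are illustrative labels, not official names.
RUNS = {
    "stock -O2": [
        (5.66, 1.94, 7.56), (5.77, 1.77, 7.53), (5.69, 1.88, 7.56)],
    "gcc-3.1 -O2": [
        (5.68, 1.88, 7.55), (5.85, 1.71, 7.56), (5.82, 1.75, 7.56)],
    "gcc-3.1 -O2 -march=pentium3": [
        (4.62, 1.85, 6.46), (4.59, 1.89, 6.47), (4.64, 1.81, 6.44)],
    "... + -fomit-frame-pointer": [
        (4.41, 1.84, 6.24), (4.39, 1.86, 6.25), (4.52, 1.73, 6.23)],
    "... + -mfpmath=sse": [
        (4.23, 1.86, 6.08), (4.23, 1.87, 6.08), (4.22, 1.88, 6.08),
        (4.23, 1.87, 6.10), (4.42, 1.67, 6.09), (4.13, 1.94, 6.07)],
}


def summarize(runs, baseline="stock -O2"):
    """Map each configuration to (mean user, mean elapsed, % elapsed saved)."""
    base_elapsed = mean(e for _, _, e in runs[baseline])
    summary = {}
    for name, samples in runs.items():
        user = mean(u for u, _, _ in samples)
        elapsed = mean(e for _, _, e in samples)
        # Percentage of elapsed time saved relative to the baseline build.
        saved = 100.0 * (1.0 - elapsed / base_elapsed)
        summary[name] = (user, elapsed, saved)
    return summary


if __name__ == "__main__":
    for name, (user, elapsed, saved) in summarize(RUNS).items():
        print(f"{name:30s} user {user:5.2f}s  elapsed {elapsed:5.2f}s  "
              f"saved {saved:5.1f}%")
```

Note that the system time stays at roughly 1.7-1.9 seconds in every
configuration, so the savings in elapsed time understate the change in
user time.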
Of course, I have also seen cases where the extra flags do not help or
even hurt.  So, my own preference would be for the following:

- CPU detection handled upstream within the sources (ideal),
- else Debian CPU-specific packages _only_ where someone has verified
  it really seems to make a significant positive difference and
  doesn't cause additional bugs, and
- ability for users to easily recompile packages from Debian source
  with custom compilers and compiler flags.

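As a rough illustration of the first option in the list above, here is
a minimal sketch of run-time CPU detection on Linux: it reads the
"flags" line of /proc/cpuinfo and picks the most specific code path
available.  The variant names and both function names are hypothetical,
and a real program would more likely do this in its own language (for
instance via CPUID in C) than in Python.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text (the x86 Linux format is assumed)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_variant(flags):
    """Choose a (hypothetical) build variant, most specific first."""
    for feature in ("sse2", "sse", "mmx"):
        if feature in flags:
            return feature
    return "generic"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning '' where it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(pick_variant(cpu_flags(read_cpuinfo())))
```

Falling back to "generic" when no known flag is found keeps the program
usable on CPUs and systems the detection does not understand.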
