Re: CPU specific/optimized Debian builds ?
On Wed, 29 May 2002, Ulrich Eckhardt wrote:
> using as a benchmark compressing my cvsroot
Hmmph, everyone seems to be posting integer benchmarks when I would
actually expect the major benefit to be from improved scheduling
and use of MMX/SSE-type instructions in apps bound by floating-point
computations.
To illustrate, here are the results of at least three runs each of
(unset DISPLAY;echo 'set samples 1000000;set size ratio -1;set xrange [-100:100];plot x*x/(1+x+x*x)'|time ./gnuplot)
with the following configurations on one of my systems (dual 650MHz
Pentium III):
stock Debian gnuplot 3.7.2-4 (-O2):
5.66user 1.94system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
5.77user 1.77system 0:07.53elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
5.69user 1.88system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
compiled with gcc-3.1 and -O2:
5.68user 1.88system 0:07.55elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
5.85user 1.71system 0:07.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
5.82user 1.75system 0:07.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
compiled with gcc-3.1 and -O2 -march=pentium3:
4.62user 1.85system 0:06.46elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.59user 1.89system 0:06.47elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.64user 1.81system 0:06.44elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
compiled with gcc-3.1 and -O2 -march=pentium3 -fomit-frame-pointer:
4.41user 1.84system 0:06.24elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.39user 1.86system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
4.52user 1.73system 0:06.23elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
compiled with gcc-3.1 and all of the above + -mfpmath=sse:
4.23user 1.86system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.23user 1.87system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.22user 1.88system 0:06.08elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.23user 1.87system 0:06.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
4.42user 1.67system 0:06.09elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
4.13user 1.94system 0:06.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
As can be seen, the cpu-specific -march=pentium3 flag helps noticeably
in this simple benchmark, and the other optimizations each help a
little. (Since gnuplot mostly uses double precision internally, and SSE
on the PIII handles only single precision, it is possible -mfpmath=sse
would have greater impact when compiled for and run on an SSE2 system.
I have certainly seen it have a _tremendous_ impact on one of my own
programs on that PIII box when executing a tight, single-precision
loop, though such loops are, of course, uncommon in most apps.)
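For anyone who wants the relative gains as numbers, here is a rough
sketch that averages the elapsed times listed above and expresses each
configuration as a percentage improvement over the stock package. The
timings are copied from the runs above; the labels, the function name
and the choice of elapsed (rather than user) time are just my
assumptions for illustration.

```python
from statistics import mean

# Elapsed wall-clock seconds, copied from the time(1) output above.
ELAPSED = {
    "stock 3.7.2-4 (-O2)": [7.56, 7.53, 7.56],
    "gcc-3.1 -O2": [7.55, 7.56, 7.56],
    "+ -march=pentium3": [6.46, 6.47, 6.44],
    "+ -fomit-frame-pointer": [6.24, 6.25, 6.23],
    "+ -mfpmath=sse": [6.08, 6.08, 6.08, 6.10, 6.09, 6.07],
}


def summarize(runs, baseline):
    """Map each label to (mean seconds, percent faster than baseline)."""
    base = mean(runs[baseline])
    summary = {}
    for label, times in runs.items():
        avg = mean(times)
        # Positive means less elapsed time than the baseline build.
        summary[label] = (avg, 100.0 * (base - avg) / base)
    return summary


if __name__ == "__main__":
    for label, (avg, pct) in summarize(ELAPSED, "stock 3.7.2-4 (-O2)").items():
        print(f"{label:26s} {avg:6.3f}s  {pct:+5.1f}%")
```

With only three to six runs per configuration this is a summary, not
a statistical test, so small differences should not be over-read.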
Of course, I have also seen cases where the extra flags do not help or
even hurt. So, my own preference would be for the following:
- CPU detection handled upstream within the sources (ideal),
- else Debian cpu-specific packages _only_ where someone has verified
it really seems to make a significant positive difference and
doesn't cause additional bugs,
and
- ability for users to easily recompile packages from Debian source
with custom compilers and compiler flags.
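To make the first option a bit more concrete, here is a minimal sketch
of the runtime-detection idea: look at what the CPU advertises and
pick the most specific code path available. It assumes a Linux x86
system whose /proc/cpuinfo has a "flags" line; the variant names and
function names are hypothetical, and a real upstream would more likely
do this in C inside the program itself.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line, as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_variant(flags):
    """Choose the most specific (hypothetical) code path the CPU supports."""
    if "sse2" in flags:
        return "sse2"
    if "sse" in flags:
        return "sse"
    return "generic"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text; return an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(pick_variant(cpu_flags(read_cpuinfo())))
```

The point of the fallback to "generic" is that a single package then
runs everywhere and only takes the faster path where it is known to be
supported.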
-ccwf