Re: Pgcc in Deb
Drake Diedrich <Drake.Diedrich@anu.edu.au> writes:
> Please post full information when you make claims like this. Code,
> compiler versions and options, /proc/cpuinfo, ...
Here we go. Excerpts from /proc/cpuinfo:
model name : AMD-K7(tm) Processor
cpu MHz : 648.763668
cache size : 512 KB
I ran BYTE's nbench 2.1 (first 10 tests) and ubench 0.3 (tests "CPU"
and "MEM"), both from metalab's system/benchmark directory. Then I ran
gzip and bzip2 at full compression against linux-2.2.14.tar, used
pgp-i_2.6.3a-7 to symmetrically encrypt this tar (with compression
turned off), and finally encoded 2.5 minutes of music into MP3 with
lame-3.51.
All of these were run with binaries built using different compiler
switches (detailed list below). Each result is an average over three
runs. The *bench tests (uppercase) report the number of iterations
(higher=faster); the program tests report wall-clock execution times
in seconds (lower=faster).
Tests were conducted in single-user mode, the buffer-cache being
primed with the file in question.
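
For anyone who wants to repeat the application timings, the
average-over-three-runs procedure could be scripted roughly as follows.
This is only a sketch, not the script I used: the function name and the
placeholder command are made up for illustration, and the gzip command
line in the comment is an assumption about what "full compression"
means.

```python
import statistics
import subprocess
import sys
import time


def mean_wall_clock(argv, runs=3):
    """Run argv `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so terminal output does not distort the timing.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Placeholder workload. For a real run, substitute something like
    # ["gzip", "-9", "-c", "linux-2.2.14.tar"] (assumed command line).
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    print(f"{mean_wall_clock(cmd):.2f} s")
```

The sketch does nothing about single-user mode or priming the buffer
cache; those still have to be arranged by hand before the first run.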
Benchmark \ Try          1         2         3         4         5         6
----------------------------------------------------------------------------
NUMERIC SORT        111.13    318.77    333.08    333.01    452.07    457.35
STRING SORT          32.90     43.46     43.49     43.48     44.78     46.29
BITFIELD              3.12      1.28      1.06      1.10      8.10      7.91
FP EMULATION          8.98     21.52     22.11     28.41     45.27     46.40
FOURIER            6007.63   7931.50   8065.43   8063.30   8195.50   8178.07
ASSIGNMENT            1.45      5.20      3.57      3.61      4.35      4.36
IDEA                406.66    580.86    574.79    892.03    936.76    922.70
HUFFMAN             176.75    308.25    325.80    323.20    348.75    353.68
NEURAL NET            1.56      9.13      9.81     10.15     11.35     11.60
LU DECOMPOSITION     55.60    194.07    304.15    305.75    325.41    337.88
CPU                      -  33151.43  31724.71  31467.76  30845.00  30629.46
MEM                      -   4151.67   3933.00   3661.33   3599.67   3967.67
gzip                 39.03     37.41     36.62     37.72     35.66     36.24
bzip2               175.61    104.15    103.27    102.48    100.15     99.77
pgp                  58.14     38.68     36.95     36.95     35.36     35.37
lame                 94.92     35.41     33.95     34.13     31.01     30.63
Benchmark \ Try          7         8         9        10        11        12
----------------------------------------------------------------------------
NUMERIC SORT        465.03    351.09    350.49    353.04    343.11    332.88
STRING SORT          46.50     46.46     46.47     46.40     43.62     43.64
BITFIELD              1.16      1.28      1.22      1.16      1.14      1.15
FP EMULATION         46.38     32.86     32.86     32.56     28.47     28.85
FOURIER            8211.53   8088.10   7961.57   7967.40   8025.73   7975.37
ASSIGNMENT            4.89      4.55      4.51      4.63      3.69      3.97
IDEA                929.71    900.27    884.51    908.99    899.16    890.26
HUFFMAN             377.64    373.72    373.92    377.23    322.42    321.32
NEURAL NET           11.58     11.62     11.65     11.75     10.06      9.96
LU DECOMPOSITION    331.00    323.20    324.28    322.88    299.11    322.76
CPU               30606.58  30633.00  30626.00  30630.00  31432.08  31433.13
MEM                3892.00   3639.33   3226.33   4340.67   3709.67   2501.00
gzip                 36.97     39.89     39.31     37.36     38.46     37.22
bzip2               100.16     98.49     99.24     99.53    102.15    101.53
pgp                  35.28     34.68     35.81     34.89     37.39     37.40
lame                 31.24     31.05     30.96     31.25     33.64     33.60
Results relative to try 4 (= 1.00); the best try in each row is marked
with asterisks.
Benchmark \ Try        1        2        3        4        5        6
----------------------------------------------------------------------
NUMERIC SORT        0.33     0.96     1.00     1.00     1.36     1.37
STRING SORT         0.76     1.00     1.00     1.00     1.03     1.06
BITFIELD            2.84     1.16     0.96     1.00    *7.37*    7.20
FP EMULATION        0.32     0.76     0.78     1.00     1.59    *1.63*
FOURIER             0.75     0.98     1.00     1.00     1.02     1.01
ASSIGNMENT          0.40    *1.44*    0.99     1.00     1.20     1.21
IDEA                0.46     0.65     0.64     1.00    *1.05*    1.03
HUFFMAN             0.55     0.95     1.01     1.00     1.08     1.09
NEURAL NET          0.15     0.90     0.97     1.00     1.12     1.14
LU DECOMPOSITION    0.18     0.63     0.99     1.00     1.06    *1.11*
CPU                    -    *1.05*    1.01     1.00     0.98     0.97
MEM                    -     1.13     1.07     1.00     0.98     1.08
gzip                1.03     0.99     0.97     1.00    *0.95*    0.96
bzip2               1.71     1.02     1.01     1.00     0.98     0.97
pgp                 1.57     1.05     1.00     1.00     0.96     0.96
lame                2.78     1.04     0.99     1.00     0.91    *0.90*
Benchmark \ Try        7        8        9       10       11       12
----------------------------------------------------------------------
NUMERIC SORT       *1.40*    1.05     1.05     1.06     1.03     1.00
STRING SORT        *1.07*    1.07     1.07     1.07     1.00     1.00
BITFIELD            1.06     1.16     1.11     1.05     1.04     1.04
FP EMULATION        1.63     1.16     1.16     1.15     1.00     1.02
FOURIER            *1.02*    1.00     0.99     0.99     1.00     0.99
ASSIGNMENT          1.36     1.26     1.25     1.28     1.02     1.10
IDEA                1.04     1.01     0.99     1.02     1.01     1.00
HUFFMAN            *1.17*    1.16     1.16     1.17     1.00     0.99
NEURAL NET          1.14     1.14     1.15    *1.16*    0.99     0.98
LU DECOMPOSITION    1.08     1.06     1.06     1.06     0.98     1.06
CPU                 0.97     0.97     0.97     0.97     1.00     1.00
MEM                 1.06     0.99     0.88    *1.19*    1.01     0.68
gzip                0.98     1.06     1.04     0.99     1.02     0.99
bzip2               0.98    *0.96*    0.97     0.97     1.00     0.99
pgp                 0.95    *0.94*    0.97     0.94     1.01     1.01
lame                0.92     0.91     0.91     0.92     0.99     0.98
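
The relative figures are simply each raw result divided by the try 4
result, and the "best" try is the maximum for the *bench rows and the
minimum for the timed programs. A small sketch of that calculation,
using two rows (tries 1-6) copied from the raw tables; the function
names are my own invention for illustration:

```python
def relative_to(values, ref_try=4):
    """Divide each try's result by the reference try's result."""
    ref = values[ref_try]
    return {t: round(v / ref, 2) for t, v in values.items()}


def best_try(values, higher_is_better):
    """Return the try number with the best raw result."""
    pick = max if higher_is_better else min
    return pick(values, key=values.get)


# Raw results copied from the tables above (tries 1-6 only).
lame_seconds = {1: 94.92, 2: 35.41, 3: 33.95, 4: 34.13, 5: 31.01, 6: 30.63}
idea_iterations = {1: 406.66, 2: 580.86, 3: 574.79,
                   4: 892.03, 5: 936.76, 6: 922.70}

print(relative_to(lame_seconds))
print(best_try(lame_seconds, higher_is_better=False))    # timed program
print(best_try(idea_iterations, higher_is_better=True))  # iteration count
```

Note that for the timed programs a relative value below 1.00 means
faster than try 4, while for the *bench rows it means slower.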
Try 4 uses the highest optimization level that is not CPU-specific,
hence it serves as the reference. Tries 1-3 are lower optimization
settings, with 1 being the extreme do-not-ever-use-this "-O0".
I consider the *bench-tests as not extremely relevant to our problem,
but it is instructive to see the many pessimizations in those tests.
The four application benchmarks are more to the point, and give more
conclusive results.
From these you can see that a 10 % speedup is possible in some cases,
and about 5 % is the norm for CPU-bound programs.
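
To make the arithmetic behind those percentages explicit, here is a
sketch that compares try 4 with the fastest try for each of the four
programs, using the times from the raw tables. It reads "speedup" as
the fraction of run time saved, which is my assumption about how to
express it; the helper name is made up for illustration.

```python
# Wall-clock seconds copied from the tables above: (try 4, fastest try).
app_times = {
    "gzip": (37.72, 35.66),    # fastest: try 5
    "bzip2": (102.48, 98.49),  # fastest: try 8
    "pgp": (36.95, 34.68),     # fastest: try 8
    "lame": (34.13, 30.63),    # fastest: try 6
}


def time_saved_percent(reference, best):
    """Percentage of run time saved relative to the reference build."""
    return (1.0 - best / reference) * 100.0


for name, (ref, best) in app_times.items():
    print(f"{name:6s} {time_saved_percent(ref, best):4.1f} %")
```

With these numbers lame comes out at a bit over 10 % and the other
three at roughly 4 to 6 %.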
Of course, gcc does not include Athlon-specific optimizations. It
would be interesting to run these tests on a pure Pentium Pro
platform, or similar ones on other CPUs directly supported by gcc.
Compiler switches:
 1: -s -static -O0
 2: -s -static -O1
 3: -s -static -O2
 4: -s -static -O3
 5: -s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
    -march=pentiumpro -fforce-addr -fforce-mem -malign-loops=2
    -malign-functions=4 -malign-jumps=2 -funroll-loops
    -fexpensive-optimizations -malign-double -fschedule-insns2
    -mwide-multiply
 6: -s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
    -march=pentiumpro -fforce-addr -fforce-mem -malign-functions=4
    -funroll-loops -fexpensive-optimizations -malign-double
    -fschedule-insns2 -mwide-multiply
 7: -s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
    -march=pentiumpro -malign-functions=4 -funroll-loops
    -fexpensive-optimizations -malign-double -fschedule-insns2
    -mwide-multiply
 8: -s -static -O3 -fomit-frame-pointer -Wall -mpentium
    -march=pentium -malign-functions=4 -funroll-loops
    -fexpensive-optimizations -malign-double -fschedule-insns2
    -mwide-multiply
 9: -s -static -O3 -fomit-frame-pointer -Wall -m486
    -malign-functions=4 -funroll-loops -fexpensive-optimizations
    -malign-double -fschedule-insns2 -mwide-multiply
10: -s -static -O3 -fomit-frame-pointer -Wall -m386
    -malign-functions=4 -funroll-loops -fexpensive-optimizations
    -malign-double -fschedule-insns2 -mwide-multiply
11: -s -static -O6 -mpentium
12: -s -static -O6 -mpentiumpro
--
Robbe