
Re: Pgcc in Deb



Drake Diedrich <Drake.Diedrich@anu.edu.au> writes:

>    Please post full information when you make claims like this.  Code,
> compiler versions and options, /proc/cpuinfo, ...

Here we go. Excerpts from /proc/cpuinfo:

model name	: AMD-K7(tm) Processor
cpu MHz		: 648.763668
cache size	: 512 KB

I ran BYTE's nbench 2.1 (first 10 tests) and ubench 0.3 (tests "CPU"
and "MEM"), both from metalab's system/benchmark directory. Then I ran
gzip and bzip2 at full compression against linux-2.2.14.tar, used
pgp-i_2.6.3a-7 to symmetrically encrypt the same tar file (with
compression turned off), and finally encoded 2.5 minutes of music to
MP3 with lame-3.51.

All of these were run with binaries built using different compiler
switches (detailed list below). Each result is the average of three
runs. The *bench tests (uppercase) report the number of iterations
(higher=faster); the program tests report wall-clock execution time
(lower=faster).

Tests were conducted in single-user mode, with the buffer cache
primed with the file in question.
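
For anyone who wants to repeat the program tests, here is a minimal
sketch of the averaging procedure described above. It is not the
harness used for the numbers below; the function name time_command and
the example gzip invocation in the comment are assumptions made for
illustration.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the mean wall-clock time (seconds) over `runs` executions."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so writing the output does not dominate the timing.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


if __name__ == "__main__":
    # Trivial self-check; substitute a real workload such as
    # ["gzip", "-9", "-c", "linux-2.2.14.tar"] (assumed invocation).
    print(time_command([sys.executable, "-c", "pass"]))
```

Running the command once before measuring would prime the buffer cache
with the input file, as was done for the tests here.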

Benchmark \ Try          1         2         3         4         5         6
----------------------------------------------------------------------------
NUMERIC SORT        111.13    318.77    333.08    333.01    452.07    457.35
STRING SORT          32.90     43.46     43.49     43.48     44.78     46.29
BITFIELD              3.12      1.28      1.06      1.10      8.10      7.91
FP EMULATION          8.98     21.52     22.11     28.41     45.27     46.40
FOURIER            6007.63   7931.50   8065.43   8063.30   8195.50   8178.07
ASSIGNMENT            1.45      5.20      3.57      3.61      4.35      4.36
IDEA                406.66    580.86    574.79    892.03    936.76    922.70
HUFFMAN             176.75    308.25    325.80    323.20    348.75    353.68
NEURAL NET            1.56      9.13      9.81     10.15     11.35     11.60
LU DECOMPOSITION     55.60    194.07    304.15    305.75    325.41    337.88
CPU                         33151.43  31724.71  31467.76  30845.00  30629.46
MEM                          4151.67   3933.00   3661.33   3599.67   3967.67
gzip                 39.03     37.41     36.62     37.72     35.66     36.24
bzip2               175.61    104.15    103.27    102.48    100.15     99.77
pgp                  58.14     38.68     36.95     36.95     35.36     35.37
lame                 94.92     35.41     33.95     34.13     31.01     30.63
                  
Benchmark \ Try            7         8         9        10        11        12
------------------------------------------------------------------------------
NUMERIC SORT          465.03    351.09    350.49    353.04    343.11    332.88
STRING SORT            46.50     46.46     46.47     46.40     43.62     43.64
BITFIELD                1.16      1.28      1.22      1.16      1.14      1.15
FP EMULATION           46.38     32.86     32.86     32.56     28.47     28.85
FOURIER              8211.53   8088.10   7961.57   7967.40   8025.73   7975.37
ASSIGNMENT              4.89      4.55      4.51      4.63      3.69      3.97
IDEA                  929.71    900.27    884.51    908.99    899.16    890.26
HUFFMAN               377.64    373.72    373.92    377.23    322.42    321.32
NEURAL NET             11.58     11.62     11.65     11.75     10.06      9.96
LU DECOMPOSITION      331.00    323.20    324.28    322.88    299.11    322.76
CPU                 30606.58  30633.00  30626.00  30630.00  31432.08  31433.13
MEM                  3892.00   3639.33   3226.33   4340.67   3709.67   2501.00
gzip                   36.97     39.89     39.31     37.36     38.46     37.22
bzip2                 100.16     98.49     99.24     99.53    102.15    101.53
pgp                    35.28     34.68     35.81     34.89     37.39     37.40
lame                   31.24     31.05     30.96     31.25     33.64     33.60

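Results relative to try 4 (each value divided by the try-4 value), best
try marked with asterisks. For the *bench rows a larger ratio is
faster; for the program rows a smaller ratio is faster.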

Benchmark \ Try      1      2      3      4      5      6 
----------------------------------------------------------
NUMERIC SORT      0.33   0.96   1.00   1.00   1.36   1.37 
STRING SORT       0.76   1.00   1.00   1.00   1.03   1.06 
BITFIELD          2.84   1.16   0.96   1.00  *7.37*  7.20 
FP EMULATION      0.32   0.76   0.78   1.00   1.59  *1.63*
FOURIER           0.75   0.98   1.00   1.00   1.02   1.01 
ASSIGNMENT        0.40  *1.44*  0.99   1.00   1.20   1.21 
IDEA              0.46   0.65   0.64   1.00  *1.05*  1.03 
HUFFMAN           0.55   0.95   1.01   1.00   1.08   1.09 
NEURAL NET        0.15   0.90   0.97   1.00   1.12   1.14 
LU DECOMPOSITION  0.18   0.63   0.99   1.00   1.06  *1.11*
CPU                     *1.05*  1.01   1.00   0.98   0.97 
MEM                      1.13   1.07   1.00   0.98   1.08 
gzip              1.03   0.99   0.97   1.00  *0.95*  0.96 
bzip2             1.71   1.02   1.01   1.00   0.98   0.97 
pgp               1.57   1.05   1.00   1.00   0.96   0.96 
lame              2.78   1.04   0.99   1.00   0.91  *0.90*
                  
Benchmark \ Try        7      8      9     10     11     12
------------------------------------------------------------
NUMERIC SORT       *1.40*  1.05   1.05   1.06   1.03   1.00
STRING SORT        *1.07*  1.07   1.07   1.07   1.00   1.00
BITFIELD            1.06   1.16   1.11   1.05   1.04   1.04
FP EMULATION        1.63   1.16   1.16   1.15   1.00   1.02
FOURIER            *1.02*  1.00   0.99   0.99   1.00   0.99
ASSIGNMENT          1.36   1.26   1.25   1.28   1.02   1.10
IDEA                1.04   1.01   0.99   1.02   1.01   1.00
HUFFMAN            *1.17*  1.16   1.16   1.17   1.00   0.99
NEURAL NET          1.14   1.14   1.15  *1.16*  0.99   0.98
LU DECOMPOSITION    1.08   1.06   1.06   1.06   0.98   1.06
CPU                 0.97   0.97   0.97   0.97   1.00   1.00
MEM                 1.06   0.99   0.88  *1.19*  1.01   0.68
gzip                0.98   1.06   1.04   0.99   1.02   0.99
bzip2               0.98  *0.96*  0.97   0.97   1.00   0.99
pgp                 0.95  *0.94*  0.97   0.94   1.01   1.01
lame                0.92   0.91   0.91   0.92   0.99   0.98
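
The relative table can be reproduced from the raw one along these
lines. This is only a sketch, not the script that generated the table:
the names RAW, HIGHER_IS_BETTER and relative_row are made up for the
example, and just two rows are copied in.

```python
# Raw values for tries 1..12, copied from the tables above.
RAW = {
    "NUMERIC SORT": [111.13, 318.77, 333.08, 333.01, 452.07, 457.35,
                     465.03, 351.09, 350.49, 353.04, 343.11, 332.88],
    "lame": [94.92, 35.41, 33.95, 34.13, 31.01, 30.63,
             31.24, 31.05, 30.96, 31.25, 33.64, 33.60],
}
# *bench rows count iterations (higher=faster); program rows are times.
HIGHER_IS_BETTER = {"NUMERIC SORT": True, "lame": False}


def relative_row(values, higher_is_better, ref_try=4):
    """Return (ratios relative to ref_try, 1-based number of the best try)."""
    ref = values[ref_try - 1]
    ratios = [v / ref for v in values]
    pick = max if higher_is_better else min
    best_try = values.index(pick(values)) + 1
    return ratios, best_try


for name, values in RAW.items():
    ratios, best = relative_row(values, HIGHER_IS_BETTER[name])
    cells = [f"*{r:.2f}*" if i + 1 == best else f" {r:.2f} "
             for i, r in enumerate(ratios)]
    print(f"{name:<17}" + " ".join(cells))
```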

Try 4 has the highest level of CPU-independent optimization, so it is
used as the reference. Tries 1-3 use lower optimization settings,
with 1 being the extreme do-not-ever-use-this "-O0".

I consider the *bench tests not especially relevant to our problem,
but it is instructive to see how many pessimizations show up in them.
The four application benchmarks are more to the point and give more
conclusive results.

From these you can see that a 10 % speedup is possible in some cases,
with about 5 % being the norm for CPU-bound programs.
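
To make the arithmetic behind these percentages explicit, here is a
small sketch that compares the try-4 time with the best time of each
application benchmark. The times are copied from the raw table; the
names TIMES and percent_saved are assumptions for the example.

```python
# (try-4 time, best time over all tries), copied from the raw table.
TIMES = {
    "gzip": (37.72, 35.66),    # best is try 5
    "bzip2": (102.48, 98.49),  # best is try 8
    "pgp": (36.95, 34.68),     # best is try 8
    "lame": (34.13, 30.63),    # best is try 6
}


def percent_saved(reference, best):
    """Percentage of wall-clock time saved relative to the reference."""
    return 100.0 * (reference - best) / reference


for name, (ref, best) in TIMES.items():
    print(f"{name:<6} {percent_saved(ref, best):5.1f} % less time than try 4")
```

lame comes out at roughly 10 %, the other three between about 4 % and
6 %.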

Of course, gcc does not include Athlon-specific optimizations. It
would be interesting to run these tests on a pure Pentium Pro
platform, or similar ones on other CPUs directly supported by gcc.

Compiler switches:
 1:-s -static -O0
 2:-s -static -O1
 3:-s -static -O2
 4:-s -static -O3
 5:-s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
   -march=pentiumpro -fforce-addr -fforce-mem -malign-loops=2
   -malign-functions=4 -malign-jumps=2 -funroll-loops
   -fexpensive-optimizations -malign-double -fschedule-insns2
   -mwide-multiply
 6:-s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
   -march=pentiumpro -fforce-addr -fforce-mem -malign-functions=4
   -funroll-loops -fexpensive-optimizations -malign-double
   -fschedule-insns2 -mwide-multiply
 7:-s -static -O3 -fomit-frame-pointer -Wall -mpentiumpro
   -march=pentiumpro -malign-functions=4 -funroll-loops
   -fexpensive-optimizations -malign-double -fschedule-insns2
   -mwide-multiply
 8:-s -static -O3 -fomit-frame-pointer -Wall -mpentium
   -march=pentium -malign-functions=4 -funroll-loops
   -fexpensive-optimizations -malign-double -fschedule-insns2
   -mwide-multiply
 9:-s -static -O3 -fomit-frame-pointer -Wall -m486
   -malign-functions=4 -funroll-loops -fexpensive-optimizations
   -malign-double -fschedule-insns2 -mwide-multiply
10:-s -static -O3 -fomit-frame-pointer -Wall -m386
   -malign-functions=4 -funroll-loops -fexpensive-optimizations
   -malign-double -fschedule-insns2 -mwide-multiply
11:-s -static -O6 -mpentium
12:-s -static -O6 -mpentiumpro
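
A sketch of how one binary per try could be built from a list like the
one above. Only tries 1-4 are included; the source file name bench.c,
the output naming scheme and the helper build_commands are hypothetical,
and the script only assembles the gcc command lines without running
them.

```python
# Switches for tries 1-4, copied from the list above.
TRIES = {
    1: "-s -static -O0",
    2: "-s -static -O1",
    3: "-s -static -O2",
    4: "-s -static -O3",
}


def build_commands(source="bench.c", name="bench"):
    """Return one gcc argv list per try (hypothetical file names)."""
    commands = []
    for number, switches in sorted(TRIES.items()):
        output = f"{name}-try{number}"
        commands.append(["gcc", *switches.split(), "-o", output, source])
    return commands


for argv in build_commands():
    print(" ".join(argv))
```

Each argv list could then be handed to subprocess.run to do the actual
build, with the longer switch sets for tries 5-12 added to TRIES in the
same way.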

-- 
Robbe

