Reporting on superpage-patch (on 2.4.18)
Hi,
I thought I'd give the superpage-patch from Naohiko Shimizu a try...
You can find his stuff at http://shimizu-lab.dt.u-tokai.ac.jp/lsp.html
Before I give the numbers, some remarks:
1) I use a Personal Workstation PWS433au with 576MB of memory and 2MB of L3 cache.
2) I use Debian GNU/Linux - Sid (unstable) with gcc-2.95.4
3) I also have Compaq ccc installed, version 6.2.9.506
4) I ran the tests in single-user mode on a stock 2.4.18 kernel, with
and without the patch, for comparison.
5) The actual patch I used was:
(http://shimizu-lab.dt.u-tokai.ac.jp/lsp/super_page-2.4.18_020707-alpha.patch)
6) The programs I tested with are all from Shimizu, and can be found on
his page. The program gemmx.c gave the most surprising results.
7) Everything (gcc as well as ccc) was compiled with -funroll-loops -O2,
and in one case (as indicated) with the extra flag -mcpu=ev56.
8) I ran every program 5 times consecutively; I only state the
best/highest value produced in those 5 runs, per program, per compiler,
per kernel. Across the 5 runs of the same program there's hardly any
difference.
9) Below, 2.4.18 means the stock kernel and 2.4.18s means the kernel with the superpage-patch.
10) I tested a Fortran program, trans.f, which I compiled with GNU
Fortran 0.5.25 (20010319). It raised floating-point exceptions with
optimizations like -O2, so I left those out. The Compaq cfal (f90) package
I have installed reported that it was expired beta software and produced
link-time errors, so I left that out, too.
11) The four numbers for the mem2.c program are, in order: MMAP
store stride, MMAP store continuous, BRK store stride, BRK store
continuous.
12) If you have any questions about what exactly the numbers
indicate: I don't know! Look at the source code, or ask Mr. Shimizu :-)
Now, for the results:
trans.c
gcc 2.4.18 -- 47.31MB/s
gcc 2.4.18s -- 57.54MB/s
ccc 2.4.18 -- 47.17MB/s
ccc 2.4.18s -- 55.86MB/s
trans2.c
gcc 2.4.18 -- 65.68 75.19
gcc 2.4.18s -- 90.42 91.29
ccc 2.4.18 -- 64.19 75.15
ccc 2.4.18s -- 90.48 91.84
trans.f
g77 2.4.18 -- 31.67MB/s
g77 2.4.18s -- 38.22MB/s
mem2.c
gcc 2.4.18 -- 50.26MB/s 48.08MB/s 48.08MB/s 48.63MB/s
gcc 2.4.18s -- 71.13MB/s 67.77MB/s 71.72MB/s 67.72MB/s
ccc 2.4.18 -- 49.36MB/s 50.01MB/s 50.93MB/s 51.94MB/s
ccc 2.4.18s -- 72.33MB/s 73.19MB/s 71.65MB/s 73.12MB/s
So far, we can see that:
1) the superpage-patch works :-)
2) with the above programs, the Compaq ccc compiler usually gives a
_slightly_ better result, but switching from gcc to ccc gains far less
than applying the superpage-patch does
Now, Shimizu's program gemmx.c, a 1000x1000 matrix-by-matrix
multiplication, also seems to benefit from the superpage-patch, but its
performance improves dramatically when the compiler is allowed to
generate code for the specific CPU.
In the case of my CPU, a 21164A (also known as EV56), this part of the
source is relevant:
#define PREFETCHB
#define PAGESIZE 8192
#define CACHESIZE 98304
#define L2CACHESIZE 2097152
#define TLBENTRY 64
So, compare these numbers:
gcc 2.4.18 -- 170 MFLOPS 167 MFLOPS
gcc 2.4.18s -- 184 MFLOPS 180 MFLOPS
gcc-ev56 2.4.18 -- 237 MFLOPS 238 MFLOPS
gcc-ev56 2.4.18s -- 270 MFLOPS 269 MFLOPS
ccc 2.4.18 -- 309 MFLOPS 311 MFLOPS
ccc 2.4.18s -- 378 MFLOPS 380 MFLOPS
gcc-ev56 means gcc with the extra compiler flag -mcpu=ev56.
So we can conclude that the superpage-patch is still effective, but if
you want to produce fast code, much more can be gained with the right
compiler flags or with an alternative ('native') compiler. Whether this
is worthwhile depends on the 'quality' of the source code and on what
the program does.
If anyone has suggestions for further testing, let me know.
I for one intend to keep using the superpage-patch! Thank you,
Mr. Shimizu!
Also, if anyone has managed to compile applications with ccc on Linux,
I'd like to hear about it.
Dannis.
--
To UNSUBSCRIBE, email to debian-alpha-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org