
Bug#520877: I do not see the need for a new port !



* Nathael Pajani | 2010-01-08 09:13:29 [+0100]:

>> looks like this is a new port. won't fix in GCC for the powerpc port.
>The e500 core is no new port, it is powerpc it just misses one
>"lwsync" instruction used in libstdc++6 (this seems to be a bug in
>the e500 powerpc core)
It is not just the lwsync thing. The gnuspe port also uses SPE for
floating point because the "normal" FPU is not available and has to
be emulated in the kernel.

>Some have created a new "unofficial" debian port for e500 cores (see
>[0]), but I think a small modification to the build system would fix
>the problem for the main powerpc port.
Sure. Replacing lwsync with sync would make it work on e500 cores, but
it would be a little slower on those machines which have lwsync
implemented (64bit server boxes).

>I managed to have the official port working on e500 by manually
>changing the lwsync opcode in the libstdc++6 binary, using the sync
>(or msync ?) opcode.
>It's used only once.
>And maybe even only in this library.
This may be the case for the gcc package. Then we have other packages
which require atomic updates and will suffer. This would be boehm-gc,
for instance, which is included in a few packages like gcj. Another one
would be liburcu.
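
For reference, the manual binary patch described in the quote above
could be sketched roughly like this. This is only an illustration, not
the procedure that was actually used: the function name patch_lwsync is
made up, and the sketch assumes it is handed only the big-endian
executable code of the library (e.g. an extracted .text section), since
scanning a whole file could hit data that happens to look like the
opcode. The two encodings are the standard PowerPC ones for
"sync" (L=0) and "lwsync" (L=1).

```python
import struct

LWSYNC = 0x7C2004AC  # PowerPC "lwsync" (sync with L=1)
SYNC = 0x7C0004AC    # PowerPC "sync" (L=0)


def patch_lwsync(code):
    """Return (patched_code, count) with every lwsync word replaced by sync.

    Hypothetical helper: `code` is assumed to be raw big-endian PowerPC
    instructions, so every instruction starts at a multiple of 4 bytes.
    """
    out = bytearray(code)
    count = 0
    for off in range(0, len(out) - 3, 4):
        (word,) = struct.unpack_from(">I", out, off)
        if word == LWSYNC:
            struct.pack_into(">I", out, off, SYNC)
            count += 1
    return bytes(out), count


# Tiny made-up instruction stream: nop, lwsync, blr
sample = bytes.fromhex("60000000" "7c2004ac" "4e800020")
patched, replaced = patch_lwsync(sample)
print(replaced, patched.hex())
```

This only covers one binary, of course. Every other package that emits
lwsync would need the same treatment.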

>So preventing the use of this particular instruction in the gcc
>package and in the gcc used to build the port will not be a
>performance issue for the other cores.
If there were no performance difference between lwsync and sync, I
doubt that the Power consortium would have introduced this opcode and
named it lightweight sync.

>Of course, another solution is to add a trap in the kernel for this
>instruction and perform a sync instead of the lwsync (see [1])
So you are trapping into the kernel for every floating point instruction
and now even for the lwsync opcode. This is something that would work
but will end up slower than necessary even for the e500 cores, since the
whole floating point code is emulated in software. So it is a solution
to get software working in the first place but not a final solution.
This will probably make no difference for an embedded board working as a
web server but will surely make a difference if you use it as, let's
say, your desktop with multimedia applications.

>The problem is that I do not have any more e500 cores at hand, so I
>cannot test or spend more time for investigations.
Okay. So here is something for you: I've spent some time and let nbench
2.2.3 [0] run on an e500, and here are the results:

- the new port, nbench compiled with gcc 4.3 and -O3
|BYTEmark* Native Mode Benchmark ver. 2 (10/95)
|Index-split by Andrew D. Balsa (11/97)
|Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
|
|TEST                : Iterations/sec.  : Old Index   : New Index
|                    :                  : Pentium 90* : AMD K6/233*
|--------------------:------------------:-------------:------------
|NUMERIC SORT        :           642.6  :      16.48  :       5.41
|STRING SORT         :          55.356  :      24.73  :       3.83
|BITFIELD            :      1.4017e+08  :      24.04  :       5.02
|FP EMULATION        :          125.36  :      60.15  :      13.88
|FOURIER             :          3968.9  :       4.51  :       2.54
|ASSIGNMENT          :          10.245  :      38.98  :      10.11
|IDEA                :          1921.8  :      29.39  :       8.73
|HUFFMAN             :          1027.7  :      28.50  :       9.10
|NEURAL NET          :         0.70771  :       1.14  :       0.48
|LU DECOMPOSITION    :          19.491  :       1.01  :       0.73
|==========================ORIGINAL BYTEMARK RESULTS==========================
|INTEGER INDEX       : 29.459
|FLOATING-POINT INDEX: 1.730
|Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
|==============================LINUX DATA BELOW===============================
|CPU                 : 
|L2 Cache            : 
|OS                  : Linux 2.6.31.6-00383-gc419d4b
|C compiler          : gcc version 4.3.2 (Debian 4.3.2-1.1) 
|libc                : 
|MEMORY INDEX        : 5.793
|INTEGER INDEX       : 8.789
|FLOATING-POINT INDEX: 0.960
|Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
|* Trademarks are property of their respective holder.
|

- the new port, nbench compiled with gcc 4.3 and -O3 -mfloat-gprs=double
|TEST                : Iterations/sec.  : Old Index   : New Index
|                    :                  : Pentium 90* : AMD K6/233*
|--------------------:------------------:-------------:------------
|NUMERIC SORT        :          668.28  :      17.14  :       5.63
|STRING SORT         :          55.294  :      24.71  :       3.82
|BITFIELD            :      1.4042e+08  :      24.09  :       5.03
|FP EMULATION        :          125.37  :      60.16  :      13.88
|FOURIER             :          5197.2  :       5.91  :       3.32
|ASSIGNMENT          :           9.996  :      38.04  :       9.87
|IDEA                :          1920.9  :      29.38  :       8.72
|HUFFMAN             :            1018  :      28.23  :       9.01
|NEURAL NET          :          9.1451  :      14.69  :       6.18
|LU DECOMPOSITION    :          288.64  :      14.95  :      10.80
|==========================ORIGINAL BYTEMARK RESULTS==========================
|INTEGER INDEX       : 29.481
|FLOATING-POINT INDEX: 10.909
|Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
|==============================LINUX DATA BELOW===============================
|CPU                 : 
|L2 Cache            : 
|OS                  : Linux 2.6.31.6-00383-gc419d4b
|C compiler          : gcc version 4.3.2 (Debian 4.3.2-1.1) 
|libc                : 
|MEMORY INDEX        : 5.747
|INTEGER INDEX       : 8.853
|FLOATING-POINT INDEX: 6.051
|Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
|* Trademarks are property of their respective holder.

 As you can see, integer-only code like "NUMERIC SORT" or "IDEA" didn't
 change, but code which relies heavily on floating point, mostly of type
 double, improved.
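
 To put numbers on that, here is a small sketch that divides the
 iterations/sec of the second run (-mfloat-gprs=double) by those of the
 first run. The values are copied from the two tables above; the
 variable names are just picked for this example.

```python
# Iterations/sec copied from the first two result tables above.
run_default = {            # -O3
    "NUMERIC SORT": 642.6,
    "IDEA": 1921.8,
    "FOURIER": 3968.9,
    "NEURAL NET": 0.70771,
    "LU DECOMPOSITION": 19.491,
}
run_double = {             # -O3 -mfloat-gprs=double
    "NUMERIC SORT": 668.28,
    "IDEA": 1920.9,
    "FOURIER": 5197.2,
    "NEURAL NET": 9.1451,
    "LU DECOMPOSITION": 288.64,
}

# Factor > 1 means the -mfloat-gprs=double build was faster.
speedup = {test: run_double[test] / run_default[test] for test in run_default}

for test, factor in speedup.items():
    print(f"{test:<17} {factor:6.2f}x")
```

 The integer tests stay within a few percent of 1x, FOURIER gains about
 a third, and NEURAL NET and LU DECOMPOSITION come out more than ten
 times faster.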

- powerpc etch port, nbench compiled with cross gcc 4.3 and with -O3
  -static. I've removed the "ASSIGNMENT" test because it did not
  complete after three hours.

|TEST                : Iterations/sec.  : Old Index   : New Index
|                    :                  : Pentium 90* : AMD K6/233*
|--------------------:------------------:-------------:------------
|NUMERIC SORT        :          659.68  :      16.92  :       5.56
|STRING SORT         :          55.072  :      24.61  :       3.81
|BITFIELD            :      1.3981e+08  :      23.98  :       5.01
|FP EMULATION        :           119.6  :      57.39  :      13.24
|FOURIER             :          52.355  :       0.06  :       0.03
|IDEA                :          1884.2  :      28.82  :       8.56
|HUFFMAN             :          58.916  :       1.63  :       0.52
|NEURAL NET          :        0.054789  :       0.09  :       0.04
|LU DECOMPOSITION    :          1.8248  :       0.09  :       0.07
|==========================ORIGINAL BYTEMARK RESULTS==========================
|INTEGER INDEX       : 11.523
|FLOATING-POINT INDEX: 0.079
|Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
|==============================LINUX DATA BELOW===============================
|CPU                 : 
|L2 Cache            : 
|OS                  : Linux 2.6.28
|C compiler          : powerpc-linux-gnu-gcc
|libc                : static
|MEMORY INDEX        : 2.672
|INTEGER INDEX       : 4.257
|FLOATING-POINT INDEX: 0.044
|Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
|* Trademarks are property of their respective holder.

 As you can see, everything that remained steady between run one and two
 is also almost the same here. Everything that improved (floating point)
 got very slow in this run. The reason for this is the floating point
 emulation in the kernel.
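
 The same comparison can be done between the second run (SPE, double)
 and this run on the plain powerpc port. The sketch below lists every
 test that is more than twice as slow here; the threshold of 2 is an
 arbitrary pick for the example, and the values are again copied from
 the tables above.

```python
# Iterations/sec copied from the result tables above.
run_spe_double = {         # new port, -O3 -mfloat-gprs=double
    "NUMERIC SORT": 668.28,
    "STRING SORT": 55.294,
    "BITFIELD": 1.4042e+08,
    "FP EMULATION": 125.37,
    "FOURIER": 5197.2,
    "IDEA": 1920.9,
    "HUFFMAN": 1018.0,
    "NEURAL NET": 9.1451,
    "LU DECOMPOSITION": 288.64,
}
run_powerpc = {            # powerpc etch port, cross gcc 4.3, -O3 -static
    "NUMERIC SORT": 659.68,
    "STRING SORT": 55.072,
    "BITFIELD": 1.3981e+08,
    "FP EMULATION": 119.6,
    "FOURIER": 52.355,
    "IDEA": 1884.2,
    "HUFFMAN": 58.916,
    "NEURAL NET": 0.054789,
    "LU DECOMPOSITION": 1.8248,
}

THRESHOLD = 2.0  # arbitrary cut-off for "got much slower"

# Factor > 1 means the plain powerpc build was slower than the SPE build.
slowdown = {t: run_spe_double[t] / run_powerpc[t] for t in run_powerpc}
much_slower = {t: f for t, f in slowdown.items() if f > THRESHOLD}

for test, factor in sorted(much_slower.items(), key=lambda item: item[1]):
    print(f"{test:<17} {factor:7.1f}x slower")
```

 The three floating point tests end up roughly two orders of magnitude
 slower. With these numbers HUFFMAN also crosses the threshold in this
 particular run.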

- powerpc etch port, nbench compiled natively with etch's gcc 4.1 and
  with -O3. Here the "ASSIGNMENT" test completes.

|TEST                : Iterations/sec.  : Old Index   : New Index
|                    :                  : Pentium 90* : AMD K6/233*
|--------------------:------------------:-------------:------------
|NUMERIC SORT        :          609.24  :      15.62  :       5.13
|STRING SORT         :          58.466  :      26.12  :       4.04
|BITFIELD            :      1.4093e+08  :      24.17  :       5.05
|FP EMULATION        :          131.84  :      63.26  :      14.60
|FOURIER             :           47.37  :       0.05  :       0.03
|ASSIGNMENT          :          10.445  :      39.75  :      10.31
|IDEA                :          2060.7  :      31.52  :       9.36
|HUFFMAN             :          131.63  :       3.65  :       1.17
|NEURAL NET          :        0.054705  :       0.09  :       0.04
|LU DECOMPOSITION    :           1.872  :       0.10  :       0.07
|==========================ORIGINAL BYTEMARK RESULTS==========================
|INTEGER INDEX       : 22.428
|FLOATING-POINT INDEX: 0.077
|Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
|==============================LINUX DATA BELOW===============================
|CPU                 : 
|L2 Cache            : 
|OS                  : Linux 2.6.28
|C compiler          : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
|libc                : libc-2.3.6.so
|MEMORY INDEX        : 5.949
|INTEGER INDEX       : 5.346
|FLOATING-POINT INDEX: 0.043
|Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
|* Trademarks are property of their respective holder.

 The integer code improves a little or gets worse a little. This is
 probably due to different code generated by gcc 4.1 and 4.3. Floating
 point code is still slow.

Oh, btw, the CPU was:
|cpu             : e500v2
|clock           : 1249.987505MHz
|revision        : 3.0 (pvr 8021 0030)

>Thanks.
>Have fun :)
>+++
[0] http://www.tux.org/~mayer/linux/bmark.html

Sebastian


