
Re: PPC64: gcc currently compiles for power4 by default, causing glibc's sqrtf to fail on e6500



On Sat, Feb 10, 2018 at 04:02:36PM -0500, Dennis Clarke wrote:
> On 09/02/18 05:34 AM, John Paul Adrian Glaubitz wrote:
> > On 02/09/2018 11:30 AM, Bas Vermeulen wrote:
> > > mator on #debian-ports compiled gcc-7 for me with the attached patch.
> > > With the resulting gcc, I compiled glibc and got a library I can use
> > > sqrtf without running into an illegal instruction exception.
> > > 
> > > Would it be possible to get this applied by default? The resulting
> > > binaries work on e6500, and ought to work on all supported CPUs
> > > for the ppc64 port.
> > 
> > This is something that needs to be discussed. A single user alone shouldn't
> > warrant such major change in a port. You always have to keep in mind that
> > changing the default compiler options also has potential impact on the
> > performance on more modern ppc64 systems like Apple Macintosh.
> 
> 
> Not sure how modern an Apple Mac is but here is a photo I took only a
>  few minutes ago:
> 
>     https://i.imgur.com/6UbviKb.jpg
> 
> 
> I have this old Mac G5 running as a fine example of a big-endian machine
> and the PPC970MP processors in it seem to work very well. However it is
> certainly becoming difficult to get results from it that can compare to
> what I get from some other machines like Fujitsu SPARC for example. The
> biggest complaint is with floating point wherein the data representation
> may be actual IEEE 754-2008 style or some new IBM variant that I am not
> at all familiar with. In fact, some code, trivial, won't compile at all
> if I try to use "IEEE extended precision long double" with very few ways
> to get around that :
> 
> gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors \
>    -pedantic-errors -mabi=ieeelongdouble  ...
> 
> The gcc that I am using claims to be :
> 
>   GNU C99 (Debian 7.2.0-17) version 7.2.1 20171205 (powerpc64-linux-gnu)
>         compiled by GNU C version 7.2.1 20171205, GMP version 6.1.2,
>          MPFR version 3.1.6, MPC version 1.0.3, isl version isl-0.18-GMP
> 
> 
> I can take the exact same source of a trivial floating point test and
> drop it on very very old sparc as well as a system running very up to
> date Red Hat Enterprise Linux 7.4 with AMD Opterons.  Also this old mac
> g5 with its PPC970MP processors where I see wildly different results on
> all of them.  When I say "wildly" I mean to say that the in memory data
> isn't even remotely the same given the same constant inputs. I know that
> the x86 hardware is somewhat crippled ( a strange ten byte format ) in
> this regard but I was quite surprised by what happens on the PPC970MP
> processors when compared to sparc.  Regardless what compiler I use on
> the sparc ( very very old Sun and much newer Fujitsu ) with Solaris 10
> I always get nearly perfect results. The Debian PPC970MP produces close
> results but again the in memory data is quite different.
> 
> In any case there are people out there messing with these things for
> various reasons ( educational even in that I do teach ) and it is quite
> weird to have to say to a student that in the year 2018 don't expect
> similar results across different machines when it comes to doing any
> floating point math.
> 
> Dennis
> 
> ps: long boring stuff follows where numbers don't quite work
>      and libquadmath seems to be out of the question.

This is quite well known: for a long time, IBM on Power (not on
mainframes) used a non-IEEE format for long doubles. It is actually
two IEEE doubles "concatenated" (the value is their sum), so:
- the mantissa is somewhat less precise, 2 times 53 bits instead of 113
- the exponent range is way smaller, in powers of 10 the range is
  roughly ±308 (same as double) instead of ±4932.

The fact that the in-memory representation is completely different is
not surprising when you take this into account.

This was somewhat faster than a full emulation of IEEE quad math, but
now IBM has switched to real IEEE quad (even in hardware on Power9; I
suspect most SPARC machines do it in software).

For more details, you may have a look at:
https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format
There is even a full section on double-double arithmetic.

I'm away from my Power machine right now and it is switched off, so I
can't try your code and play with compiler options.

	Cheers,
	Gabriel

> 
> ----- feel free to compile this on anything and show results ------
> 
> #define _XOPEN_SOURCE 600
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> #include <locale.h>
> #include <sys/utsname.h>
> 
> int main (int argc, char* argv[]){
> 
>     int j;
>     struct utsname uname_data;
>     long double theta, pi, approx_pi, one_over_sqrt2, ld_error;
> 
>     setlocale( LC_MESSAGES, "C" );
>     if ( uname( &uname_data ) < 0 ) {
>         fprintf ( stderr,
>                  "WARNING : Could not attain system uname data.\n" );
>         perror ( "uname" );
>     } else {
>         printf ("        system name = %s\n", uname_data.sysname );
>         printf ("          node name = %s\n", uname_data.nodename );
>         printf ("            release = %s\n", uname_data.release );
>         printf ("            version = %s\n", uname_data.version );
>         printf ("            machine = %s\n", uname_data.machine );
>     }
>     printf ("\n");
> 
>     /* plenty of digits well past the precision of binary128 */
>     pi = 3.1415926535897932384626433832795028841971693993751L;
> 
>     printf("sizeof(long double) = %2i\n", sizeof(long double));
>     printf("      pi may be %+40.38Lf\n", pi);
>     printf("reference val = ");
>     printf("+3.1415926535897932384626433832795028841971693993751\n\n");
> 
>     printf("%p : ", &pi);
>     for ( j=0; j<sizeof(long double); j++ )
>         printf("%02x ", ((unsigned char *)&pi)[j] );
>     printf("\n\n" );
> 
>     ld_error = (long double)
>                    3.1415926535897932384626433832795028841971693993751L
>                    - pi;
>     printf("     ld_error = %+40.38Lf\n\n", ld_error);
> 
>     printf("sinl(pi) may be %+40.38Lf\n", sinl(pi));
> 
>     approx_pi = (long double) 4.0L * atanl( (long double) 1.0L);
>     printf("    approx_pi = %+40.38Lf\n", approx_pi);
>     ld_error = (long double)
>                    3.1415926535897932384626433832795028841971693993751L
>                    - approx_pi;
> 
>     printf("     ld_error = %+40.38Lf\n\n", ld_error);
> 
>     theta = pi / ( (long double) 4.0L);
>     printf("        theta = %+40.38Lf\n", theta);
>     one_over_sqrt2 = sinl(theta);
>     printf("  sinl(theta) = %+40.38Lf\n", one_over_sqrt2);
> 
>     ld_error = (long double)
>                    0.7071067811865475244008443621048490392848359376884L
>                    - one_over_sqrt2;
> 
>     printf("     ld_error = %+40.38Lf\n\n", ld_error);
> 
>     return EXIT_SUCCESS;
> 
> }
> 
> EOF
> If you copy and paste that correctly you should have sha256 hash :
> 
>     836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d
> 
> dc@n0$ psrinfo -pv
> The physical processor has 8 virtual processors (0-7)
>   SPARC64-VII+ (portid 1024 impl 0x7 ver 0xa1 clock 2860 MHz)
> dc@n0$ /usr/local/gcc6/bin/gcc --version
> gcc (genunix Wed Jul 26 02:41:24 GMT 2017) 6.4.0
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> dc@n0$ /usr/local/gcc6/bin/gcc -m64 -std=iso9899:1999 -Wfatal-errors
> -pedantic-errors -o s s.c -lm
> dc@n0$ ./s
>         system name = SunOS
>           node name = node000
>             release = 5.10
>             version = Generic_150400-59
>             machine = sun4u
> 
> sizeof(long double) = 16
>       pi may be +3.14159265358979323846264338327950279748
> reference val = +3.1415926535897932384626433832795028841971693993751
> 
> ffffffff7fffeed0 : 40 00 92 1f b5 44 42 d1 84 69 89 8c c5 17 01 b8
> 
>      ld_error = +0.00000000000000000000000000000000000000
> 
> sinl(pi) may be +0.00000000000000000000000000000000008672
>     approx_pi = +3.14159265358979323846264338327950279748
>      ld_error = +0.00000000000000000000000000000000000000
> 
>         theta = +0.78539816339744830961566084581987569937
>   sinl(theta) = +0.70710678118654752440084436210484899217
>      ld_error = +0.00000000000000000000000000000000000000
> 
> 
> however ....
> 
> ppc_nix$
> ppc_nix$ gcc --version
> gcc (Debian 7.2.0-17) 7.2.1 20171205
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> ppc_nix$ grep "^cpu" /proc/cpuinfo
> cpu             : PPC970MP, altivec supported
> cpu             : PPC970MP, altivec supported
> cpu             : PPC970MP, altivec supported
> cpu             : PPC970MP, altivec supported
> ppc_nix$
> 
> ppc_nix$ openssl dgst -sha256 s.c
> SHA256(s.c)=
> 836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d
> 
> ppc_nix$ gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors
> -pedantic-errors -mabi=ieeelongdouble -o s s.c -lm
> gcc: warning: using IEEE extended precision long double
> cc1: warning: using IEEE extended precision long double
> /tmp/cc348kuM.o: In function `main':
> s.c:(.text+0x26c): undefined reference to `_q_sub'
> s.c:(.text+0x3ac): undefined reference to `_q_sub'
> s.c:(.text+0x424): undefined reference to `_q_div'
> s.c:(.text+0x4ec): undefined reference to `_q_sub'
> collect2: error: ld returned 1 exit status
> ppc_nix$
> 
> ppc_nix$ gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors
> -pedantic-errors -mabi=ibmlongdouble -o s s.c -lm
> gcc: warning: using IBM extended precision long double
> cc1: warning: using IBM extended precision long double
> ppc_nix$ ./s
>         system name = Linux
>           node name = nix
>             release = 4.13.0-1-powerpc64
>             version = #1 SMP Debian 4.13.13-1 (2017-11-16)
>             machine = ppc64
> 
> sizeof(long double) = 16
>       pi may be +3.14159265358979323846264338327948122706
> reference val = +3.1415926535897932384626433832795028841971693993751
> 
> 0x7fffc9d0c230 : 40 09 21 fb 54 44 2d 18 3c a1 a6 26 33 14 5c 06
> 
>      ld_error = +0.00000000000000000000000000000000000000
> 
> sinl(pi) may be +0.00000000000000000000000000000002165713
>     approx_pi = +3.14159265358979323846264338327948122706
>      ld_error = +0.00000000000000000000000000000000000000
> 
>         theta = +0.78539816339744830961566084581987030677
>   sinl(theta) = +0.70710678118654752440084436210483464400
>      ld_error = +0.00000000000000000000000000000000616298
> 
> ppc_nix$
> 
> 
> A twenty year old sparc gives better results when using gcc 7.2.0 :
> 
> mimas $ psrinfo -pv
> The physical processor has 1 virtual processor (0)
>   UltraSPARC-IIe (portid 0 impl 0x13 ver 0x14 clock 500 MHz)
> 
> mimas $ /usr/local/gcc7/bin/gcc --version
> gcc (genunix Tue Aug 29 11:48:17 GMT 2017) 7.2.0
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> mimas $
> 
> mimas $ openssl dgst -sha256 s.c
> SHA256(s.c)=
> 836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d
> 
> mimas $ /usr/local/gcc7/bin/gcc -m64 -std=iso9899:1999 -Wfatal-errors
> -pedantic-errors -o s s.c -lm
> mimas $ ./s
>         system name = SunOS
>           node name = mimas
>             release = 5.10
>             version = Generic_150400-57
>             machine = sun4u
> 
> sizeof(long double) = 16
>       pi may be +3.14159265358979323846264338327950279748
> reference val = +3.1415926535897932384626433832795028841971693993751
> 
> ffffffff7ffff0a0 : 40 00 92 1f b5 44 42 d1 84 69 89 8c c5 17 01 b8
> 
>      ld_error = +0.00000000000000000000000000000000000000
> 
> sinl(pi) may be +0.00000000000000000000000000000000008672
>     approx_pi = +3.14159265358979323846264338327950279748
>      ld_error = +0.00000000000000000000000000000000000000
> 
>         theta = +0.78539816339744830961566084581987569937
>   sinl(theta) = +0.70710678118654752440084436210484899217
>      ld_error = +0.00000000000000000000000000000000000000
> 
> mimas $
> 
> Other than the memory address this is bit for bit exact same as the
> newer Fujitsu server. I was hoping to see the exact same from the
> mac PPC970MP based unit.
> 

