
Re: PPC64: gcc currently compiles for power4 by default, causing glibc's sqrtf to fail on e6500



On 09/02/18 05:34 AM, John Paul Adrian Glaubitz wrote:
> On 02/09/2018 11:30 AM, Bas Vermeulen wrote:
>> mator on #debian-ports compiled gcc-7 for me with the attached patch.
>> With the resulting gcc, I compiled glibc and got a library I can use
>> sqrtf without running into an illegal instruction exception.
>>
>> Would it be possible to get this applied by default? The resulting
>> binaries work on e6500, and ought to work on all supported CPUs
>> for the ppc64 port.
>
> This is something that needs to be discussed. A single user alone shouldn't
> warrant such major change in a port. You always have to keep in mind that
> changing the default compiler options also has potential impact on the
> performance on more modern ppc64 systems like Apple Macintosh.


Not sure how modern an Apple Mac is, but here is a photo I took only a
few minutes ago:

    https://i.imgur.com/6UbviKb.jpg


I have this old Mac G5 running as a fine example of a big-endian machine
and the PPC970MP processors in it seem to work very well. However, it is
certainly becoming difficult to get results from it that compare with
what I get from some other machines, Fujitsu SPARC for example. The
biggest complaint is with floating point, where the long double data
representation may be actual IEEE 754-2008 style or an IBM variant
( "IBM extended precision" ) that I am not at all familiar with. In fact,
some trivial code won't even link if I try to use "IEEE extended
precision long double", with very few ways to get around that:

gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors \
   -pedantic-errors -mabi=ieeelongdouble  ...

The gcc that I am using claims to be :

  GNU C99 (Debian 7.2.0-17) version 7.2.1 20171205 (powerpc64-linux-gnu)
        compiled by GNU C version 7.2.1 20171205, GMP version 6.1.2,
         MPFR version 3.1.6, MPC version 1.0.3, isl version isl-0.18-GMP


I can take the exact same source of a trivial floating point test and
drop it on a very old SPARC, on a system running an up to date Red Hat
Enterprise Linux 7.4 with AMD Opterons, and on this old Mac G5 with its
PPC970MP processors, and I see wildly different results on all of them.
By "wildly" I mean that the in-memory data isn't even remotely the same
given the same constant inputs. I know that the x86 hardware is somewhat
crippled in this regard ( a strange ten byte format ), but I was quite
surprised by what happens on the PPC970MP processors when compared to
SPARC. Regardless of which compiler I use on SPARC ( very old Sun and
much newer Fujitsu ) with Solaris 10, I always get nearly perfect
results. The Debian PPC970MP produces close results, but again the
in-memory data is quite different.

In any case there are people out there messing with these things for
various reasons ( educational, even, in that I do teach ) and it is quite
weird to have to tell a student in the year 2018 not to expect similar
results across different machines when doing any floating point math.

Dennis

ps: long boring stuff follows where numbers don't quite work
     and libquadmath seems to be out of the question.

----- feel free to compile this on anything and show results ------

#define _XOPEN_SOURCE 600

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <locale.h>
#include <sys/utsname.h>

int main (int argc, char* argv[]){

    int j;
    struct utsname uname_data;
    long double theta, pi, approx_pi, one_over_sqrt2, ld_error;

    setlocale( LC_MESSAGES, "C" );
    if ( uname( &uname_data ) < 0 ) {
        fprintf ( stderr,
                 "WARNING : Could not attain system uname data.\n" );
        perror ( "uname" );
    } else {
        printf ("        system name = %s\n", uname_data.sysname );
        printf ("          node name = %s\n", uname_data.nodename );
        printf ("            release = %s\n", uname_data.release );
        printf ("            version = %s\n", uname_data.version );
        printf ("            machine = %s\n", uname_data.machine );
    }
    printf ("\n");

    /* plenty of digits well past the precision of binary128 */
    pi = 3.1415926535897932384626433832795028841971693993751L;

    printf("sizeof(long double) = %2i\n", sizeof(long double));
    printf("      pi may be %+40.38Lf\n", pi);
    printf("reference val = ");
    printf("+3.1415926535897932384626433832795028841971693993751\n\n");

    printf("%p : ", &pi);
    for ( j=0; j<sizeof(long double); j++ )
        printf("%02x ", ((unsigned char *)&pi)[j] );
    printf("\n\n" );

    ld_error = (long double)
                   3.1415926535897932384626433832795028841971693993751L
                   - pi;
    printf("     ld_error = %+40.38Lf\n\n", ld_error);

    printf("sinl(pi) may be %+40.38Lf\n", sinl(pi));

    approx_pi = (long double) 4.0L * atanl( (long double) 1.0L);
    printf("    approx_pi = %+40.38Lf\n", approx_pi);
    ld_error = (long double)
                   3.1415926535897932384626433832795028841971693993751L
                   - approx_pi;

    printf("     ld_error = %+40.38Lf\n\n", ld_error);

    theta = pi / ( (long double) 4.0L);
    printf("        theta = %+40.38Lf\n", theta);
    one_over_sqrt2 = sinl(theta);
    printf("  sinl(theta) = %+40.38Lf\n", one_over_sqrt2);

    ld_error = (long double)
                   0.7071067811865475244008443621048490392848359376884L
                   - one_over_sqrt2;

    printf("     ld_error = %+40.38Lf\n\n", ld_error);

    return EXIT_SUCCESS;

}

EOF
If you copy and paste that correctly you should have sha256 hash :

    836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d

dc@n0$ psrinfo -pv
The physical processor has 8 virtual processors (0-7)
  SPARC64-VII+ (portid 1024 impl 0x7 ver 0xa1 clock 2860 MHz)
dc@n0$ /usr/local/gcc6/bin/gcc --version
gcc (genunix Wed Jul 26 02:41:24 GMT 2017) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dc@n0$ /usr/local/gcc6/bin/gcc -m64 -std=iso9899:1999 -Wfatal-errors -pedantic-errors -o s s.c -lm
dc@n0$ ./s
        system name = SunOS
          node name = node000
            release = 5.10
            version = Generic_150400-59
            machine = sun4u

sizeof(long double) = 16
      pi may be +3.14159265358979323846264338327950279748
reference val = +3.1415926535897932384626433832795028841971693993751

ffffffff7fffeed0 : 40 00 92 1f b5 44 42 d1 84 69 89 8c c5 17 01 b8

     ld_error = +0.00000000000000000000000000000000000000

sinl(pi) may be +0.00000000000000000000000000000000008672
    approx_pi = +3.14159265358979323846264338327950279748
     ld_error = +0.00000000000000000000000000000000000000

        theta = +0.78539816339744830961566084581987569937
  sinl(theta) = +0.70710678118654752440084436210484899217
     ld_error = +0.00000000000000000000000000000000000000


however ....

ppc_nix$
ppc_nix$ gcc --version
gcc (Debian 7.2.0-17) 7.2.1 20171205
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ppc_nix$ grep "^cpu" /proc/cpuinfo
cpu             : PPC970MP, altivec supported
cpu             : PPC970MP, altivec supported
cpu             : PPC970MP, altivec supported
cpu             : PPC970MP, altivec supported
ppc_nix$

ppc_nix$ openssl dgst -sha256 s.c
SHA256(s.c)= 836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d

ppc_nix$ gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors -pedantic-errors -mabi=ieeelongdouble -o s s.c -lm
gcc: warning: using IEEE extended precision long double
cc1: warning: using IEEE extended precision long double
/tmp/cc348kuM.o: In function `main':
s.c:(.text+0x26c): undefined reference to `_q_sub'
s.c:(.text+0x3ac): undefined reference to `_q_sub'
s.c:(.text+0x424): undefined reference to `_q_div'
s.c:(.text+0x4ec): undefined reference to `_q_sub'
collect2: error: ld returned 1 exit status
ppc_nix$

ppc_nix$ gcc -mcpu=970 -mno-altivec -m64 -std=iso9899:1999 -Wfatal-errors -pedantic-errors -mabi=ibmlongdouble -o s s.c -lm
gcc: warning: using IBM extended precision long double
cc1: warning: using IBM extended precision long double
ppc_nix$ ./s
        system name = Linux
          node name = nix
            release = 4.13.0-1-powerpc64
            version = #1 SMP Debian 4.13.13-1 (2017-11-16)
            machine = ppc64

sizeof(long double) = 16
      pi may be +3.14159265358979323846264338327948122706
reference val = +3.1415926535897932384626433832795028841971693993751

0x7fffc9d0c230 : 40 09 21 fb 54 44 2d 18 3c a1 a6 26 33 14 5c 06

     ld_error = +0.00000000000000000000000000000000000000

sinl(pi) may be +0.00000000000000000000000000000002165713
    approx_pi = +3.14159265358979323846264338327948122706
     ld_error = +0.00000000000000000000000000000000000000

        theta = +0.78539816339744830961566084581987030677
  sinl(theta) = +0.70710678118654752440084436210483464400
     ld_error = +0.00000000000000000000000000000000616298

ppc_nix$


A twenty year old sparc gives better results when using gcc 7.2.0 :

mimas $ psrinfo -pv
The physical processor has 1 virtual processor (0)
  UltraSPARC-IIe (portid 0 impl 0x13 ver 0x14 clock 500 MHz)

mimas $ /usr/local/gcc7/bin/gcc --version
gcc (genunix Tue Aug 29 11:48:17 GMT 2017) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

mimas $

mimas $ openssl dgst -sha256 s.c
SHA256(s.c)= 836282023b62d3a09b6ad59424951d873b965a594f23e6c41d596c4845f74d5d

mimas $ /usr/local/gcc7/bin/gcc -m64 -std=iso9899:1999 -Wfatal-errors -pedantic-errors -o s s.c -lm
mimas $ ./s
        system name = SunOS
          node name = mimas
            release = 5.10
            version = Generic_150400-57
            machine = sun4u

sizeof(long double) = 16
      pi may be +3.14159265358979323846264338327950279748
reference val = +3.1415926535897932384626433832795028841971693993751

ffffffff7ffff0a0 : 40 00 92 1f b5 44 42 d1 84 69 89 8c c5 17 01 b8

     ld_error = +0.00000000000000000000000000000000000000

sinl(pi) may be +0.00000000000000000000000000000000008672
    approx_pi = +3.14159265358979323846264338327950279748
     ld_error = +0.00000000000000000000000000000000000000

        theta = +0.78539816339744830961566084581987569937
  sinl(theta) = +0.70710678118654752440084436210484899217
     ld_error = +0.00000000000000000000000000000000000000

mimas $

Other than the memory address, this is bit for bit the exact same as the
output from the newer Fujitsu server. I was hoping to see exactly the
same from the Mac PPC970MP based unit.

