[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#418795: gcc-4.1: profiling (-pg) FPU bound application using -funsafe-math-optimizations yields incorrect timing



Package: gcc-4.1
Version: 4.1.1-21
Severity: normal


I am profiling an application that performs a lot of log2(.) operations. I wanted to use
-funsafe-math-optimizations to have log2 executed by the FPU directly, instead of using
a function call to the c library. Strangely enough, although wall-clock time and CPU time
go down, the number of seconds that gprof reports now roughly quadruples, exceeding the
total CPU time by a factor of two.

The attached c++ file reproduces the problem. I get

** run without -funsafe-math-optimizations **
g++ -pg test.cpp && time ./a.out && gprof ./a.out
8.01531e+09

real    0m42.432s
user    0m42.143s
sys     0m0.148s
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00     13.11    13.11                             main
  0.00     13.11     0.00        1     0.00     0.00  global constructors keyed to main
  0.00     13.11     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)          

** run with -funsafe-math-optimizations **
g++ -funsafe-math-optimizations -pg test.cpp && time ./a.out && gprof ./a.out
8.01531e+09

real    0m30.165s
user    0m29.898s
sys     0m0.060s
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00     74.87    74.87                             main
  0.00     74.87     0.00        1     0.00     0.00  global constructors keyed to main
  0.00     74.87     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)          


Notice that when enabling -funsafe-math-optimizations
- wall clock time goes down from 43 to 31 sec.
- gprof time goes up from 14 to 75 sec.
- it is strange that 
 * without -funsafe-math-optimizations the gprof time (14 sec) is much lower  than wall clock time (43 sec)
 * with    -funsafe-math-optimizations the gprof time (75 sec) is much higher than wall clock time (31 sec)

the first (*) can be explained by samples that occur in the log2 system call.
the second (*) can not be correct. I suspect that the instrumentation code that gcc inserts into the 
compiled program expects the FPU to be in a specific IEEE/ansi state, which it is not when the
-funsafe-math-optimizations flag is used.

With kind regards,

Wouter Koolen-Wijkstra


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.18v10 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages gcc-4.1 depends on:
ii  binutils                    2.17-3       The GNU assembler, linker and bina
ii  cpp-4.1                     4.1.1-21     The GNU C preprocessor
ii  gcc-4.1-base                4.1.1-21     The GNU Compiler Collection (base 
ii  libc6                       2.3.6.ds1-13 GNU C Library: Shared libraries
ii  libgcc1                     1:4.1.1-21   GCC support library
ii  libssp0                     4.1.1-21     GCC stack smashing protection libr

Versions of packages gcc-4.1 recommends:
ii  libc6-dev                   2.3.6.ds1-13 GNU C Library: Development Librari
pn  libmudflap0-dev             <none>       (no description available)

-- no debconf information
#include <cmath>
#include <iostream>
using namespace std;

int main(int argc, const char ** argv) {
  // do something that takes noticeable time (~30 sec)
  // and that does many floating point operations
  double d = 0;
  for (int i = 1; i < 300000000; i++) {
    d += log2(i);
  }
  cout << d << endl;
  return 0;
}

Reply to: