--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: gcc-4.1: profiling (-pg) FPU bound application using -funsafe-math-optimizations yields incorrect timing
- From: Wouter Koolen-Wijkstra <wmkoolen@science.uva.nl>
- Date: Thu, 12 Apr 2007 00:14:51 +0200
- Message-id: <20070411221450.21563.15669.reportbug@145.116.6.woningstichting.de.key.invalid>
Package: gcc-4.1
Version: 4.1.1-21
Severity: normal
I am profiling an application that performs a lot of log2(.) operations. I wanted to use
-funsafe-math-optimizations to have log2 executed by the FPU directly, instead of using
a function call to the c library. Strangely enough, although wall-clock time and CPU time
go down, the number of seconds that gprof reports now roughly quadruples, exceeding the
total CPU time by a factor of two.
The attached c++ file reproduces the problem. I get
** run without -funsafe-math-optimizations **
g++ -pg test.cpp && time ./a.out && gprof ./a.out
8.01531e+09
real 0m42.432s
user 0m42.143s
sys 0m0.148s
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.00 13.11 13.11 main
0.00 13.11 0.00 1 0.00 0.00 global constructors keyed to main
0.00 13.11 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
** run with -funsafe-math-optimizations **
g++ -funsafe-math-optimizations -pg test.cpp && time ./a.out && gprof ./a.out
8.01531e+09
real 0m30.165s
user 0m29.898s
sys 0m0.060s
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.00 74.87 74.87 main
0.00 74.87 0.00 1 0.00 0.00 global constructors keyed to main
0.00 74.87 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
Notice that when enabling -funsafe-math-optimizations
- wall clock time goes down from 43 to 31 sec.
- gprof time goes up from 14 to 75 sec.
- it is strange that
* without -funsafe-math-optimizations the gprof time (14 sec) is much lower than wall clock time (43 sec)
* with -funsafe-math-optimizations the gprof time (75 sec) is much higher than wall clock time (31 sec)
the first (*) can be explained by samples that occur in the log2 system call.
the second (*) can not be correct. I suspect that the instrumentation code that gcc inserts into the
compiled program expects the FPU to be in a specific IEEE/ansi state, which it is not when the
-funsafe-math-optimizations flag is used.
With kind regards,
Wouter Koolen-Wijkstra
-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.18v10 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Versions of packages gcc-4.1 depends on:
ii binutils 2.17-3 The GNU assembler, linker and bina
ii cpp-4.1 4.1.1-21 The GNU C preprocessor
ii gcc-4.1-base 4.1.1-21 The GNU Compiler Collection (base
ii libc6 2.3.6.ds1-13 GNU C Library: Shared libraries
ii libgcc1 1:4.1.1-21 GCC support library
ii libssp0 4.1.1-21 GCC stack smashing protection libr
Versions of packages gcc-4.1 recommends:
ii libc6-dev 2.3.6.ds1-13 GNU C Library: Development Librari
pn libmudflap0-dev <none> (no description available)
-- no debconf information
#include <cmath>
#include <iostream>
using namespace std;
int main(int argc, const char ** argv) {
// do something that takes noticeable time (~30 sec)
// and that does many floating point operations
double d = 0;
for (int i = 1; i < 300000000; i++) {
d += log2(i);
}
cout << d << endl;
return 0;
}
--- End Message ---