Bug#418795: gcc-4.1: profiling (-pg) FPU bound application using -funsafe-math-optimizations yields incorrect timing
tag 418795 + moreinfo
thanks
please recheck with gcc-4.2 and gcc-4.3/gcc-snapshot
Wouter Koolen-Wijkstra writes:
> Package: gcc-4.1
> Version: 4.1.1-21
> Severity: normal
>
>
> I am profiling an application that performs a lot of log2(.) operations. I wanted to use
> -funsafe-math-optimizations to have log2 executed by the FPU directly, instead of using
> a function call to the c library. Strangely enough, although wall-clock time and CPU time
> go down, the number of seconds that gprof reports now roughly quadruples, exceeding the
> total CPU time by a factor of two.
>
> The attached c++ file reproduces the problem. I get
>
> ** run without -funsafe-math-optimizations **
> g++ -pg test.cpp && time ./a.out && gprof ./a.out
> 8.01531e+09
>
> real 0m42.432s
> user 0m42.143s
> sys 0m0.148s
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls Ts/call Ts/call name
> 100.00 13.11 13.11 main
> 0.00 13.11 0.00 1 0.00 0.00 global constructors keyed to main
> 0.00 13.11 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
>
> ** run with -funsafe-math-optimizations **
> g++ -funsafe-math-optimizations -pg test.cpp && time ./a.out && gprof ./a.out
> 8.01531e+09
>
> real 0m30.165s
> user 0m29.898s
> sys 0m0.060s
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls Ts/call Ts/call name
> 100.00 74.87 74.87 main
> 0.00 74.87 0.00 1 0.00 0.00 global constructors keyed to main
> 0.00 74.87 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
>
>
> Notice that when enabling -funsafe-math-optimizations
> - wall clock time goes down from 43 to 31 sec.
> - gprof time goes up from 14 to 75 sec.
> - it is strange that
> * without -funsafe-math-optimizations the gprof time (14 sec) is much lower than wall clock time (43 sec)
> * with -funsafe-math-optimizations the gprof time (75 sec) is much higher than wall clock time (31 sec)
>
> the first (*) can be explained by samples that occur in the log2 system call.
> the second (*) can not be correct. I suspect that the instrumentation code that gcc inserts into the
> compiled program expects the FPU to be in a specific IEEE/ansi state, which it is not when the
> -funsafe-math-optimizations flag is used.
>
> With kind regards,
>
> Wouter Koolen-Wijkstra
>
>
> -- System Information:
> Debian Release: lenny/sid
> APT prefers testing
> APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable')
> Architecture: i386 (i686)
>
> Kernel: Linux 2.6.18v10 (SMP w/2 CPU cores; PREEMPT)
> Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
> Shell: /bin/sh linked to /bin/bash
>
> Versions of packages gcc-4.1 depends on:
> ii binutils 2.17-3 The GNU assembler, linker and bina
> ii cpp-4.1 4.1.1-21 The GNU C preprocessor
> ii gcc-4.1-base 4.1.1-21 The GNU Compiler Collection (base
> ii libc6 2.3.6.ds1-13 GNU C Library: Shared libraries
> ii libgcc1 1:4.1.1-21 GCC support library
> ii libssp0 4.1.1-21 GCC stack smashing protection libr
>
> Versions of packages gcc-4.1 recommends:
> ii libc6-dev 2.3.6.ds1-13 GNU C Library: Development Librari
> pn libmudflap0-dev <none> (no description available)
>
> -- no debconf information
> #include <cmath>
> #include <iostream>
> using namespace std;
>
> int main(int argc, const char ** argv) {
> // do something that takes noticeable time (~30 sec)
> // and that does many floating point operations
> double d = 0;
> for (int i = 1; i < 300000000; i++) {
> d += log2(i);
> }
> cout << d << endl;
> return 0;
> }
Reply to: