Turbo Fredriksson wrote:
> It's been discussed before, but I never bothered to look into that, since I didn't think I cared :) Now, a friend of mine took the time to (more or less) manually recompile/repackage/reinstall his Debian GNU/Linux woody box. His word was "it WAY faster!". No numbers, just those three words. Gotta mean something... If I remember correctly from the last time I saw this subject come up, the resolution was "no, we don't want that". I can't remember why, but I don't care (unless it's changed :).

Ok, I have actually written a little test program. The problem is that optimizing an application with gcc 3.3 might actually make things *slower*. I have a P4 (2.6GHz) and a K7 2000+ workstation. I ran a simple program that only does:
    start_time = time(NULL);
    for (int i = 0; i < 10000000; i++)  // I forget how many 0s I had exactly
        sin(i / 324.34444);
    end_time = time(NULL);
    printf("Diff: %ld\n", (long)(end_time - start_time));

When I ran the program with no optimization (-O0), I got numbers that looked about the same on both machines. Then I compiled with -march=pentium4 and -march=athlon-xp plus -O6, and surprise, surprise, the P4 had a speed *decrease*!! It slowed down from something like 23 seconds to 25 seconds. I don't know the exact numbers right now since this was about a month ago, but optimizing things for the P4 using gcc is not the best thing. It is much better to just leave things at i386 or i486, and just use -O6 or whatever. There is a reason why Linux uses assembly for speed-critical parts of the kernel!
If someone wants, I can post some numbers from a simple test like above tomorrow.
- Adam

PS. I know for a fact that optimizing things with SSE or SSE2 *slows down* the program if compiled with gcc. This happened on both the Athlon and the P4 when I tried to optimize a loop-heavy magic square program.
Optimizing things with Intel's C++ compiler might produce drastically different results, but that is not the point here :)