Re: SIGFPE and -mieee
Personally I don't consider a SIGFPE under non-IEEE conformance an indicator
of buggy software, and I'm going to share my rant about why! *grin*
About a year ago I was coding up some Monte Carlo simulations on our local
Beowulf cluster, and I kept having this problem where about every 100th time
I ran my code it would mysteriously SIGFPE. I wasted a great deal of time
looking for the 'bug' in my code, only to discover that I was being hit by
-mieee.
I was calculating an error correction term, and under the right draws in the
Monte Carlo simulation the error term would approach zero. This led to a
very small possibility (about 1 in 10000000) of the math library (the
standard C erf function in this case) generating a denormalized number.
Once a denormalized number has been generated in a program, it's sunk. The
Alpha raises a SIGFPE as soon as you load it into a register (you can't even
do a comparison to detect it without generating a SIGFPE). The only
solution (besides linking against the Compaq math libraries -- which don't
generate denormalized numbers) is to recompile with -mieee.
At the time I considered it a bug in the GNU math libraries. I figured they
should add some code to detect if -mieee is in operation and only generate
denormalized numbers if that is the case. I wrote some emails to the
appropriate list and basically got told that if the FPU wasn't IEEE compliant
then that wasn't their problem.
I now realize, however, that what I was trying to propose isn't even possible.
-mieee actually causes the compiler to set a bit in the opcode of the
floating point instructions it generates. There is no global bit in a FP
control register somewhere that you can check at runtime.
-mieee applies instruction by instruction. It's entirely possible to compile
your code with -mieee, link against a library that someone else has compiled
without -mieee, and boom, have your program crash just because you pass a
denormalized number to the library. Even worse, there is no way to crash
gracefully, as you don't even have precise exception handling for the FPU
without -mieee (well, you can, but you lose the speed advantage of not
having -mieee).
So, what am I trying to say? I guess just that I disagree. Because not
specifying -mieee causes a program to foul up on more than just division by
zero (more specifically, in the denormalization region just before an
underflow -- which arises very naturally in perfectly non-buggy software), I
think -mieee should be the norm.
Later -T
PS: Personally I consider it a bug in GCC. Optimizations that break
perfectly reasonable code should be off by default (how does a -nomieee flag
sound *grin*)! A new user (or casual package maintainer) should not be
expected to be familiar with all the esoteric architecture-specific options
for the compiler. Architecture-specific speed-tweaking options should only
have to be of interest to the hardcore speed demons.
PPS: I've focused on denormalization in this email, because it is the one
that would broadside most standard apps (I would suspect that is what is
happening to both mpg321 and Xine). However, the rest of the IEEE standard
is quite natural and makes perfect sense under the correct (non-esoteric)
circumstances as well.
For example, when you are working with sums from distributions, it makes
perfect sense for a positive underflow to go to positive zero, and then a
division by the positive zero to go to infinity, and then the infinity to go
back to positive zero again (in something like exp(-x/b) with x>0 and b->0+).
It is an indication of a term that does not contribute anything of
significance to the sum, not buggy software.
--
Tyson Whitehead (-twhitehe@uwo.ca -- WSC-)
Computer Engineer Dept. of Applied Mathematics,
Graduate Student- Applied Mathematics University of Western Ontario,
GnuPG Key ID# 0x8A2AB5D8 London, Ontario, Canada