
Bug#706207: gcc-4.6, gcc-4.7: invalid optimization when doing double -> int math and conversion (on big endian archs(?))



On Fri, Apr 26, 2013 at 01:04:30PM +0200, Ondřej Surý wrote:
> On Fri, Apr 26, 2013 at 12:43 PM, Bastian Blank <waldi@debian.org> wrote:
> > On Fri, Apr 26, 2013 at 12:27:53PM +0200, Ondřej Surý wrote:
> >> This code from libgd2:src/gd.c:clip_1d:
> >>   *y1 -= m * (*x1 - mindim);
> >> where
> >>   m = (double) -0.050000
> >>   *x1 = -200
> >>   mindim = 0
> >>   *y1 = 15
> >> results in *y1 = 4, which is an incorrect value, since it should be 5.
> >
> > Nope. The result of "m * (*x1 - mindim)" is not 10, it is a floating
> > point value near 10, as 10 can't be expressed in double. So this is:
> > 15 - 10.00000001 = 4.9999999. This converted to int is 4.
> >
> >> Most simple workaround, which allows gcc to produce correct value:
> >>   *y1 -= (int)(m * (*x1 - mindim));
> >
> > Here you force the later part to be 10.
> >
> >> Assigning to some other variable also works ok:
> >>   int t;
> >>   t = m * (*x1 - mindim);
> >>   *y1 -= t;
> >
> > The same.
> >
> >> gcc-4.7 is unfortunately also affected.
> >> I just hope we don't compile the nuclear reactor controls with gcc :)
> >
> > Just don't convert floating point to fixed point.
> 
> I don't object to this, but somehow I fail to grasp the idea that the
> result depends on architecture and optimization level.

It really does. In this case it seems to depend on whether the compiler
generates fused multiply-add (FMA) instructions, which is the case on
powerpc, ia64, and probably s390 (and is coming to x86).
(Note that ia64 is little-endian, except under HP-UX, if I remember correctly.)

Decomposing your example:
int t1 = *x1 - mindim;  /* only integers, exact */
double t2 = t1;         /* converted to double, exact for 32-bit int */
double t3 = *y1;        /* same */
/* Without fused multiply-add, the compiler has no choice: */
double t4 = m * t2;     /* the product is rounded to double here */
double t5 = t3 - t4;
*y1 = (int)t5;
/* With FMA available, the compiler can merge the two operations and get
   rid of t4 (this replaces the three lines above): */
double t5 = t3 - m * t2;  /* single rounding, after the subtraction */
*y1 = (int)t5;

The difference is in the rounding after m*t2. In your case 0.05*200
rounds to exactly 10 in double precision, but the exact product is actually
a bit above 10, because the double closest to 0.05 is slightly larger than
0.05. Without that intermediate rounding, the result of the FMA is a bit
below 5, so the conversion (truncation) to integer returns 4.

> 
> I would expect consistent results, even consistent *bad* results would be ok.

Nope, FMA can change the rules of the game in subtle ways. An easy way
to check for problems is to recompile the code with -mno-fused-madd.

	Gabriel

