
Bug#706207: gcc-4.6, gcc-4.7: invalid optimization when doing double -> int math and conversion (on big endian archs(?))



Hi Gabriel,

On Sat, Apr 27, 2013 at 5:14 PM, Gabriel Paubert <paubert@iram.es> wrote:
> On Fri, Apr 26, 2013 at 01:04:30PM +0200, Ondřej Surý wrote:
>> On Fri, Apr 26, 2013 at 12:43 PM, Bastian Blank <waldi@debian.org> wrote:
>> > On Fri, Apr 26, 2013 at 12:27:53PM +0200, Ondřej Surý wrote:
>> >> This code from libgd2:src/gd.c:clip_1d:
>> >>   *y1 -= m * (*x1 - mindim);
>> >> where
>> >>   m = (double) -0.050000
>> >>   *x1 = -200
>> >>   mindim = 0
>> >>   *y1 = 15
>> >> results in *y1 = 4, which is an incorrect value, since it should be 5.
>> >
>> > Nope. The result of "m * (*x1 - mindim)" is not exactly 10, it is a
>> > floating point value near 10, as 0.05 can't be expressed exactly in
>> > double. So this is roughly: 15 - 10.00000001 = 4.9999999, which
>> > converted to int is 4.
>> >
>> >> The simplest workaround, which allows gcc to produce the correct value:
>> >>   *y1 -= (int)(m * (*x1 - mindim));
>> >
>> > Here you force the later part to be 10.
>> >
>> >> Assigning to some other variable also works:
>> >>   int t;
>> >>   t = m * (*x1 - mindim);
>> >>   *y1 -= t;
>> >
>> > The same.
>> >
>> >> gcc-4.7 is unfortunately also affected.
>> >> I just hope we don't compile the nuclear reactor controls with gcc :)
>> >
>> > Just don't convert floating point to fixed point.
>>
>> I don't object to this, but somehow I fail to grasp the idea that the
>> result depends on architecture and optimization level.
>
> It really does; it seems that in this case it depends on the compiler
> generating fused multiply-accumulate instructions, which happens to be
> the case on powerpc, ia64, probably s390 (and is coming to x86).
> (Note that ia64 is little-endian, except under HP-UX if I remember correctly).
>
> Decomposing your example:
> int t1 = *x1 - mindim;  /* only integers, exact */
> double t2 = t1;  /* converted to double, exact for a 32-bit int */
> double t3 = *y1; /* same */
> /* Without fused multiply-accumulate, the compiler has no choice: */
> double t4 = m * t2;
> double t5 = t3 - t4;
> *y1 = (int)t5;
> /* But if FMA is available, the compiler can merge the two operations and get rid of t4: */
> double t5 = t3 - m * t2;
> *y1 = (int)t5;
>
> The difference is in the rounding after m*t2: in your case 0.05*200
> rounds to exactly 10 in double precision, but the exact product is
> actually a bit above 10. The FMA uses that unrounded product, which is
> enough to make its result a bit below 5, so the conversion (truncation)
> to integer returns 4.

Thanks for the breakdown.
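
To convince myself I put together a tiny stand-alone test (my own
sketch, not code from libgd) that imitates the fused variant by calling
the C99 fma() function explicitly; it needs -lm:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double m = -0.05;  /* the nearest double is slightly larger in magnitude */
      int x1 = -200, mindim = 0, y1 = 15;

      /* Two roundings: the product rounds to exactly 10.0 first, so the
         result is exactly 5.0.  (On a target where gcc contracts this
         expression into an FMA -- the very problem discussed here -- it
         may already behave like the fused version below.) */
      double separate = y1 - m * (x1 - mindim);

      /* One rounding at the end, which is what an FMA instruction does:
         the unrounded product is a bit above 10, so the result lands a
         bit below 5 and truncates to 4. */
      double fused = fma(m, (double)(mindim - x1), (double)y1);

      printf("separate: %.17g -> %d\n", separate, (int)separate);
      printf("fused:    %.17g -> %d\n", fused, (int)fused);
      return 0;
  }

On amd64 with gcc -O0 this should print 5 for the separate version and
4 for the fused one, which matches your explanation.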

>>
>> I would expect consistent results; even consistent *bad* results would be OK.
>
> Nope, FMA can change the rules of the game in subtle ways. An easy way
> to check for problems is to recompile the code with -mno-fused-madd.

I understand the problem more deeply now; Bastian's comments helped,
and thanks also to Mathias for pointing to PR323.

I still think this is crazy from a programmer's viewpoint, but I
understand it's not gcc's fault and cannot really be fixed, only worked
around (see the sketch below).
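
For reference, here is a compilable sketch of the workaround from above
(variable names follow the report, it is not the actual gd.c patch):
converting the product to int first leaves no floating-point addition
for the compiler to fuse with the multiply.

  /* Hypothetical helper mirroring the clip_1d fix discussed above. */
  static void clip_adjust(int *y1, const int *x1, int mindim, double m)
  {
      int t = (int)(m * (*x1 - mindim)); /* double product rounded once, then truncated */
      *y1 -= t;                          /* plain integer subtraction, nothing to fuse */
  }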

Ondrej

--
Ondřej Surý <ondrej@sury.org>

