Bug#166255: ada/8606: GNAT floating point optimization bug
Geert Bosch <bosch@gnat.com> wrote:
> Indeed the issue here is the standard one of double rounding for
> 64-bit floating point types. There is nothing Ada-specific about
> this and these problems are identical in C.
That's not the issue I was trying to raise in my bug report. Unless
I have misunderstood the standard, double rounding is not an issue
for Ada; the standard allows it even when the implementation supports
the numerics annex.
Double rounding is an issue for C. My understanding is that the
body of the standard allows it, but Annex F prohibits it if the
implementation defines __STDC_IEC_559__. Gcc 3.2.3 does define
__STDC_IEC_559__ for x86, so it does not comply with the C standard.
The issue I was raising was a different one: The value of a variable
changing over time even though it has not been assigned to. The code
I included used the value of the variable named "guess" twice, and
got two different values. This is prohibited by the Ada standard.
You are correct that there is nothing Ada-specific about this; the
same behavior occurs when the code is translated into C. I've
included the translated code at the end of this message.
> The -ffloat-store
> option may improve repeatability, but has two undesired effects:
> it degrades performance and it always causes double rounding,
> which can degrade accuracy.
The -ffloat-store option does cause the problem to go away, but you
wouldn't guess it from the documentation, which reads:
Do not store floating point variables in registers, and inhibit
other options that might change whether a floating point value is
taken from a register or memory.
This option prevents undesirable excess precision on machines such
as the 68000 where the floating registers (of the 68881) keep more
precision than a `double' is supposed to have. Similarly for the
x86 architecture. For most programs, the excess precision does
only good, but a few programs rely on the precise definition of
IEEE floating point. Use `-ffloat-store' for such programs, after
modifying them to store all pertinent intermediate computations
into variables.
The intent seems to be that when this option is turned off, the value
of a floating point variable may be held in a register rather than in
memory, resulting in it having extra precision. There is no indica-
tion that turning it off might cause the value of a variable to change
over time.
The reference to programs that rely on "the precise definition of IEEE
floating point" is misleading. Because of the double rounding issue,
gcc doesn't do IEEE floating point on the x86 architecture, and probably
doesn't do it on the 68000 line either, regardless of the setting of
this option. Therefore, it is more relevant to mention that storing
a variable with excess precision is prohibited by the language standards
for Ada, C, and probably just about every other front end that gcc
supports.
The -ffloat-store option does, as you state, degrade performance. For
example, the following subroutine compiles to 17 instructions when
compiled with only the -O option. This includes a store/load pair
required to convert the value assigned to x to 64 bits. If compiled
with both the -O and the -ffloat-store options, 32 instructions are
generated.
double
f(double a, double b) {
    double x;
    x = (long double)a + (long double)b;
    if (x < 0.0)
        x = -x;
    return 2.0 * x;
}
Note the conversions to long double in the above code. No code is
generated for these conversions, but they have the effect of telling
the compiler that the result of the plus operator is a long double.
As a result, the compiler correctly generates code to convert the
result of the addition operation to double.
The bug in the compiler is that if the conversions to long double
are omitted, the compiler will still perform a long double addition,
but won't convert the result to double.
Possible fixes:
1. Add a target description macro that indicates that the RTL binary
floating point operations always produce a long double result.
(The result of unary minus still has the same type as its operand.)
When generating RTL code, if this mode is on, then the results of
binary operators are always stored in extended float mode pseudo-
registers. When a function call or assignment is performed, a type
conversion must be generated if the register holding the argument
value or the value to be assigned does not have the same type as
the formal parameter or assignment target.
2. Actually perform floating point operations in the requested
precision. This requires changing the floating point control word,
which is not a particularly fast operation. (The value must come
from memory, rather than being encoded in the instruction.) The
current calling convention on the x86 is that the floating point
unit is set to 80 bit mode on entry to and on exit from a subroutine,
which is not ideal for this approach.
3. Make -ffloat-store the default on platforms that require it. This
fixes the problem without any major changes to the compiler, but
produces inefficient code. The inefficiency could be partly
addressed by having -funsafe-math-optimizations turn -ffloat-store
off.
4. Do nothing. I list this possibility only because the responses I have
seen appear to endorse it. I reject it; in my view code generation
bugs should be considered high priority. In this case, the symptom
of the bug was that my program went into an infinite loop. Naturally
the first thing I looked for was a logic error in my program.
Fortunately the program happened to be quite small. Statistically
code generation bugs are more likely to be triggered by large programs,
simply because large programs require more code to be generated.
Even once one suspects that the compiler is not implementing the
semantics specified by the language standard, isolating the
problem may not be easy. The semantic rule being violated by GNAT
is the rule stating that the value of a variable shall not change
unless a language construct that changes the value of the variable
is executed. The violation of this rule happened to cause an
infinite loop, but it is such a fundamental rule that violating
it could result in just about any behavior imaginable. Thus the
observed behavior of the program doesn't provide any immediate clue
about what is wrong.
Looking at the assembly language code is a big help, but only
if the person who encounters the bug happens to know the assembly
language of the machine they are using.
In short, code generation bugs can be extremely painful to track
down. They should therefore be fixed.
Of these four possibilities, I favor number 1. Number 3 should be
considered as a stopgap measure. Number 2 is really proposing a new
feature rather than just a bug fix; I'll leave it to Java users to
argue for it. And I hope I've made a convincing case that number 4
should not be considered an option.
--------------- C version of code from original bug report ---------------
#include <stdio.h>
double next_value(double, double, double, double);
int
main(int argc, char **argv) {
    double guess;
    guess = next_value(9.07611839107694429, 9.07611839107696916,
                       -2.22044604925031308E-16, 1.96231919602496419E-14);
    return 0;
}
double
next_value(double guess_low, double guess_high, double error_low, double error_high) {
    double ratio;
    double guess;
    ratio = -error_low / (error_high - error_low);
    if (ratio > 1.0 || ratio < 0.0) {
        printf("bad ratio %.10f, error_high = %.10f, error_low = %.10f\n",
               ratio, error_high, error_low);
    }
    ratio = ratio * 0.99 + 0.005;
    guess = guess_low + ratio * (guess_high - guess_low);
    if (guess == guess_low) {
        puts("First test: Guess = Guess_Low");
    } else {
        puts("First test: Guess /= Guess_Low");
    }
    if (guess == guess_low) {
        puts("Second test: Guess = Guess_Low");
    } else {
        puts("Second test: Guess /= Guess_Low");
    }
    return guess;   /* added: the function is declared double, and falling
                       off the end is undefined when the caller uses the value */
}