Re: MIT discovered issue with gcc
On 28/11/13 10:02, Bernhard R. Link wrote:
In brief: Debian should call for compilation with -O0 for all packages
compiled with GCC whose maintainers won't take responsibility for the
program never falling into undefined behaviour (UB).
See below for the full comments.
* Octavio Alvarez <alvarezp@alvarezp.ods.org> [131127 21:28]:
On 26/11/13 11:37, Mark Haase wrote:
Compiler developers, for better or worse, reserve the right to do
whatever they want with undefined behavior, and it's up to the person
writing the C code to not include undefined behavior in their own program.
That's a fallacy. The fact that a compiler does not violate the
standard does not imply it is behaving sanely. Thus, not violating the
standard does not imply not having a bug.
"Not violating the standard means no bug" would indeed be a fallacy.
But that is a different statement than the one you replied to.
The first part, "compiler developers, for better or worse, reserve the
right to do whatever they want with undefined behavior", implies that
the compiler takes no responsibility for code that falls under UB and
should feel free to do either sane or insane things because it is not
violating the standard.
This is reinforced with the second part, "it's up to the person writing
the C code to not include undefined behavior in their own program",
which lets developers just push bug reports back to the reporters.
So, overall, the phrase implies that it is valid to make unsafe
assumptions and, when they are reported as bugs, to just push the problem
back to the developers. Not only that, but it promotes this practice. This
is the relevance of noting the fallacy.
And this is a bad, bad problem.
Considering a programmer would not ever *ever* want to fall into
undefined behavior, the compiler should just issue warnings before
making any kind of assumption based on undefined behavior. Those
warnings could be silenced with flags, as a way of saying "yes, I'm
sure of what I'm doing".
The whole point of undefined behaviour is that this type of warning is not
possible. And if such warnings were possible, it would be better for that
kind of code to produce an error than to make its behaviour undefined.
You are mixing two things. Yes, it's the developer's responsibility to
make correct programs, but no, the compiler should not make the mistakes
worse.
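For what it's worth, this kind of diagnostic is not impossible in
principle. GCC already warns when the division by zero is visible right in
the expression (a small illustration of mine, not from the case below; the
function name is made up):

int h(void) {
    return 3 / 0;   /* GCC typically warns: division by zero [-Wdiv-by-zero] */
}

It just stays silent once the zero reaches the division through a
variable, which is exactly what happens in the case below.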
Consider this case:
$ cat zero.c
int main(void) {
    int a = 4;
    int b = a + 7;
    int c = a - b + 7;
    if (3/c > 4)
        return 5;
    return 10;
}
What is the *expected* outcome of this program? From the point of view
of the C standard there is no expected outcome. The developer clearly
made a mistake, because c == 0 by the time line 5 runs, but the coder
still expects the CPU to execute the instruction 3/c.
However, GCC optimizes the division away even with just -O1:
$ gcc -O0 -std=c99 -Wall -Wextra -pedantic-errors -ggdb -o zero zero.c
$ ./zero; echo $?
Floating point exception (core dumped)
136
$ gcc -O1 -std=c99 -Wall -Wextra -pedantic-errors -ggdb -o zero zero.c
$ ./zero; echo $?
10
At -O1, the compiler optimizes everything away to "return 10".
$ objdump -D zero | grep -A 5 '<main>'
00000000004004ec <main>:
4004ec: b8 0a 00 00 00 mov $0xa,%eax
4004f1: c3 retq
4004f2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4004f9: 00 00 00
4004fc: 0f 1f 40 00 nopl 0x0(%rax)
This is incorrect. Why? The compiler is making the assumption that the
comparison 3/x > 4 (with x being an integer) always evaluates to false.
But "3/x > 4, with x being an integer" does not always evaluate to false:
there is the case x == 0, where 3/x is undefined. This proves the
compiler is optimizing based on blind assumptions. This should be a bug,
but developers will say it's not, because x/0 is UB. Saying so is the
fallacy I'm talking about.
IOW, there should be an optimization level, say -O or -O1, at which we
as programmers could at least rely on the compiler making optimizations
ONLY within the confines of defined behavior. Make that the default, and
leave something like -OU as an opt-in for those UB-based crazy assumptions
that produce fast-as-light-but-wrong code.
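For contrast, here is a sketch of mine (the function name g is made up)
where that very same "3/x > 4 can never be true" folding stays entirely
within defined behavior, because the zero case is excluded before the
division:

int g(int x) {
    if (x == 0)
        return -1;          /* zero never reaches the division */
    /* for any nonzero int x, 3/x lies between -3 and 3, so this is never true */
    if (3 / x > 4)
        return 5;
    return 10;
}

An optimizer that folds the second "if" away here is not relying on UB at
all; that is the kind of optimization I would happily take at any -O level.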
In this other case (I changed "a" to be != 4):
$ cat zero2.c
int main(void) {
    int a = 5;
    int b = a + 7;
    int c = a - b + 7;
    if (3/c > 4)
        return 5;
    return 10;
}
$ gcc -O0 -std=c99 -Wall -Wextra -pedantic-errors -ggdb -o zero2 zero2.c
$ ./zero; echo $?
10
$ gcc -O1 -std=c99 -Wall -Wextra -pedantic-errors -ggdb -o zero2 zero2.c
$ ./zero; echo $?
10
$ objdump -D zero2 | grep -A 5 '<main>'
00000000004004ec <main>:
4004ec: b8 0a 00 00 00 mov $0xa,%eax
4004f1: c3 retq
4004f2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4004f9: 00 00 00
4004fc: 0f 1f 40 00 nopl 0x0(%rax)
The optimizer is still making a wrong assumption. The result being 10 is
just a coincidence.
If, and only if, the compiler works through this deduction:

    c must be != 0
    a is 4
    4 - b + 7 != 0
    4 - b != -7
    a != 4
    in this code a == 5, so c != 0

only then can the optimizer feel free to optimize the "if" section away.
In this function (where I changed "a" to be a parameter):
int f(int a) {
    int b = a + 7;
    int c = a - b + 7;
    if (3/c > 4)
        return 5;
    return 10;
}
The optimizer should NOT take the division away, but here:
int f(int a) {
    if (a == 4)
        return 0;
    int b = a + 7;
    int c = a - b + 7;
    if (3/c > 4)
        return 5;
    return 10;
}
... then the compiler can take the division away.
As an idea, just thinking out loud (it could be a bad one), how about
this:
#define NDEBUG
#include <assert.h>

int f(int a) {
    assert(a != 4);
    int b = a + 7;
    int c = a - b + 7;
    if (3/c > 4)
        return 5;
    return 10;
}
Note that NDEBUG is defined, so the assert() effectively does not result
in any actual code, but the compiler could still take it as an
optimization hint.
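As far as I know, GCC already provides a building block for exactly this
kind of zero-cost hint: __builtin_unreachable(). A sketch of how the idea
could be spelled with it (the ASSUME and div3 names are mine):

/* Tells the optimizer it may take `cond` for granted on every path that
 * reaches this point; with optimization enabled, no run-time check remains. */
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

int div3(int c) {
    ASSUME(c != 0);     /* the hint: c is never 0 here */
    return 3 / c;
}

Whether the optimizer actually exploits the hint is of course up to the
compiler, but at least the programmer has stated the assumption explicitly.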
Undefined behaviour is especially reserved for anything where the
compiler can, on all foreseeable platforms, produce fast code in the
usual case, but strange things can happen in the undefined case.
If this were optional (like -OU) I'd be fine with it. I would at least
have an option to have safe compilation. I will show an example below.
Let's take the example of signed overflow:
Almost every operation with numbers has some chance of overflow,
depending on the numbers. For unsigned numbers the result is quite
easy to define (just calculate and reduce modulo 2^wordsize), and
processors come with instructions doing that efficiently (or it is
easy and fast to emulate them). But with signed numbers, defining
some specific behaviour means severely limiting which processor
instructions the compiler may use. It might even mean that a compiler
would have to generate a conditional for every operation with signed
numbers, just to test first whether it would overflow.
I'm not saying that. The compiler defining a safe behavior is just as
bad; that is also a fallacy. It just means that the compiler should not
optimize the code away. It is still the developer's responsibility not to
fall into UB.
So this little trick of making the behaviour undefined can speed up
the programs a compiler generates by a significant amount.
So allowing some operations that are valid or invalid depending on
what input they are given is already needed if you want fast
programs.
I don't want fast programs. I want programs that do exactly what I said.
Once that premise is fulfilled, *then* I want fast programs (or small,
if I'm on embedded).
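To make the signed-overflow case concrete, here is the classic
illustration (a sketch of mine, not from the thread; always_bigger is a
made-up name):

#include <limits.h>
#include <stdio.h>

/* Because signed overflow is undefined, a compiler may fold this
 * comparison to "always true". Compiled without optimization on common
 * two's-complement hardware, x + 1 wraps to INT_MIN when x == INT_MAX
 * and the comparison comes out false. */
static int always_bigger(int x) {
    return x + 1 > x;
}

int main(void) {
    printf("%d\n", always_bigger(INT_MAX));  /* commonly 0 at -O0, 1 at -O2 */
    return 0;
}

Which of the two answers you get depends on the optimization level, and
that divergence is exactly what I am objecting to.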
Next the compiler has to decide what to do with things like
void foo(int s, int e) {
    int i;
    for (i = s ; i != e ; i++) {
        something();
    }
}
Given that writing programs in which signed integers overflow is
already forbidden (by making that behaviour undefined), the compiler
can translate this to:

    DO e-s TIMES
        CALL something
I see your point, let me say that. But no, the compiler shouldn't do
that because it does not have enough information. It would just be a
wrong compiler decision.
I would, though, find it valid in the following code:
void foo(int s, int e) {
    if (s >= e)
        return;
    int i;
    for (i = s ; i != e ; i++) {
        something();
    }
}
but if you want to take away the conditional:
#include <assert.h>

void foo(int s, int e) {
    assert(s < e);
    int i;
    for (i = s ; i != e ; i++) {
        something();
    }
}
Even if NDEBUG is defined (so the assert itself generates no code). Or
something similar, like a conditional comment:
void foo(int s, int e) {
    /* GCC_ASSERT(s < e) */
    int i;
    for (i = s ; i != e ; i++) {
        something();
    }
}
... or whatever hint is included.
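Spelled with what GCC offers today, that hint could look like this (again
a sketch of mine, using __builtin_unreachable(); the prototype for
something() is only there to keep the snippet self-contained):

void something(void);

void foo(int s, int e) {
    if (!(s < e))
        __builtin_unreachable();   /* pure hint: the optimizer may assume s < e */
    int i;
    for (i = s ; i != e ; i++) {
        something();
    }
}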
Without using the information about undefined behaviour, it would
have to translate it to:

    IF s < e
        DO e-s TIMES
            CALL something
    ELSE
        DO range(int)-(s-e) TIMES
            CALL something
or if e were a long:

    IF s < e
        DO e-s TIMES
            CALL something
    ELSE IF e <= MAXINT
        DO range(int)-(s-e) TIMES
            CALL something
    ELSE
        ENDLESS LOOP
            CALL something
Both of which are wrong. It should map to whatever gcc -O0 gives:
00000000004004f7 <foo>:
4004f7: 55 push %rbp
4004f8: 48 89 e5 mov %rsp,%rbp
4004fb: 48 83 ec 18 sub $0x18,%rsp
4004ff: 89 7d ec mov %edi,-0x14(%rbp)
400502: 89 75 e8 mov %esi,-0x18(%rbp)
400505: 8b 45 ec mov -0x14(%rbp),%eax
400508: 89 45 f8 mov %eax,-0x8(%rbp)
40050b: eb 11 jmp 40051e <foo+0x27>
40050d: b8 00 00 00 00 mov $0x0,%eax
400512: e8 d5 ff ff ff callq 4004ec <something>
400517: 89 45 fc mov %eax,-0x4(%rbp)
40051a: 83 45 f8 01 addl $0x1,-0x8(%rbp)
40051e: 8b 45 f8 mov -0x8(%rbp),%eax
400521: 3b 45 e8 cmp -0x18(%rbp),%eax
400524: 75 e7 jne 40050d <foo+0x16>
400526: c9 leaveq
400527: c3 retq
You can see the actual increment of i in "addl $0x1,-0x8(%rbp)", the
test for equality, and the jump back to continue the loop. If the
hardware fails, then so be it! It was the coder's fault for falling into
UB. Leave it to the CPU to make the program fail.
As people want fast programs it makes sense in my eyes to say here:
This is an incorrect assumption. Coders want fast programs *as long as
they exactly map to what the coder said* (not to what the coder
intended). Otherwise the compiler is failing to do what the coder asked for.
In conclusion, the current state of this GCC optimization is not
trustworthy. Considering that the div/0 optimization begins at -O1, -O1
and higher levels are not trustworthy.
Therefore, Debian should call for compilation with -O0 for all packages
compiled with GCC whose maintainers won't take responsibility for the
program never falling into UB.