
armhf: gcc violates memory constraints of ARM ABI?



Hi,

on Debian buster for armhf, gcc sometimes produces code that accesses memory up
to 12 bytes below the stack pointer. Isn't this a violation of the official ARM
ABI? An AAPCS document from 2015 says [1]:

> A process may only access (for reading or writing) the closed interval of the
> entire stack delimited by [SP, stack-base - 1]

In a newer version, that constraint was relaxed slightly [2]:

> A process may only store data in the closed interval of the entire stack
> delimited by [SP, stack base - 1]

Writing below SP is still disallowed, though.

I've spent many hours trying to find documentation of a Linux-specific ABI that
deviates from these standards, without success.

So I tried to trigger a stack corruption by sending a signal right before such
a memory access. Under certain conditions, when a process handles a signal, the
kernel writes data right below the stack pointer. However, it looks like these
conditions are not met on Debian buster by default.

More precisely, the kernel may access the 16 bytes directly below the stack
pointer as the "retcode" member of the signal frame [3][4]. However, those
bytes are only modified if either sigaction's SA_RESTORER flag [5] is *not*
set [6] or the process is in a specific execution domain in combination with
the kernel option CONFIG_BINFMT_ELF_FDPIC [7]. The latter is not the case by
default. Plus, Debian's libc seems to enforce the SA_RESTORER flag [8][9].

So I'm wondering: is gcc's code generation here the result of a conscious
decision based on the described "configurations" of the kernel and libc?
If not, there may be some serious bugs somewhere.

Here is the smallest code example I could create. It consists of three
files:

badlib.h:
struct BadData { char text[14]; };
void dummy(struct BadData bad);
void trigger(int a, struct BadData bad);

badlib.c:
#include "badlib.h"
void dummy(struct BadData bad) {}
void trigger(int a, struct BadData bad) {
    dummy(bad);
}

main.c:
#include "badlib.h"
int main() {
    struct BadData bad;
    trigger(3, bad);
}

This has to be compiled as follows:
gcc -std=c99 -Wall -fPIC -O2 -g -shared badlib.c -o libbadlib.so
gcc -std=c99 -Wall -O0 -g -L. -lbadlib -o main main.c

The beginning of the generated assembly for "trigger":
0xb6fc449c <trigger>    sub     sp, #16
0xb6fc449e <trigger+2>  add     r0, sp, #4
0xb6fc44a0 <trigger+4>  add     sp, #16
0xb6fc44a2 <trigger+6>  stmia.w r0, {r1, r2, r3}       <-- Invalid write
0xb6fc44a6 <trigger+10> ldmia   r0, {r0, r1, r2, r3}   <-- Invalid read
0xb6fc44a8 <trigger+12> b.w    0xb6fc4348 <dummy@plt>

The first three instructions leave the address 12 bytes below the stack pointer
(SP-12) in register r0 and restore SP to its original value. Then, the stmia.w
instruction writes the values of the registers r1, r2 and r3 to the addresses
SP-12, SP-8 and SP-4, respectively. After that, the ldmia instruction loads the
values from the addresses SP-12, SP-8, SP-4 and SP into the registers r0, r1,
r2 and r3, respectively. So basically, the values of r1, r2 and r3 each move
down by one register, and r3 is refilled from the word at SP. These registers,
however, carry the "BadData" struct argument for the dummy function.

[1] In section 5.2.1.1 of https://developer.arm.com/documentation/ihi0042/f/?lang=en
[2] https://developer.arm.com/documentation/ihi0042/g/?lang=en#processes-memory-and-the-stack
[3] The struct member "retcode":
    https://github.com/torvalds/linux/blob/v4.19/arch/arm/kernel/signal.h#L5
[4] The retrieval of the address for that sigframe
    https://github.com/torvalds/linux/blob/v4.19/arch/arm/kernel/signal.c#L485
[5] https://man7.org/linux/man-pages/man2/sigaction.2.html
[6] https://github.com/torvalds/linux/blob/v4.19/arch/arm/kernel/signal.c#L412
[7] https://github.com/torvalds/linux/blob/v4.19/arch/arm/kernel/signal.c#L365
[8] https://github.com/bminor/glibc/blob/glibc-2.28/sysdeps/unix/sysv/linux/arm/sigaction.c#L23
[9] https://github.com/bminor/glibc/blob/glibc-2.28/sysdeps/unix/sysv/linux/sigaction.c#L53

