mmap broken - glibc or kernel to blame?



Hello,

[Please CC the replies to me, as I am not subscribed to this list]

While investigating recent FTBFS bug reports [0,1] I have come to the
conclusion that something is wrong with either the dynamic linker
ld-linux.so.2, or the way the kernel handles certain mmap() calls (at
least on sparc64, possibly on i386 as well). Below is an illustration
of debugging the problem on a sparc64 machine (up-to-date unstable
chroot, kernel 2.6.8-1-sparc64, libc6 2.3.2.ds1-18) with this test example:

char a[134084860];
int main() { return 0; }

compiled into an a.out executable. Running 'ld-linux.so.2 ./a.out'
under gdb and looking in /proc/<pid>/maps I see (irrelevant paths and
whitespace removed for brevity):

08000000-0801a000 r-xp 00000000 08:11 319415 ld-2.3.2.so
08028000-0802a000 rwxp 00018000 08:11 319415 ld-2.3.2.so
efffe000-f0000000 rw-p efffe000 00:00 0

So the dynamic linker, run here as the executable, is mapped starting
at 0x8000000. I then continue execution, catching the SIGILL. After
that /proc/<pid>/maps looks like this:

00010000-00012000 r-xp 00000000 08:11 458670 a.out
00020000-00024000 rwxp 00000000 08:11 458670 a.out
00024000-08002000 rwxp 00024000 00:00 0
08002000-0801a000 r-xp 00002000 08:11 319415 ld-2.3.2.so
08028000-0802a000 rwxp 00018000 08:11 319415 ld-2.3.2.so
efffe000-f0000000 rw-p efffe000 00:00 0

As you can see, as a result of mapping ./a.out into memory, the
region 08000000-08002000 (which contains executable code!) has been
overwritten with zeroes, producing the SIGILL. This picture agrees
with the result of running it under strace:

execve("/usr/lib/debug/ld-linux.so.2", ["/usr/lib/debug/ld-linux.so.2", "./a.out"], [/* 16 vars */]) = 0
uname({sys="Linux", node="kundera", ...}) = 0
brk(0)                                  = 0x802a000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("./a.out", O_RDONLY)               = 3
read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\2\0\2\0\0\0\1\0\1\3P"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=19133, ...}) = 0
getcwd("/root", 128)                    = 6
mmap(0x10000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x10000
mmap(0x20000, 16384, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x20000
mmap(0x24000, 134077000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x24000
close(3)                                = 0
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
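
The overlap can be checked with a little arithmetic on the numbers in
the trace above. The following is only a sketch: the 8192-byte page
size is an assumption inferred from the map boundaries shown earlier,
and the helper names (page_align_up, ranges_overlap, zero_fill_end)
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (must be a power of two). */
uint64_t page_align_up(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Return 1 if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect. */
int ranges_overlap(uint64_t a_start, uint64_t a_end,
                   uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/* End of the anonymous MAP_FIXED mapping from the strace output:
 * start 0x24000, length 134077000 bytes, rounded up to a page
 * (8192 bytes is assumed here, not taken from the original report). */
uint64_t zero_fill_end(void)
{
    return page_align_up(0x24000ULL + 134077000ULL, 8192ULL);
}
```

With these numbers zero_fill_end() comes out as 0x08002000, which is
the end of the anonymous region in the second maps listing, and
ranges_overlap(0x24000, zero_fill_end(), 0x08000000, 0x0801a000)
reports that it intersects the original text mapping of ld-2.3.2.so.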

So the million-dollar question is: whose fault is it? I see two
possibilities: either ld-linux.so.2 is supposed to make sure that
enough free address space is available for the mapping but fails to
do so for some reason; or this check is supposed to be performed by
the kernel and the mmap call above should not succeed. One point of
view, presented by Richard Mortimer in [2] and based on the POSIX
specification of mmap, leads to the conclusion that the kernel is not
at fault here: it just follows the POSIX-defined behaviour. I am
clearly not an expert on the issue, so any information you can
provide will be greatly appreciated. The offending mmap call, as far
as I can tell, comes from line 1146 in elf/dl-load.c:

mapat = __mmap ((caddr_t) zeropage, zeroend - zeropage,
                c->prot, MAP_ANON|MAP_PRIVATE|MAP_FIXED,
                ANONFD, 0);
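
The behaviour in question can be reproduced outside the dynamic
linker. The sketch below is not glibc code; fixed_map_clobbers is a
hypothetical helper that maps one anonymous page, writes to it, and
then maps a second anonymous page at the same address with MAP_FIXED.
On Linux the second call is expected to succeed and silently replace
the first mapping.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a MAP_FIXED mapping replaced an existing one without
 * any error, 0 if the old contents survived, -1 on setup failure. */
int fixed_map_clobbers(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* First mapping: let the kernel pick the address, then dirty it. */
    unsigned char *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED)
        return -1;
    first[0] = 0xAA;

    /* Second mapping: same address, forced with MAP_FIXED. */
    unsigned char *second = mmap(first, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                                 -1, 0);
    if (second == MAP_FAILED) {
        munmap(first, len);
        return -1;
    }
    assert(second == first);     /* MAP_FIXED returns the requested address */

    /* A fresh anonymous page reads as zero, so the 0xAA byte is gone. */
    int clobbered = (first[0] == 0);
    munmap(second, len);
    return clobbered;
}
```

If this returns 1, the kernel has discarded the earlier page without
reporting anything, which is the same thing that happens to the first
pages of ld-2.3.2.so in the trace above.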

[0] http://bugs.debian.org/268450
[1] http://lists.debian.org/debian-sparc/2004/12/msg00009.html
[2] http://marc.theaimsgroup.com/?l=linux-sparc&m=110220197504985&w=2

Best regards,

Jurij Smakov                                        jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/                   KeyID: C99E03CC


