
Bug#843822: KDE binaries (ksplashqml) *silently* fail if they cannot allocate memory, causing a hang at startup



Package: kde-workspace-bin
Version: 4:4.11.13-2

When KDE is started with too low a ulimit -d in effect, it simply 
hangs: it displays no splash screen, pops up no error dialog, and 
prints no error message to stderr (.xsession-errors).

This makes diagnosing such situations needlessly difficult.

strace shows the following sequence:

6907  execve("/usr/bin/ksplashqml", ["ksplashqml", "lines", "--pid"], [/* 59 vars */]) = 0
...
6907  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8ada54aa50) = 6908
...
6907  exit_group(0)                     = ?
6907  +++ exited with 0 +++
...
6908  mmap(NULL, 2147483648, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
6908  <... mmap resumed> )              = -1 ENOMEM (Cannot allocate memory)
6908  --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xbbadbeef} ---
6908  +++ killed by SIGSEGV +++

This shows several problems:
1. The victim process (ksplashqml) makes no attempt to write an error 
message to stderr.
2. Apparently the victim process uses the pointer returned by mmap 
without first checking it for MAP_FAILED, which results in a segfault.
3. The process is started with a double fork, so the parent process 
cannot react to the SIGSEGV either. A double fork is an odd way to 
start a child process that is meant to be short-lived (such as the 
splash screen).
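
To illustrate what I would have expected instead, here is a minimal 
sketch in C. It is not taken from the KDE sources: map_or_report and 
run_and_report are hypothetical names, and I left out PROT_EXEC because 
the sketch does not need it. The first function checks the result of 
mmap against MAP_FAILED and reports the failure on stderr; the second 
starts a child with a single fork so that the parent can see a fatal 
signal via waitpid.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (points 1 and 2): map anonymous memory and, on
 * failure, say so on stderr instead of handing back an unusable pointer. */
static void *map_or_report(size_t len, const char *who)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {          /* failure is MAP_FAILED, not NULL */
        fprintf(stderr, "%s: cannot map %zu bytes: %s\n",
                who, len, strerror(errno));
        return NULL;
    }
    return p;
}

/* Hypothetical launcher (point 3): fork once, wait for the child, and
 * report abnormal termination. Returns the fatal signal number, 0 if the
 * child exited on its own, or -1 if fork or waitpid failed. */
static int run_and_report(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror(argv[0]);            /* only reached if exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "%s: killed by signal %d (%s)\n",
                argv[0], sig, strsignal(sig));
        return sig;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
        fprintf(stderr, "%s: exited with status %d\n",
                argv[0], WEXITSTATUS(status));
    return 0;
}
```

run_and_report blocks until the child exits. A real session startup 
would more likely handle SIGCHLD asynchronously, but in both cases the 
launcher has to remain the direct parent, which a double fork prevents.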

This situation may seem cosmetic (it's the splash screen, after 
all...) but unfortunately the same failure also occurs for 
/usr/bin/kbuildsycoca4 --incremental --checkstamps and 
/usr/bin/kdeinit4 --oom-pipe +kcminit_startup

Occasionally the following message is indeed printed to stderr:

QThread::start: Thread creation error: Resource temporarily unavailable

This is still misleading, as it doesn't say *which* resource is 
unavailable. Moreover, the error returned by the kernel is ENOMEM, not 
EAGAIN.
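
A more helpful report could look roughly like the sketch below. This is 
plain C with pthreads, not the Qt code behind the message above, and 
start_thread_verbose and noop are hypothetical names. It assumes that 
the threading library hands back EAGAIN for any resource shortage 
(including, as far as I can tell, a failed stack allocation), so the 
message names the limits worth checking instead of only repeating the 
strerror text.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Trivial thread body, only here so that the sketch is complete. */
static void *noop(void *arg)
{
    return arg;
}

/* Hypothetical wrapper: create a thread and, on failure, print the error
 * together with a hint about which limits may be involved. Note that
 * pthread_create returns the error code directly; it does not set errno. */
static int start_thread_verbose(pthread_t *thread,
                                void *(*fn)(void *), void *arg)
{
    int rc = pthread_create(thread, NULL, fn, arg);
    if (rc != 0) {
        fprintf(stderr, "thread creation failed: %s (error %d)\n",
                strerror(rc), rc);
        if (rc == EAGAIN)
            /* Assumption: EAGAIN may hide a memory shortage. */
            fprintf(stderr, "hint: check the memory limits (ulimit -d, "
                            "ulimit -v) and the thread/process limit "
                            "(ulimit -u)\n");
    }
    return rc;
}
```

A caller would use start_thread_verbose(&t, noop, NULL) in place of a 
bare pthread_create and join the thread as usual.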

This makes it needlessly difficult to debug such a situation: one does 
not expect a memory shortage to show up as a hang rather than as a 
crash.

Thanks for fixing this.

Alain

