
Bug#460512: marked as done (libc6-dev: pthread_cancel causes sigsegv in receiving thread)



Your message dated Sun, 13 Jan 2008 12:22:50 +0100
with message-id <4789F48A.5000407@aurel32.net>
and subject line Bug#460512: libc6-dev: pthread_cancel causes sigsegv in receiving thread
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libc6-dev
Version: 2.7-4
Severity: normal

The following program exits with SEGV after sending a cancel signal to a
single thread which is sitting in a nanosleep for 10s, with no mutexes held
anywhere. The thread gets the segv, as far as I can tell in the handler,
not the parent.

The program is compiled with -pthread.

One can detach the thread or not - it makes no difference. One can put
the thread in deferred or asynchronous mode - it makes no difference.

Typical output is

   ./testp2
   (21019) created thread 21020
   signal 11 received in pid 21020 after 1 cancel signals
   Segmentation fault

The thread is just sitting in a nanosleep, as I said.

It makes no difference if a handler for SEGV is set or not - it just
prints the output message.

CAVEAT: for all I know this is normal behaviour. Maybe one has to send a
cancel message from a sibling thread, not a parent thread. Maybe
"thread" has some special meaning in posix such as "whatever it is that
may send a cancel message without causing a segv in the receiver",
thus making the sender of a cancel message that does so inappropriate.
Shrug.


A strace -f shows that it is the child that gets the segv, while it's in
the nanosleep.

  ...
  mprotect(0xb7f68000, 4096, PROT_READ)   = 0
  munmap(0xb7f86000, 102350)              = 0
  set_tid_address(0xb7e206f8)             = 21058
  sendto(-1209923840, umovestr: Input/output error
  0xc, 3086491636,
  MSG_PROXY|MSG_EOR|MSG_TRUNC|MSG_FIN|MSG_SYN|0xb7e20000, NULL,
  3215699728) = -1 ENOSYS (Function not implemented)
  futex(0xbfabaf00, 0x81 /* FUTEX_??? */, 1) = -1 ENOSYS (Function not
  implemented)
  rt_sigaction(SIGRTMIN, {0xb7f72260, [], SA_SIGINFO}, NULL, 8) = 0
  rt_sigaction(SIGRT_1, {0xb7f722e0, [], SA_RESTART|SA_SIGINFO}, NULL,
  8) = 0
  rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
  getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY})
  = 0
  uname({sys="Linux", node="betty.it.uc3m.es", ...}) = 0
  rt_sigaction(SIGSEGV, {0x8048720, [SEGV], SA_RESTART}, {SIG_DFL}, 8) =
  0
  gettid()                                = 21058
  mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
  -1, 0) = 0xb7620000
  brk(0)                                  = 0x804a000
  brk(0x806b000)                          = 0x806b000
  mprotect(0xb7620000, 4096, PROT_NONE)   = 0
  clone(Process 21059 attached
  child_stack=0xb7e1f4c4,
  flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
  parent_tidptr=0xb7e1fbd8, {entry_number:6, base_addr:0xb7e1fb90,
  limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
  limit_in_pages:1, seg_not_present:0, useable:1},
  child_tidptr=0xb7e1fbd8) = 21059
  [pid 21058] futex(0x8049d44, FUTEX_WAIT, 1, NULL <unfinished ...>
  [pid 21059] gettid()                    = 21059
  [pid 21059] write(2, "(21058) created thread 21059\n", 29(21058)
  created thread 21059
  ) = 29
  [pid 21059] futex(0x8049d44, 0x5 /* FUTEX_??? */, 1 <unfinished ...>
  [pid 21058] <... futex resumed> )       = 0
  [pid 21058] futex(0x8049cfc, FUTEX_WAIT, 2, NULL <unfinished ...>
  [pid 21059] <... futex resumed> )       = 1
  [pid 21059] futex(0x8049cfc, FUTEX_WAKE, 1 <unfinished ...>
  [pid 21058] <... futex resumed> )       = 0
  [pid 21058] futex(0x8049cfc, FUTEX_WAKE, 1) = 0
  [pid 21058] nanosleep({0, 1000000},  <unfinished ...>
  [pid 21059] <... futex resumed> )       = 1
  [pid 21059] nanosleep({10, 0},  <unfinished ...>
  [pid 21058] <... nanosleep resumed> {3215699704, 134514076}) = 0
  [pid 21058] open("/etc/ld.so.cache", O_RDONLY) = 3
  [pid 21058] fstat64(3, {st_mode=S_IFREG|0644, st_size=102350, ...}) =
  0
  [pid 21058] mmap2(NULL, 102350, PROT_READ, MAP_PRIVATE, 3, 0) =
  0xb7f86000
  [pid 21058] close(3)                    = 0
  [pid 21058] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such
  file or directory)
  [pid 21058] open("/lib/libgcc_s.so.1", O_RDONLY) = 3
  [pid 21058] read(3,
  "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\31\0\000"..., 512) =
  512
  [pid 21058] fstat64(3, {st_mode=S_IFREG|0644, st_size=41876, ...}) = 0
  [pid 21058] mmap2(NULL, 44964, PROT_READ|PROT_EXEC,
  MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7615000
  [pid 21058] mmap2(0xb761f000, 4096, PROT_READ|PROT_WRITE,
  MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9) = 0xb761f000
  [pid 21058] close(3)                    = 0
  [pid 21058] munmap(0xb7f86000, 102350)  = 0
  [pid 21058] tgkill(21058, 21059, SIGRTMIN) = 0
  [pid 21058] nanosleep({0, 1000000},  <unfinished ...>
  [pid 21059] <... nanosleep resumed> 0xb7e1f3c0) = ?
  ERESTART_RESTARTBLOCK (To be restarted)
  [pid 21059] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
  [pid 21059] futex(0xb761fe84, FUTEX_WAKE, 2147483647) = 0
  [pid 21059] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
  [pid 21059] gettid()                    = 21059
  [pid 21059] write(2, "signal 11 received in pid 21059 "..., 55signal
  11 received in pid 21059 after 1 cancel signals
  ) = 55
  [pid 21059] rt_sigaction(SIGSEGV, {SIG_IGN}, {0x8048720, [SEGV],
  SA_RESTART}, 8) = 0
  [pid 21059] sigreturn()                 = ? (mask now [RTMIN])
  [pid 21058] <... nanosleep resumed> {3215699704, 134514076}) = 0
  [pid 21058] nanosleep({0, 1000000},  <unfinished ...>
  [pid 21059] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
  Process 21059 detached
  <... nanosleep resumed> 0xbfabaee8)     = ? ERESTART_RESTARTBLOCK (To
  be restarted)
  +++ killed by SIGSEGV +++
  Process 21058 detached
  betty:/home/oboe/ptb%




Here's the program.



// compile line:  gcc -O2 -pthread -g -o testp2 testp2.c

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <error.h>
#include <errno.h>
#include <time.h>
#include <signal.h>
#include <sys/types.h>
#define _GNU_SOURCE             /* or _BSD_SOURCE or _SVID_SOURCE */
#include <unistd.h>
#include <sys/syscall.h>

#define __USE_GNU 1
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static int kcount;
static int debug;

static t;                       // set when thread is ready to play

static void
print_error (int err, char *fnname, int line)
{
    char buf[128];

    fprintf (stderr, "error '%s' (%d) from %s line %d\n",
             strerror_r (err, buf, sizeof (buf)), err, fnname, line);
}

static void
sighandler (int k)
{
    fprintf (stderr,
             "signal %d received in pid %d after %d cancel signals\n", k,
             syscall (SYS_gettid), kcount);
    debug++;
    signal (k, SIG_IGN);
}

#define DEBUG(s ...) if (debug) { \
     fprintf(stderr, "(%d) ", syscall(SYS_gettid)); \
     fprintf(stderr, s); \
  }

static void
do_thread_stuff (void)
{

    // just wait for 10s in nanosleep
    struct timespec req = { 10, 0, }, rem;
    int err;

  complete_nanosleep:
    err = nanosleep (&req, &rem);
    if (err) {
        switch (errno) {
          case EFAULT:
              print_error (errno, "nanosleep", __LINE__);
              break;
          case EINTR:
              req = rem;
              goto complete_nanosleep;
              break;
          case EINVAL:
              print_error (errno, "nanosleep", __LINE__);
              break;
        }
    }
}

static void *
tfn (void *tdata)
{

    int err;

    // can try ASYNC mode here - no difference
    //err = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    //if (err) {
    //    print_error(err, "pthread_setcanceltype", __LINE__);
    //    return NULL;
    //}

    /* 
     *   - tell the parent waiter that we are ready to play
     */

    pthread_mutex_lock (&mutex);
    fprintf (stderr, "(%d) created thread %d\n", (pthread_t)tdata, syscall (SYS_gettid));
    t++;
    // signal parent that thread is up and working
    pthread_cond_signal (&cond);

    pthread_mutex_unlock (&mutex);

    do_thread_stuff ();         // just wait in nanosleep

}

int
main ()
{

    pthread_t tid;
    int err;

    signal (SIGSEGV, sighandler);

    pthread_mutex_lock (&mutex);

    err = pthread_create (&tid, NULL, tfn, (void *)(long)(syscall(SYS_gettid)));
    if (err) {
        print_error (err, "pthread_create", __LINE__);
        return -err;
    }

    while (t <= 0) {
        pthread_cond_wait (&cond, &mutex);
    }

    // detach or not makes no difference
    //err = pthread_detach (tid);
    //if (err) {
    //    print_error (err, "pthread_detach", __LINE__);
    //    return -err;
    //}

    /* the thread is now ready and we have the lock on the
     * mutex. No other lock is held or will ever be held. 
     * Release the lock and go play with cancel messages.
     */

    pthread_mutex_unlock (&mutex);

    while (1) {

        // cancel the thread every millisecond
        struct timespec req = { 0, 1000000, }, rem;

      complete_nanosleep:
        err = nanosleep (&req, &rem);
        if (err) {
            switch (errno) {
              case EFAULT:
                  print_error (errno, "nanosleep", __LINE__);
                  break;
              case EINTR:
                  req = rem;
                  goto complete_nanosleep;
                  break;
              case EINVAL:
                  print_error (errno, "nanosleep", __LINE__);
                  break;
            }
        }

        err = pthread_cancel (tid);

        if (err) {
            print_error (err, "pthread_cancel", __LINE__);
        }
        kcount++;

    }
    // never reached
}



-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.15.3 (PREEMPT)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6-dev depends on:
ii  libc6                         2.7-4      GNU C Library: Shared libraries
ii  linux-libc-dev                2.6.22-6   Linux Kernel Headers for developme

Versions of packages libc6-dev recommends:
ii  gcc [c-compiler]    4:4.2.1-6            The GNU C compiler
ii  gcc-2.95 [c-compile 1:2.95.4-27          The GNU C compiler
ii  gcc-3.4 [c-compiler 3.4.6-6              The GNU C compiler
ii  gcc-4.1 [c-compiler 4.1.2-18             The GNU C compiler
ii  gcc-4.2 [c-compiler 4.2.2-4              The GNU C compiler
ii  tcc [c-compiler]    0.9.24~cvs20070502-2 the smallest ANSI C compiler

-- no debconf information



--- End Message ---
--- Begin Message ---
Peter T. Breuer wrote:
> Package: libc6-dev
> Version: 2.7-4
> Severity: normal
> 
> The following program exits with SEGV after sending a cancel signal to a
> single thread which is sitting in a nanosleep for 10s, with no mutexes held
> anywhere. The thread gets the segv, as far as I can tell in the handler,
> not the parent.
> 
> The program is compiled with -pthread.
> 
> One can detach the thread or not - it makes no difference. One can put
> the thread in deferred or asynchronous mode - it makes no difference.
> 
> Typical output is
> 
>    ./testp2
>    (21019) created thread 21020
>    signal 11 received in pid 21020 after 1 cancel signals
>    Segmentation fault
> 
> The thread is just sitting in a nanosleep, as I said.
> 
> It makes no difference if a handler for SEGV is set or not - it just
> prints the output message.

Compiling your program with -Wall shows too many warnings:

test.c:24: warning: type defaults to ‘int’ in declaration of ‘t’
test.c: In function ‘print_error’:
test.c:32: warning: format ‘%s’ expects type ‘char *’, but argument 3
has type ‘int’
test.c: In function ‘sighandler’:
test.c:40: warning: format ‘%d’ expects type ‘int’, but argument 4 has
type ‘long int’
test.c: In function ‘tfn’:
test.c:94: warning: format ‘%d’ expects type ‘int’, but argument 3 has
type ‘long unsigned int’
test.c:94: warning: format ‘%d’ expects type ‘int’, but argument 4 has
type ‘long int’
test.c:80: warning: unused variable ‘err’
test.c:103: warning: control reaches end of non-void function

Most notably, the strerror_r in use here returns an int (0 when it
succeeds and -1 when it fails), so printing its result with %s may lead
to a segfault.

Please fix those warnings and, if the bug is still present, come back
with a warning-clean testcase.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net


--- End Message ---
