[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#459322: libc6-dev: implementation of pthread_cleanup_pop_restore_np



Here is a simple testcase code which shows up bug

    libc6-dev: implementation of pthread_cleanup_pop_restore_np ..

    (is all wrong)


Rationale:
----------
The bug is that pthread_cleanup_pop_restore_np returns from deferred
mode to async mode and runs the cleanup function in the wrong order. It does

     1) pop deferred cancel mode back to async mode, pop cancel cleanup fn
     2) run cleanup fn.

It should do it the other way round. Otherwise after (1) it is in async
mode and will get killed immediately by a cancel, without running the
cleanup fn (2).

Thus if the cancelled thread has set a mutex and the cleanup fn is
supposed to release the mutex on cancel, the mutex will remain set,
usually blocking all other threads forever.

This is directly in contradiction to the avowed purpose of
pthread_cleanup_pop_restore_np, from the man page:

   pthread_cleanup_push_defer_np is a non-portable extension that
   combines pthread_cleanup_push and pthread_setcanceltype(3).  It
   pushes a cleanup handler just as pthread_cleanup_push does, but also
   saves the current cancellation type and sets it to deferred
   cancellation.  This ensures that the cleanup mechanism is effective
   even if the thread was initially in asynchronous cancellation mode.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Testcase summary:
----------------
The testcase code simply does

   1) launch a child thread

           the child switches to async cancellation mode
           the child sleeps 1s
           the child starts cycling forever taking a mutex and releasing it
           again as fast as it can, using a pthread_cleanup_push_defer_np
           and pthread_cleanup_pop_restore_np pair to protect the
           lock/unlock.

   2) sleep 2s second
   3) start sending cancel messages to the child in a loop every 1ms.
        
          if the cancel message fails with ESRCH, meåning the child has
          died, examine the state of the mutex with trylock to see if
          the child left it locked or unlocked.

          locked means the child failed to run the cleanup function,
          and the testcase is successful and showing it up.

          unlocked means the child got lucky before it died, and we go
          back to (1)

Testcase data:
--------------

Here is a run of the testcase code:

  betty:/home/oboe/ptb% ./testp
  (13284) parent created thread OK
  (13284) parent detached thread OK
  (13284) parent sleeps 2s
  (13285) child set async cancellation mode
  (13285) child sleeps 1s
  (13285) child wakes and starts cycling lock/unlock of mutex
  (13284) parent wakes after sleep
  (13284) parent starts sending cancels to child
  (13284) parent spots child died
  !!child died leaving lock set!!
    1 threads created
    1 cancel signals sent
    810297 lock cycles completed
  betty:/home/oboe/ptb%

and another

  betty:/home/oboe/ptb% ./testp
  (13286) parent created thread OK
  (13286) parent detached thread OK
  (13286) parent sleeps 2s
  (13287) child set async cancellation mode
  (13287) child sleeps 1s
  (13287) child wakes and starts cycling lock/unlock of mutex
  (13286) parent wakes after sleep
  (13286) parent starts sending cancels to child
  (13286) parent spots child died
  (13286) parent sees child cleaned up OK, restarting
  (13286) parent created thread OK
  (13286) parent detached thread OK
  (13286) parent sleeps 2s
  (13288) child set async cancellation mode
  (13288) child sleeps 1s
  (13288) child wakes and starts cycling lock/unlock of mutex
  (13286) parent wakes after sleep
  (13286) parent starts sending cancels to child
  (13286) parent spots child died
  (13286) parent sees child cleaned up OK, restarting
  (13286) parent created thread OK
  (13286) parent detached thread OK
  (13286) parent sleeps 2s
  (13289) child set async cancellation mode
  (13289) child sleeps 1s
  (13289) child wakes and starts cycling lock/unlock of mutex
  (13286) parent wakes after sleep
  (13286) parent starts sending cancels to child
  (13286) parent spots child died
  !!child died leaving lock set!!
    3 threads created
    3 cancel signals sent
    2649682 lock cycles completed
  betty:/home/oboe/ptb%

Testcase compile:
-----------------

Here is the compilation line:

   betty:/home/oboe/ptb% gcc -g -O1 -pthread -Wall -o testp testp.c
   betty:/home/oboe/ptb%

(-O value makes no difference).


Testcase code:
--------------

And here is the code:



#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <error.h>
#include <errno.h>
#include <time.h>
#include <signal.h>
#include <sys/types.h>
#define _GNU_SOURCE             /* or _BSD_SOURCE or _SVID_SOURCE */
#include <unistd.h>
#include <sys/syscall.h>

#define __USE_GNU 1
#include <pthread.h>

static pthread_mutex_t mutex;

static int kcount;
static int tcount;
static int mcount;
static int debug = 1;

static void
print_error (int err, char *fnname, int line)
{
    error_at_line (0, err, __FILE__, line, "err %d in function %s", err, fnname);
}

#define DEBUG(s ...) if (debug) { \
     fprintf(stderr, "(%ld) ", syscall(SYS_gettid)); \
     fprintf(stderr, s); \
  }

static void
cleanupfn (void *data)
{

    int err = pthread_mutex_unlock ((pthread_mutex_t *) data);

    if (err)
        print_error (err, "pthread_mutex_unlock", __LINE__);
}

static void *
tfn (void *tdata)
{

    int err = pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    if (err) {
        print_error (err, "pthread_setcanceltype", __LINE__);
        return NULL;
    }
    DEBUG ("child set async cancellation mode\n");

    /* now with cancellation enabled and async */

    DEBUG ("child sleeps 1s\n");
    sleep (1);
    DEBUG ("child wakes and starts cycling lock/unlock of mutex\n");

    while (1) {

        pthread_cleanup_push_defer_np (cleanupfn, &mutex);

        // now back in deferred cancellation mode
        err = pthread_mutex_lock (&mutex);
        if (err)
            print_error (err, "pthread_mutex_lock", __LINE__);

        mcount++;

        // here we go back to async mode and THEN unlock, too late
        pthread_cleanup_pop_restore_np (err == 0);
    }
}

int
main ()
{

    pthread_t tid;
    int err;

  beginning:

    mutex = (pthread_mutex_t) PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;

    err = pthread_create (&tid, NULL, tfn, NULL);
    if (err) {
        print_error (err, "pthread_create", __LINE__);
        return err;
    }
    tcount++;
    DEBUG ("parent created thread OK\n");

    err = pthread_detach (tid);
    if (err) {
        print_error (err, "pthread_detach", __LINE__);
        return err;
    }
    DEBUG ("parent detached thread OK\n");

    DEBUG ("parent sleeps 2s\n");
    sleep (2);
    DEBUG ("parent wakes after sleep\n");

    DEBUG ("parent starts sending cancels to child\n");
    for (;; kcount++) {

        struct timespec req = { 0, 1000000, }, rem;	// 1ms

        nanosleep (&req, &rem);

        err = pthread_cancel (tid);
        switch (err) {

          case 0:
              // the thread is still alive
              break;

          case ESRCH:

              // the thread died .. test the lock to see if it's set or not
              // Set means the thread died without unlocking it.

              DEBUG ("parent spots child died\n");
              err = pthread_mutex_trylock (&mutex);
              switch (err) {

                case EBUSY:    // mutex is locked + thread is dead = bad!
                    fprintf (stderr, "!!child died leaving lock set!!\n");
                    fprintf (stderr, "  %d threads created\n", tcount);
                    fprintf (stderr, "  %d cancel signals sent\n", kcount);
                    fprintf (stderr, "  %d lock cycles completed\n", mcount);
                    return 0;

                case 0:
                    // we got the lock OK, thread dead, try again
                    err = pthread_mutex_unlock (&mutex);
                    if (err)
                        print_error (err, "pthread_mutex_unlock", __LINE__);
                    DEBUG ("parent sees child cleaned up OK, restarting\n");
                    goto beginning;

                default:
                    // something else invalid about the trylock attempt
                    print_error (err, "pthread_mutex_trylock", __LINE__);
                    return err;
              }
              break;

          default:
              print_error (err, "pthread_cancel", __LINE__);
              return err;
        }
    }
    // never reached
}


I don't think you need a trace. Tell me if you do and I'll send it
tomorrow.

Regards

Peter




Reply to: