[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#459322: Bugzilla Bug 5626 pthread_cleanup_pop_restore_np isn't safe



Here is a diff that makes things work:


"Also sprach ptb:"
> I believe the simplest correction is to 
> 
>    1) change the implementation of pthread_cleanup_pop_restore_np from
> 
>      A  __pthread_unregister_cancel_restore (&__cancel_buf);
>      B  __cancel_routine (__cancel_arg);
> 
>    to
> 
>      B  __cancel_routine (__cancel_arg);
>      A  __pthread_unregister_cancel_restore (&__cancel_buf);
> 
> and
> 
>   2) introduce a counter to prevent the __cancel_routine being run twice
>      for the case that a cancellation message is received after the
>      __cancel_routine is run but before the thread runs
>      __pthread_unregister_cancel_restore and comes out of defrred mode
>      and goes back to async mode.

--- pthread.h.orig	2007-12-19 21:48:54.000000000 +0100
+++ pthread.h	2008-01-18 16:09:58.000000000 +0100
@@ -667,11 +669,12 @@
 /* Install a cleanup handler as pthread_cleanup_push does, but also
    saves the current cancellation type and sets it to deferred
    cancellation.  */
 #  define pthread_cleanup_push_defer_np(routine, arg) \
   do {									      \
     __pthread_unwind_buf_t __cancel_buf;				      \
     void (*__cancel_routine) (void *) = (routine);			      \
     void *__cancel_arg = (arg);						      \
+    int __ran_cancel_routine = 0;					      \
     int not_first_call = __sigsetjmp ((struct __jmp_buf_tag *)		      \
 				      __cancel_buf.__cancel_jmp_buf, 0);      \
     if (__builtin_expect (not_first_call, 0))				      \
@@ -691,9 +695,13 @@
    pthread_cleanup_push_defer was called.  */
 #  define pthread_cleanup_pop_restore_np(execute) \
     } while (0);							      \
+    if (execute) {							      \
+      if (!__ran_cancel_routine++)					      \
+        __cancel_routine (__cancel_arg);				      \
+      else 								      \
+        __ran_cancel_routine--;					      	      \
+    }								      	      \
     __pthread_unregister_cancel_restore (&__cancel_buf);		      \
-    if (execute)							      \
-      __cancel_routine (__cancel_arg);					      \
   } while (0)
 extern void __pthread_unregister_cancel_restore (__pthread_unwind_buf_t *__buf)
   __cleanup_fct_attribute;


The "ran_cancel_routine--" is just to stop the positive count getting
excessive in the wildly improbable case that approx 2^31 cancel messages
are received and partially acted on, since I don't know the internal
semantics of the routines involved.  Ditch it if you are able to.

This fixes the test program from running like this:

% ./testp
(18862) parent created thread OK
(18862) parent detached thread OK
(18862) parent sleeps 2s
(18863) child set async cancellation mode
(18863) child sleeps 1s
(18863) child wakes and starts cycling lock/unlock of mutex
(18862) parent wakes after sleep
(18862) parent starts sending cancels to child
(18862) parent spots child died
!!child died leaving lock set!!
  1 threads created
  1 cancel signals sent
  2436552 lock cycles completed

to running like this:

% ./testp
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19396) child set async cancellation mode
(19396) child sleeps 1s
(19396) child wakes and starts cycling lock/unlock of mutex
(19395) parent wakes after sleep
(19395) parent starts sending cancels to child
(19395) parent spots child died
(19395) parent sees child cleaned up OK, restarting
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19397) child set async cancellation mode
(19397) child sleeps 1s
(19397) child wakes and starts cycling lock/unlock of mutex
(19395) parent wakes after sleep
(19395) parent starts sending cancels to child
(19395) parent spots child died
(19395) parent sees child cleaned up OK, restarting
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19398) child set async cancellation mode
(19398) child sleeps 1s
(19398) child wakes and starts cycling lock/unlock of mutex
(19395) parent wakes after sleep
(19395) parent starts sending cancels to child
(19395) parent spots child died
(19395) parent sees child cleaned up OK, restarting
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19399) child set async cancellation mode
(19399) child sleeps 1s
(19399) child wakes and starts cycling lock/unlock of mutex
(19395) parent wakes after sleep
(19395) parent starts sending cancels to child
(19395) parent spots child died
(19395) parent sees child cleaned up OK, restarting
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19400) child set async cancellation mode
(19400) child sleeps 1s
(19400) child wakes and starts cycling lock/unlock of mutex
(19395) parent wakes after sleep
(19395) parent starts sending cancels to child
(19395) parent spots child died
(19395) parent sees child cleaned up OK, restarting
(19395) parent created thread OK
(19395) parent detached thread OK
(19395) parent sleeps 2s
(19401) child set async cancellation mode
(19401) child sleeps 1s
...

etc.


There are two other similar areas to consider in pthread.h, that I
haven't tackled since my environment doesn't exercise those sections of
the #ifdefs.  I'll do them too, if asked.

Regards

Peter




Reply to: