On 07/09/2025 00:20, Samuel Thibault wrote:
> You are not really sure what is happening with the rpc_info list while you don't have the lock. Possibly it currently happens to be safe because the item you are on will not move within the list, but this looks very fragile to me. Maybe better to record an array of the rpc_info pointers that we want to cancel.

The changes I suggest do not access the list in this way after the mutex has been released. The next iteration restarts the scan from the (possibly new) head of the list. Admittedly, this results in several passes down the list, and it isn't clear how that compares in performance with dynamically allocating memory for an array of variable size. Both solutions still require the 'cancelling' state added to rpc_info, which prevents the affected RPC thread from terminating.
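For comparison, here is a minimal sketch of the array-based alternative, under stated assumptions. The names `rpc_entry`, `entries_head`, `entries_lock`, `do_cancel` and `cancel_via_snapshot` are hypothetical and are not the libports definitions. The sketch assumes only one thread runs the cancellation at a time, and that a set 'cancelling' flag keeps an entry on the list and alive.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for rpc_info; not the real structure. */
struct rpc_entry {
    struct rpc_entry *next;
    bool cancelling;   /* set while a cancellation is in progress */
    int cancel_count;  /* demo only: how often do_cancel() ran */
};

static pthread_mutex_t entries_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpc_entry *entries_head;

/* Placeholder for the real cancellation work, done without the lock. */
static void do_cancel(struct rpc_entry *e) { e->cancel_count++; }

/* Returns the number of entries cancelled, or -1 if allocation fails. */
static int cancel_via_snapshot(void)
{
    size_t n = 0, i = 0;
    struct rpc_entry *e, **snap;

    pthread_mutex_lock(&entries_lock);
    for (e = entries_head; e; e = e->next)
        n++;
    if (n == 0) {
        pthread_mutex_unlock(&entries_lock);
        return 0;
    }
    snap = malloc(n * sizeof *snap);
    if (!snap) {
        pthread_mutex_unlock(&entries_lock);
        return -1;
    }
    /* Record the pointers and pin each entry before dropping the lock. */
    for (e = entries_head; e; e = e->next) {
        e->cancelling = true;
        snap[i++] = e;
    }
    assert(i == n);
    pthread_mutex_unlock(&entries_lock);

    /* The list itself is never walked while the lock is released. */
    for (i = 0; i < n; i++)
        do_cancel(snap[i]);

    pthread_mutex_lock(&entries_lock);
    for (i = 0; i < n; i++)
        snap[i]->cancelling = false;
    pthread_mutex_unlock(&entries_lock);

    free(snap);
    return (int)n;
}
```

The cost is one allocation per call, whose size depends on the list length. In exchange the list is traversed only while the lock is held.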
>> 3) Reset the RPC as no longer in cancellation and repeat from 1) until there are no more RPCs to be cancelled by this thread.
>
> We may end up in a livelock here, if somehow some other code keeps making newer RPCs in the thread.

That does not occur, because the first pass of the list marks the RPCs that will be cancelled in this call. Any RPCs added to (or removed from) the list later will not be considered for cancellation.
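A minimal sketch of the mark-then-rescan scheme described here, under stated assumptions. The names `rpc`, `rpc_head`, `list_lock`, `cancel_one` and `cancel_marked_rpcs` are hypothetical and do not reflect the actual patch or the libports structures. The sketch assumes a single caller at a time, and that an entry with 'cancelling' set is not removed or freed by its owner.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rpc_info; not the real layout. */
struct rpc {
    struct rpc *next;
    bool marked;      /* selected for cancellation by the current call */
    bool cancelling;  /* cancellation in progress; owner must not exit */
    bool cancelled;   /* demo only: records that cancel_one() ran */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpc *rpc_head;

/* Placeholder for the real cancellation, performed without the lock. */
static void cancel_one(struct rpc *r)
{
    assert(r != NULL);
    r->cancelled = true;
}

/* Returns the number of RPCs cancelled by this call. */
static int cancel_marked_rpcs(void)
{
    int count = 0;
    struct rpc *r;

    pthread_mutex_lock(&list_lock);

    /* First pass: fix the set of RPCs this call will cancel.
       Entries added later are never marked, so they are ignored. */
    for (r = rpc_head; r; r = r->next)
        r->marked = true;

    for (;;) {
        /* Restart from the (possibly new) head on every iteration. */
        for (r = rpc_head; r && !r->marked; r = r->next)
            ;
        if (!r)
            break;              /* no marked entries left */

        r->marked = false;      /* each mark is consumed exactly once */
        r->cancelling = true;   /* pin the entry while unlocked */
        pthread_mutex_unlock(&list_lock);

        cancel_one(r);

        pthread_mutex_lock(&list_lock);
        r->cancelling = false;
        count++;
    }

    pthread_mutex_unlock(&list_lock);
    return count;
}
```

Each iteration clears one mark and no new marks are set after the first pass, so the loop runs at most once per entry present at the start, however many RPCs are added meanwhile.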
>> I did wonder why all RPCs were being cancelled when the signal is delivered
>
> It's not all RPCs, just the ones on the port that the signaled thread is waiting an RPC for. That can indeed be a lot if a lot of threads happen to be waiting on this port. It however looks safer this way: you'd never really know which kind of interlocking condition there might be in the server for the various threads blocked on the port. For instance, if the server was serving a shared condition variable, you might want to make sure that everyone has a chance to wake up, and not only the one that is getting interrupted and might try to be doing something else. We just want to avoid a storm of interruptions, and I believe that avoiding cancelling a thread that is already being cancelled can take us a long way in that direction.

I didn't explicitly say RPCs 'on that port' but that is what I had in mind. I was aware that it wasn't all RPCs in the system.
I don't understand the suggestion about not re-cancelling a thread already in cancellation due to a signal. That occurs within the originating client, but isn't the storm of interruptions being generated on the server side?
Regards, Mike.