
Re: libports and interrupted RPCs



On 07/09/2025 00:20, Samuel Thibault wrote:
You cannot really be sure what is happening with the rpc_info list while
you don't have the lock. Possibly it currently happens to be safe
because the item you are on will not move within the list, but this
looks very fragile to me. Maybe better to record an array of the rpc_info
pointers that we want to cancel.
The changes I suggest do not access the list in this way after the mutex has been released. The next iteration restarts the scan from the (possibly new) head of the list. Admittedly, this results in several passes down the list, and it isn't clear how that compares in performance with dynamically allocating a variable-sized array. Both solutions still require the 'cancelling' state added to rpc_info, which prevents the affected RPC thread from terminating.
3) Reset the RPC as no longer in cancellation and repeat from 1) until there
are no more RPCs to be cancelled by this thread.
We may end up in a livelock here, if somehow some other code keeps
making newer RPCs in the thread.
That does not occur because the first pass of the list marks the RPCs that will be cancelled in this call. Any RPCs added to (or removed from) the list later will not be considered for cancellation.
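
To make the scheme concrete, here is a minimal sketch of the loop as described above. It is not the actual libports code: struct rpc_sketch, struct port_sketch, the to_cancel and cancelling flags, and cancel_rpc_sketch() are hypothetical stand-ins, and it assumes only one caller runs the loop on a given port at a time (a single boolean mark cannot tell two concurrent callers apart).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for libports' rpc_info; all names are illustrative. */
struct rpc_sketch {
    struct rpc_sketch *next;
    bool to_cancel;    /* set by the marking pass of the current call */
    bool cancelling;   /* pins the RPC while the lock is dropped */
    int cancel_count;  /* demonstration only: times this RPC was cancelled */
};

struct port_sketch {
    pthread_mutex_t lock;
    struct rpc_sketch *rpcs;  /* head of the list, protected by lock */
};

/* Placeholder for the real thread cancellation, which runs unlocked. */
static void cancel_rpc_sketch(struct rpc_sketch *rpc)
{
    rpc->cancel_count++;
}

/* Returns the number of RPCs cancelled by this call. */
int interrupt_rpcs_sketch(struct port_sketch *port)
{
    struct rpc_sketch *rpc;
    int cancelled = 0;

    pthread_mutex_lock(&port->lock);

    /* Marking pass: only RPCs on the list right now are candidates. */
    for (rpc = port->rpcs; rpc != NULL; rpc = rpc->next)
        rpc->to_cancel = true;

    for (;;) {
        /* Restart from the (possibly new) head on every iteration. */
        for (rpc = port->rpcs; rpc != NULL && !rpc->to_cancel; rpc = rpc->next)
            ;
        if (rpc == NULL)
            break;  /* no marked RPCs left, so the loop terminates */

        assert(!rpc->cancelling);  /* single-caller assumption */
        rpc->to_cancel = false;
        rpc->cancelling = true;    /* RPC thread must not terminate now */

        pthread_mutex_unlock(&port->lock);
        cancel_rpc_sketch(rpc);
        pthread_mutex_lock(&port->lock);

        rpc->cancelling = false;   /* no longer in cancellation */
        cancelled++;
    }

    pthread_mutex_unlock(&port->lock);
    return cancelled;
}
```

Each iteration clears exactly one mark and no marks are set after the first pass, so the number of iterations is bounded by the length of the list at the time of the call. RPCs that arrive while the lock is dropped are never marked and are therefore skipped.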
I did wonder why all RPCs were being cancelled when the signal is delivered
It's not all RPCs, just the ones on the port on which the signaled thread
is waiting for an RPC. That can indeed be a lot if a lot of threads happen
to be waiting on this port. It however looks safer this way: you'd never
really know which kind of interlocking condition there might be in the
server for the various threads blocked on the port. For instance if the
server was serving a shared condition variable, you might want to make
sure that everyone has a chance to wake up, and not only the one that is
getting interrupted and might be trying to do something else.

We just want to avoid a storm of interruptions, and I believe that not
cancelling a thread that is already being cancelled can take us a long
way in that direction.

I didn't explicitly say RPCs 'on that port' but that is what I had in mind. I was aware that it wasn't all RPCs in the system.

I don't understand the suggestion about not re-cancelling a thread already in cancellation due to a signal. That occurs within the originating client, but isn't the storm of interruptions being generated on the server side?

Regards,

Mike.

