Re: libports and interrupted RPCs
Michael Kelly, on Mon 01 Sep 2025 23:07:44 +0100, wrote:
> On 31/08/2025 22:47, Samuel Thibault wrote:
> > Michael Kelly, on Sat 30 Aug 2025 21:29:46 +0100, wrote:
> > > This sequence of hurd_thread_cancel() calls all occur whilst a single
> > > process wide mutex is held locked (see libports:interrupt_rpcs.c).
> > You mean the _ports_lock mutex?
> Yes.
Note that this mutex protects the current_rpcs list, so walking the
list without holding it would be unsafe. Another way would be to record
the thread ports in a local array, and call hurd_thread_cancel() in a
loop after releasing the mutex. But then the threads might die
in between. We could take a reference to keep the port allocated, but
hurd_thread_cancel() would then allocate a signal state again for dead
threads...
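
To make the "local array" idea concrete, here is a rough sketch of the
pattern only. It is not libports code: struct rpc_sketch, sketch_lock,
sketch_rpcs, mock_thread_cancel() and MAX_SNAPSHOT are all made-up
stand-ins, and thread ports are modelled as plain ints. It does not
address the problem of threads dying between the snapshot and the
cancel loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real libports definitions.  */
struct rpc_sketch { int thread; struct rpc_sketch *next; };

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER; /* plays _ports_lock */
static struct rpc_sketch *sketch_rpcs;   /* plays current_rpcs, protected by sketch_lock */
static int lock_held;                    /* set only while sketch_lock is held */
static int cancel_calls;                 /* how many times the mock was called */
static int cancels_under_lock;           /* mock calls made with the lock held */

/* Mock for hurd_thread_cancel(): only counts calls.  */
static void mock_thread_cancel(int thread)
{
  (void) thread;
  if (lock_held)
    cancels_under_lock++;
  cancel_calls++;
}

/* Push an RPC record on the list, under the lock.  */
static void add_rpc(struct rpc_sketch *r)
{
  assert(r != NULL);
  pthread_mutex_lock(&sketch_lock);
  r->next = sketch_rpcs;
  sketch_rpcs = r;
  pthread_mutex_unlock(&sketch_lock);
}

#define MAX_SNAPSHOT 64  /* arbitrary bound for the sketch */

/* Copy the thread ids while holding the lock, then cancel after
   releasing it.  Returns the number of threads snapshotted.  */
static size_t interrupt_rpcs_snapshot(void)
{
  int threads[MAX_SNAPSHOT];
  size_t n = 0;

  pthread_mutex_lock(&sketch_lock);
  lock_held = 1;
  for (struct rpc_sketch *r = sketch_rpcs; r != NULL && n < MAX_SNAPSHOT; r = r->next)
    threads[n++] = r->thread;
  lock_held = 0;
  pthread_mutex_unlock(&sketch_lock);

  /* The lock is free here, so other RPCs can begin or end meanwhile.
     A real version would have to cope with threads that died since
     the snapshot was taken.  */
  for (size_t i = 0; i < n; i++)
    mock_thread_cancel(threads[i]);
  return n;
}
```

With two records on the list, interrupt_rpcs_snapshot() calls the mock
twice, and never with the lock held.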
> > > The same lock is also required to begin or end other RPCs on other
> > > ports and so they must wait until the initial interrupt_operation
> > > completes.
> > ? I don't think ports_interrupt_rpcs actually waits for something to
> > finish? hurd_thread_cancel() should be asynchronous, and
> > _ports_record_interruption clearly is.
>
> I hadn't any evidence to present so today I reran the stress test without my
> code changes to ports_interrupt_rpcs(). I have attached a reduced version of
> the very long set of stack traces from the ext2fs server. I have the
> complete list of all threads saved but it's a bit long for this message. In
> summary the traces show:
>
> 1) One thread (thread: 35) handling an interrupt_operation request. This
> shows it making a secondary interrupt_operation RPC to a storeio task. The
> port in use has a msgcount of 5 preventing immediate delivery of this
> message.
Ah! interrupt_operation calls pile up... So ports_interrupt_rpcs does
indeed take some time.
But the extra calls are actually useless: one is enough for a given thread.
I wonder if, in glibc's hurd_thread_cancel, we could just add an
  if (!ss->cancel)
condition around the lines from ss->cancel = 1; to the call to the cancel hook.
That way, if we try to cancel the same thread several times, we'll just
suspend/resume it several times, and not call interrupt_operation on the
server several times.
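
As a rough model of the effect (not the glibc source: struct
sigstate_sketch and the counters below are made up, and the real
function also takes locks and handles errors, which I leave out), the
guard would behave something like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread signal state.  */
struct sigstate_sketch { int cancel; };

static int suspends;         /* stand-in for suspending the thread */
static int resumes;          /* stand-in for resuming the thread */
static int interrupt_calls;  /* stand-in for interrupt_operation sent to the server */
static int hook_calls;       /* stand-in for running the cancel hook */

/* Model of a cancel routine with the proposed guard.  */
static void thread_cancel_sketch(struct sigstate_sketch *ss)
{
  assert(ss != NULL);

  suspends++;                /* still done on every call */

  if (!ss->cancel)
    {
      /* Only the first cancellation sets the flag, interrupts the
         pending RPC and runs the cancel hook.  */
      ss->cancel = 1;
      interrupt_calls++;
      hook_calls++;
    }

  resumes++;                 /* still done on every call */
}
```

Cancelling the same thread three times in this model suspends and
resumes it three times but interrupts the server only once. It says
nothing about when the flag gets cleared again, which the real code
would have to get right.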
Samuel