
Re: libports and interrupted RPCs



Michael Kelly, on Mon 01 Sep 2025 23:07:44 +0100, wrote:
> On 31/08/2025 22:47, Samuel Thibault wrote:
> > Michael Kelly, on Sat 30 Aug 2025 21:29:46 +0100, wrote:
> > > This sequence of hurd_thread_cancel() calls occurs entirely while a
> > > single process-wide mutex is held (see libports:interrupt_rpcs.c).
> > You mean the _ports_lock mutex?
> Yes.

Note that this mutex protects the current_rpcs list. Going through the
list without the mutex would be unsafe. Another way would be to record
the thread ports in a local array, and call hurd_thread_cancel() in a
loop after releasing the mutex. But then the threads might still die
in between. We could take a reference to keep the port allocated, but
hurd_thread_cancel() would then allocate a signal state again for
threads that are already dead...
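
To make the "local array" idea concrete, here is a rough sketch. It is
not libports code: struct rpc_entry, list_lock, MAX_SNAPSHOT and
cancel_thread_stub() are made-up stand-ins (the stub replaces
hurd_thread_cancel() so the sketch builds anywhere), and a plain int
stands in for a thread port. It only shows the shape: snapshot under the
lock, cancel after dropping it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry of the current_rpcs list.  */
struct rpc_entry
{
  int thread;                   /* stands in for a thread port */
  struct rpc_entry *next;
};

/* Hypothetical stand-in for the lock protecting the list.  */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts calls to the stub below, for illustration only.  */
static int cancel_calls;

/* Stub used in place of hurd_thread_cancel().  */
static void
cancel_thread_stub (int thread)
{
  (void) thread;
  cancel_calls++;
}

#define MAX_SNAPSHOT 64         /* arbitrary bound for the sketch */

/* Copy the thread ids while holding the lock, then cancel them with
   the lock released.  Returns the number of threads cancelled.  */
static size_t
interrupt_all (struct rpc_entry *head)
{
  int snapshot[MAX_SNAPSHOT];
  size_t n = 0;

  pthread_mutex_lock (&list_lock);
  for (struct rpc_entry *e = head; e != NULL; e = e->next)
    {
      assert (n < MAX_SNAPSHOT);
      snapshot[n++] = e->thread;
    }
  pthread_mutex_unlock (&list_lock);

  /* The lock is no longer held: other RPCs can begin or end here, but
     a snapshotted thread may also have died in the meantime.  */
  for (size_t i = 0; i < n; i++)
    cancel_thread_stub (snapshot[i]);

  return n;
}
```

The second loop is exactly where the problem described above shows up:
nothing in the snapshot keeps the threads (or their ports) alive.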

> > > The same lock is also required to begin or end other RPCs on other
> > > ports and so they must wait until the initial interrupt_operation
> > > completes.
> > ? I don't think ports_interrupt_rpcs actually waits for something to
> > finish? hurd_thread_cancel() should be asynchronous, and
> > _ports_record_interruption clearly is.
> 
> I hadn't any evidence to present so today I reran the stress test without my
> code changes to ports_interrupt_rpcs(). I have attached a reduced version of
> the very long set of stack traces from the ext2fs server. I have the
> complete list of all threads saved but it's a bit long for this message. In
> summary the traces show:
> 
> 1) One thread (thread: 35) handling an interrupt_operation request. This
> shows it making a secondary interrupt_operation RPC to a storeio task. The
> port in use has a msgcount of 5 preventing immediate delivery of this
> message.

Ah! interrupt_operation calls pile up... So indeed ports_interrupt_rpcs
takes some time.

But this is actually useless: one interrupt_operation is enough for a
given thread.

I wonder if in glibc's hurd_thread_cancel, we could just add an

if (!ss->cancel)

condition around the lines from ss->cancel = 1; to the cancel hook call.
That way, if we try to cancel the same thread several times, we'll just
suspend/resume it several times, and not call interrupt_operation on the
server several times.
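
As a toy model of the effect (this is not the glibc source: struct
sigstate_model, its counters and thread_cancel_model() are invented for
illustration, and the guarded block merely stands in for the lines that
end up calling interrupt_operation on the server):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced model of a per-thread signal state.  */
struct sigstate_model
{
  int cancel;                   /* set once a cancellation is pending */
  int suspend_resume_count;     /* times the thread was suspended/resumed */
  int interrupt_sends;          /* times the guarded lines ran */
};

/* Model of a cancel call with the proposed guard.  */
static void
thread_cancel_model (struct sigstate_model *ss)
{
  assert (ss != NULL);

  /* Suspend/resume still happens on every call.  */
  ss->suspend_resume_count++;

  if (!ss->cancel)
    {
      /* Only the first cancellation runs the guarded lines.  */
      ss->cancel = 1;
      ss->interrupt_sends++;
    }
}
```

Calling thread_cancel_model() three times on the same state leaves
suspend_resume_count at 3 but interrupt_sends at 1, which is the
behaviour I have in mind. It assumes something clears ss->cancel again
once the thread has noticed the cancellation, otherwise later
cancellations would be skipped as well.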

Samuel

