
Re: libports and interrupted RPCs



Hi Samuel,

On 16/09/2025 00:00, Samuel Thibault wrote:
> Michael Kelly, le ven. 12 sept. 2025 19:27:00 +0100, a ecrit:
>> On 08/09/2025 21:58, Samuel Thibault wrote:
>>> Michael Kelly, le lun. 08 sept. 2025 07:05:39 +0100, a ecrit:

>>>> I think what is needed is a feature to guarantee the current sequential
>>>> behaviour per port_info but which also permits genuine concurrency across
>>>> different port_info. That isn't really true at the moment because of the
>>>> usage of the global _ports_lock. This could possibly be a port_info specific
>>>> mutex or perhaps an extension of the 'blocking' flags. I'm investigating
>>>> those options currently but it seems difficult to ensure locking order to
>>>> prevent deadlock.
>>> I don't think we need a strict sequential guarantee.

> Why not recording the port_info* pointers in a local array? This
> requires extending their life in case of a concurrent termination of
> the RPC, but that can be done by adding a reference counter and some
> port_info_ref/_unref helpers that maintain it, and ports_end_rpc can
> wait for the counter to lower to zero.
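
Just to check that I've read that suggestion correctly, here is a minimal standalone sketch of the counter-and-wait pattern: only the port_info_ref/_unref names are yours, and the struct and everything else are stand-ins I've invented, since in libports the counter would live in the real structures and use _ports_lock/_ports_block.

#include <pthread.h>

/* Hypothetical stand-ins for the real libports structures.  */
struct pi_refs
{
  unsigned int count;           /* Interrupters still using the object.  */
  pthread_mutex_t lock;         /* Stands in for _ports_lock.  */
  pthread_cond_t drained;       /* Stands in for _ports_block.  */
};

static void
port_info_ref (struct pi_refs *r)
{
  pthread_mutex_lock (&r->lock);
  r->count++;
  pthread_mutex_unlock (&r->lock);
}

static void
port_info_unref (struct pi_refs *r)
{
  pthread_mutex_lock (&r->lock);
  if (--r->count == 0)
    pthread_cond_broadcast (&r->drained);
  pthread_mutex_unlock (&r->lock);
}

/* This is what ports_end_rpc would do: wait for the counter to lower
   to zero before letting the RPC terminate.  */
static void
wait_until_drained (struct pi_refs *r)
{
  pthread_mutex_lock (&r->lock);
  while (r->count > 0)
    pthread_cond_wait (&r->drained, &r->lock);
  pthread_mutex_unlock (&r->lock);
}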

Well, if you think this approach may be valid after all, please consider the method I've attached. There is a diff showing the minor changes to the other source files, together with a complete copy of interrupt-rpcs.c, since that file is quite small and the solution is easier to comprehend when read directly rather than via a diff.

In essence the two solutions follow the same method: capture the RPCs to cancel, then cancel each in turn as a second stage. My solution adds cl_prev and cl_next list pointers to each rpc_info so that the ones to cancel can be linked together to form a separate cancellation list; this plays the part of the array you refer to. It improves on my previous solution by bounding the maximum number of iterations at only double the initial RPC list length.

The advantages of doing this are:

a) There is no need to call malloc() or similar to allocate memory for an array of indeterminate size.

b) Any RPC that is in the cancellation list can still run to completion if it manages to do so before the call to hurd_thread_cancel() is made. The benefit is that the thread associated with that RPC can then start processing a new RPC rather than blocking until ports_interrupt_rpcs() completes.

Note that no reference to the head of the cancellation list is stored in any data structure; it is only held locally within the call to ports_interrupt_rpcs(). Note also that it is possible for ports_interrupt_rpcs() to be called concurrently from another thread, but only one of the two calls will capture the RPCs to cancel; the other will return immediately with nothing to do. This is the issue I was concerned about, but it might be acceptable to expect synchronisation to be managed by the callers instead, as sketched below.
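
For instance, if synchronisation were left to the callers, a trivial wrapper along these lines would do (a hypothetical sketch, not part of the attached patch):

#include <pthread.h>
#include "ports.h"

/* Serialise interruption requests so that two callers never race to
   build cancellation lists for the same port.  */
static pthread_mutex_t interrupt_serialise = PTHREAD_MUTEX_INITIALIZER;

static void
interrupt_rpcs_serialised (void *portstruct)
{
  pthread_mutex_lock (&interrupt_serialise);
  ports_interrupt_rpcs (portstruct);
  pthread_mutex_unlock (&interrupt_serialise);
}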

The other aspect, which you solve with the reference counting, is handled by the 'cancelling' flag associated with the rpc_info.

I'm sorry if I'm perhaps over-complicating the solution, but I was keen to permit maximum concurrency, especially as the lack of it seems to be causing significant system issues under stress. I appreciate that you are very busy and that this issue is perhaps not at the top of your job list, so considering this thread 'low priority' is fine with me.

All the best,

Mike.

diff -ur hurd.orig/libports/begin-rpc.c hurd/libports/begin-rpc.c
--- hurd.orig/libports/begin-rpc.c	2025-07-31 20:36:05.942918581 +0100
+++ hurd/libports/begin-rpc.c	2025-09-10 17:00:11.249190752 +0100
@@ -91,6 +91,9 @@
   
   /* Record that that an RPC is in progress */
   info->thread = hurd_thread_self ();
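+  /* Not yet on any cancellation list, and not being cancelled.  */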
+  info->cl_next = info->cl_prev = NULL;
+  info->cancelling = FALSE;
   info->next = pi->current_rpcs;
   info->notifies = 0;
   if (pi->current_rpcs)
diff -ur hurd.orig/libports/end-rpc.c hurd/libports/end-rpc.c
--- hurd.orig/libports/end-rpc.c	2025-07-31 20:36:05.942918581 +0100
+++ hurd/libports/end-rpc.c	2025-09-09 05:07:03.903620614 +0100
@@ -27,6 +27,21 @@
 
   pthread_mutex_lock (&_ports_lock);
 
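+  /* If an interruption of this RPC is in progress, wait for the
+     interrupting thread to be done with this rpc_info.  */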
+  while (info->cancelling)
+    {
+      pthread_cond_wait (&_ports_block, &_ports_lock);
+    }
+
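+  /* Unlink this RPC from any cancellation list it is still on, so
+     that ports_interrupt_rpcs never follows a stale pointer.  */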
+  if (info->cl_next != NULL)
+    info->cl_next->cl_prev = info->cl_prev;
+
+  if (info->cl_prev != NULL)
+    info->cl_prev->cl_next = info->cl_next;
+
   if (info->notifies)
     _ports_remove_notified_rpc (info);
 
diff -ur hurd.orig/libports/ports.h hurd/libports/ports.h
--- hurd.orig/libports/ports.h	2025-07-31 20:36:05.946918537 +0100
+++ hurd/libports/ports.h	2025-09-09 05:06:25.719897838 +0100
@@ -22,6 +22,7 @@
 #define _HURD_PORTS_
 
 #include <mach.h>
+#include <stdbool.h>
 #include <stdlib.h>
 #include <hurd.h>
 #include <hurd/ihash.h>
@@ -112,6 +113,10 @@
   struct rpc_info *next, **prevp;
   struct rpc_notify *notifies;
   struct rpc_info *interrupted_next;
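+  /* Cancellation-list links and flag, managed by ports_interrupt_rpcs
+     and cleaned up in ports_end_rpc.  */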
+  struct rpc_info *cl_next, *cl_prev;
+  bool cancelling;
 };
 
 /* An rpc has requested interruption on a port notification.  */

The complete interrupt-rpcs.c follows:

/*
   Copyright (C) 1995, 1996, 1997 Free Software Foundation, Inc.
   Written by Michael I. Bushnell.

   This file is part of the GNU Hurd.

   The GNU Hurd is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
   published by the Free Software Foundation; either version 2, or (at
   your option) any later version.

   The GNU Hurd is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */

#include "ports.h"
#include <hurd.h>

#include <stdbool.h>

void
ports_interrupt_rpcs (void *portstruct)
{
  struct port_info *pi = portstruct;
  struct rpc_info *rpc, *prev, *cancel;
  thread_t self = hurd_thread_self ();

  pthread_mutex_lock (&_ports_lock);

  cancel = NULL;
  
  /* Build a cancellation list of all RPCs that are:
     1) not ourselves, AND
     2) not already on a cancellation list.
     'cancel' becomes the head of the cancellation list. */
  for (rpc = pi->current_rpcs, prev = NULL; rpc; rpc = rpc->next)
    {
      if (rpc->thread != self
	  && (rpc->cl_next == NULL && rpc->cl_prev == NULL))
	{
	  if (cancel == NULL)
	    cancel = rpc;
	  rpc->cl_prev = prev;
	  if (prev != NULL)
	    prev->cl_next = rpc;
	  prev = rpc;
	}
    }

  while (cancel)
    {
      thread_t thread = cancel->thread;

      /* Setting 'cancelling' guarantees that the 'cancel' data
         structure remains accessible after the _ports_lock is
         released and reacquired.  */
      cancel->cancelling = TRUE;

      pthread_mutex_unlock (&_ports_lock);
      hurd_thread_cancel (thread);
      pthread_mutex_lock (&_ports_lock);

      _ports_record_interruption (cancel);

      /* Clearing 'cancelling' releases any thread waiting in
         ports_end_rpc to complete this RPC.  The RPC is deliberately
         left on the cancellation list: removing it would let a future
         call to this function add it onto a new cancellation list,
         resulting in a pointless additional cancellation.  The list
         pointers are safely unlinked when the RPC completes.  */
      cancel->cancelling = FALSE;
      pthread_cond_broadcast (&_ports_block);

      cancel = cancel->cl_next;
    }

  pthread_mutex_unlock (&_ports_lock);
}
