
open-isns porting question: sudden SIGLOST



Dear Hurd porters,
(please cc me in replies, I'm not subscribed)

I've recently uploaded open-isns to the archive and noticed that it
fails to build on non-Linux ports, even though it's just a normal
server application (no special kernel features required), so I thought
I'd fix that. I've locally ported the package to kFreeBSD, and then
proceeded with Hurd. In the latter case, it now builds in a VM on my
local machine, but the basic functionality does not work at all.

Specifically, the server daemon doesn't seem to process requests;
instead it receives SIGLOST and exits, as does the client. According
to what I've read, that indicates that an RPC server died unexpectedly,
and I don't think that should happen in this case.

I've run an rpctrace on the server process, and got the following
immediately before accepting the client connection (poll() is called):

  80<--82(pid2269)->io_select_timeout ({1469369509 261950000} 1) ...83
task49(pid2269)->mach_port_allocate (3) = 0 pn{ 28}
task49(pid2269)->mach_port_move_member (pn{ 26} pn{ 28}) = 0 
  86<--89(pid2269)->io_select_timeout ({1469369509 261950000} 1) ...87
task49(pid2269)->mach_port_move_member (pn{ 29} pn{ 28}) = 0 

And the following immediately after the client tries to send a message
over the UNIX socket:

task49(pid2269)->mach_port_destroy (pn{ 26}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 29}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 28}) = 0 
  80<--82(pid2269)->socket_accept () = 0    87<--85(pid2269)    83<--92(pid2269)
task49(pid2269)->mach_port_deallocate (pn{ 29}) = 0 
  80<--82(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...88
task49(pid2269)->mach_port_allocate (3) = 0 pn{ 26}
task49(pid2269)->mach_port_move_member (pn{ 29} pn{ 26}) = 0 
  86<--89(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...93
task49(pid2269)->mach_port_move_member (pn{ 30} pn{ 26}) = 0 
  87<--85(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...83
task49(pid2269)->mach_port_move_member (pn{ 31} pn{ 26}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 29}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 30}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 31}) = 0 
task49(pid2269)->mach_port_destroy (pn{ 26}) = 0 
  87<--85(pid2269)->socket_recv (128 8192) = 0  (null) ""    83<--94(pid2269) "`" 0
task49(pid2269)->vm_allocate (0 4 1) = 0 22237184
task49(pid2269)->vm_allocate (0 96 1) = 0 22241280
task49(pid2269)->mach_port_mod_refs (pn{  4} 1 -1) = 0 
task49(pid2269)->mach_port_deallocate (pn{  0}) = 0xf ((os/kern) invalid name) 
task49(pid2269)->mach_port_deallocate (pn{  1}) = 0 
  58<--64(pid2269)->proc_dostop_request ( thread68(pid2269)) = 0 
  58<--64(pid2269)->proc_mark_exit_request (32 0) = 0 
task49(pid2269)->task_terminate () = 0 
Child 2269 Resource lost

The client has the following last lines of the trace (starting from the
socket connect):

task49(pid2272)->vm_allocate (0 4096 1) = 0 22265856
  26<--62(pid2272)->dir_lookup ("servers/socket/1" 0 0) = 0 1 ""    54<--82(pid2272)
  54<--82(pid2272)->socket_create (1 0) = 0    81<--77(pid2272)
  81<--77(pid2272)->io_set_all_openmodes (8) = 0 
  26<--62(pid2272)->dir_lookup ("run/isnsctl" 0 0) = 0 1 "isnsctl"    84<--83(pid2272)
  84<--83(pid2272)->dir_lookup ("isnsctl" 0 0) = 0 1 ""    86<--85(pid2272)
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0 
  86<--85(pid2272)->ifsock_getsockaddr () = 0    84<--87(pid2272)
task49(pid2272)->mach_port_deallocate (pn{ 25}) = 0 
  81<--77(pid2272)->socket_connect (   84<--87(pid2272)) = 0 
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0 
  81<--77(pid2272)->io_select_timeout ({1469365974 511950000} 3) ...80
task49(pid2272)->mach_port_destroy (pn{ 24}) = 0 
task49(pid2272)->mach_port_insert_right (pn{ 24}   80) = 0 
  6<--63(pid2272)->auth_user_authenticate (   80<--87(pid-1)) ...69
task49(pid2272)->mach_port_deallocate (pn{ 25}) = 0x11 ((os/kern) invalid right) 
  81<--77(pid2272)->socket_send ( (null) 128 ""    80<--87(pid-1) "`") = 0 52
task49(pid2272)->mach_port_deallocate (pn{  0}) = 0xf ((os/kern) invalid name) 
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0 
  81<--77(pid2272)->io_select_timeout ({1469365974 511950000} 1) ...86
task49(pid2272)->mach_port_destroy (pn{ 26}) = 0 
  81<--77(pid2272)->socket_recv (128 8192) = 0  (null) ""  "" 0
task49(pid2272)->mach_port_mod_refs (pn{  4} 1 -1) = 0 
task49(pid2272)->mach_port_deallocate (pn{  0}) = 0xf ((os/kern) invalid name) 
task49(pid2272)->mach_port_deallocate (pn{  1}) = 0 
  58<--64(pid2272)->proc_dostop_request ( thread66(pid2272)) = 0 
  58<--64(pid2272)->proc_mark_exit_request (32 0) = 0 
task49(pid2272)->task_terminate () = 0 
69... = 0x40000016 (Invalid argument) 
Child 2272 Resource lost

Peculiarities: the code uses SCM_CREDS and sendmsg/recvmsg over the
local UNIX socket to check authorization. (I couldn't find a man page
for it, so judging by the system header files I'm assuming that
SCM_CREDS works the same way on Hurd as it does on kFreeBSD.)

Any idea on how to debug this? I could of course just ignore SIGLOST in
the code, but that isn't the default on Hurd, probably for good reason.
(Also, I haven't tried ignoring it yet, so the underlying issue may
well keep things from working anyway.)
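
For reference, the "ignore SIGLOST" experiment mentioned above could look
roughly like the sketch below. It is only an illustration: ignore_siglost
is a hypothetical helper name, not anything from open-isns, and the
#ifdef is there because not every platform's <signal.h> defines SIGLOST.
It would only hide the symptom, not explain why the signal is sent.

```c
#include <signal.h>

/*
 * Hypothetical helper: ask the system to ignore SIGLOST.
 * Returns  1 if SIGLOST exists and is now ignored,
 *          0 if this platform does not define SIGLOST,
 *         -1 if signal() reported an error.
 */
static int ignore_siglost(void)
{
#ifdef SIGLOST
    /* signal() returns SIG_ERR on failure. */
    if (signal(SIGLOST, SIG_IGN) == SIG_ERR)
        return -1;
    return 1;
#else
    /* No SIGLOST here, so there is nothing to ignore. */
    return 0;
#endif
}
```

Calling ignore_siglost() once, early in main() of both the daemon and the
client, would be enough for the experiment.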

If you want, I can of course upload a package to the archive that
will build (but not work) on Hurd, along with instructions on how to
reproduce the issue, but I thought I'd ask here first; maybe the
solution is really simple.

Thank you!

Regards,
Christian

