open-isns porting question: sudden SIGLOST
Dear Hurd porters,
(please cc me in replies, I'm not subscribed)
I've recently uploaded open-isns to the archive and noticed that it
fails to build on non-Linux ports, even though it's just a normal
server application (no special kernel features required), so I thought
I'd fix that. I've ported the package to kFreeBSD locally and then moved
on to the Hurd. There it now builds in a VM on my local machine, but
basic functionality does not work at all.
Specifically, the server daemon doesn't seem to process requests;
instead it receives SIGLOST and terminates, and so does the client. From
what I've read, SIGLOST indicates that an RPC server died unexpectedly,
and I don't think that should happen in this case.
I ran rpctrace on the server process and got the following output
immediately before it accepts the client connection (this is where
poll() is called):
80<--82(pid2269)->io_select_timeout ({1469369509 261950000} 1) ...83
task49(pid2269)->mach_port_allocate (3) = 0 pn{ 28}
task49(pid2269)->mach_port_move_member (pn{ 26} pn{ 28}) = 0
86<--89(pid2269)->io_select_timeout ({1469369509 261950000} 1) ...87
task49(pid2269)->mach_port_move_member (pn{ 29} pn{ 28}) = 0
And the following immediately after the client tries to send a message
over the UNIX socket:
task49(pid2269)->mach_port_destroy (pn{ 26}) = 0
task49(pid2269)->mach_port_destroy (pn{ 29}) = 0
task49(pid2269)->mach_port_destroy (pn{ 28}) = 0
80<--82(pid2269)->socket_accept () = 0 87<--85(pid2269) 83<--92(pid2269)
task49(pid2269)->mach_port_deallocate (pn{ 29}) = 0
80<--82(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...88
task49(pid2269)->mach_port_allocate (3) = 0 pn{ 26}
task49(pid2269)->mach_port_move_member (pn{ 29} pn{ 26}) = 0
86<--89(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...93
task49(pid2269)->mach_port_move_member (pn{ 30} pn{ 26}) = 0
87<--85(pid2269)->io_select_timeout ({1469366214 511950000} 1) ...83
task49(pid2269)->mach_port_move_member (pn{ 31} pn{ 26}) = 0
task49(pid2269)->mach_port_destroy (pn{ 29}) = 0
task49(pid2269)->mach_port_destroy (pn{ 30}) = 0
task49(pid2269)->mach_port_destroy (pn{ 31}) = 0
task49(pid2269)->mach_port_destroy (pn{ 26}) = 0
87<--85(pid2269)->socket_recv (128 8192) = 0 (null) "" 83<--94(pid2269) "`" 0
task49(pid2269)->vm_allocate (0 4 1) = 0 22237184
task49(pid2269)->vm_allocate (0 96 1) = 0 22241280
task49(pid2269)->mach_port_mod_refs (pn{ 4} 1 -1) = 0
task49(pid2269)->mach_port_deallocate (pn{ 0}) = 0xf ((os/kern) invalid name)
task49(pid2269)->mach_port_deallocate (pn{ 1}) = 0
58<--64(pid2269)->proc_dostop_request ( thread68(pid2269)) = 0
58<--64(pid2269)->proc_mark_exit_request (32 0) = 0
task49(pid2269)->task_terminate () = 0
Child 2269 Resource lost
These are the last lines of the client's trace, starting from the
socket connect:
task49(pid2272)->vm_allocate (0 4096 1) = 0 22265856
26<--62(pid2272)->dir_lookup ("servers/socket/1" 0 0) = 0 1 "" 54<--82(pid2272)
54<--82(pid2272)->socket_create (1 0) = 0 81<--77(pid2272)
81<--77(pid2272)->io_set_all_openmodes (8) = 0
26<--62(pid2272)->dir_lookup ("run/isnsctl" 0 0) = 0 1 "isnsctl" 84<--83(pid2272)
84<--83(pid2272)->dir_lookup ("isnsctl" 0 0) = 0 1 "" 86<--85(pid2272)
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0
86<--85(pid2272)->ifsock_getsockaddr () = 0 84<--87(pid2272)
task49(pid2272)->mach_port_deallocate (pn{ 25}) = 0
81<--77(pid2272)->socket_connect ( 84<--87(pid2272)) = 0
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0
81<--77(pid2272)->io_select_timeout ({1469365974 511950000} 3) ...80
task49(pid2272)->mach_port_destroy (pn{ 24}) = 0
task49(pid2272)->mach_port_insert_right (pn{ 24} 80) = 0
6<--63(pid2272)->auth_user_authenticate ( 80<--87(pid-1)) ...69
task49(pid2272)->mach_port_deallocate (pn{ 25}) = 0x11 ((os/kern) invalid right)
81<--77(pid2272)->socket_send ( (null) 128 "" 80<--87(pid-1) "`") = 0 52
task49(pid2272)->mach_port_deallocate (pn{ 0}) = 0xf ((os/kern) invalid name)
task49(pid2272)->mach_port_deallocate (pn{ 24}) = 0
81<--77(pid2272)->io_select_timeout ({1469365974 511950000} 1) ...86
task49(pid2272)->mach_port_destroy (pn{ 26}) = 0
81<--77(pid2272)->socket_recv (128 8192) = 0 (null) "" "" 0
task49(pid2272)->mach_port_mod_refs (pn{ 4} 1 -1) = 0
task49(pid2272)->mach_port_deallocate (pn{ 0}) = 0xf ((os/kern) invalid name)
task49(pid2272)->mach_port_deallocate (pn{ 1}) = 0
58<--64(pid2272)->proc_dostop_request ( thread66(pid2272)) = 0
58<--64(pid2272)->proc_mark_exit_request (32 0) = 0
task49(pid2272)->task_terminate () = 0
69... = 0x40000016 (Invalid argument)
Child 2272 Resource lost
Peculiarities: the code uses SCM_CREDS with sendmsg/recvmsg over the
local UNIX socket to check authorization. (Since I couldn't find a man
page for SCM_CREDS, I assume it is implemented the same way as on
kFreeBSD, at least judging by the system header files.)
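To test the credential exchange in isolation on the Hurd VM, independent of open-isns, something like the following minimal sketch could be used. It assumes the BSD-style `struct cmsgcred` layout (field `cmcred_uid`) that the system headers suggest, and that the sender only reserves space for it while the system fills in the identity, as on kFreeBSD. The Linux `SCM_CREDENTIALS`/`SO_PASSCRED` branch exists only so the sketch also compiles elsewhere. The helpers `enable_passcred`, `send_creds` and `recv_creds` are hypothetical names, not open-isns functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes struct ucred / SCM_CREDENTIALS on Linux */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifdef SCM_CREDS
typedef struct cmsgcred cred_t; /* BSD/Hurd layout (assumption) */
#else
typedef struct ucred cred_t;    /* Linux fallback, for portability only */
#endif

/* Receiver setup: Linux only attaches credentials when SO_PASSCRED is
   set; the SCM_CREDS path is assumed to need no socket option. */
static int enable_passcred(int fd)
{
#ifdef SCM_CREDS
    (void)fd;
    return 0;
#else
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &one, sizeof one);
#endif
}

/* Sender: one data byte plus a credentials control message. */
static int send_creds(int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(cred_t))]; } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    assert(cm != NULL); /* buffer is large enough for one header */
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_len = CMSG_LEN(sizeof(cred_t));
#ifdef SCM_CREDS
    cm->cmsg_type = SCM_CREDS; /* assumed: system fills in the identity */
#else
    cm->cmsg_type = SCM_CREDENTIALS;
    cred_t self = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    memcpy(CMSG_DATA(cm), &self, sizeof self);
#endif
    return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/* Receiver: read the byte and extract the peer's real uid. */
static int recv_creds(int fd, uid_t *uid)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(cred_t))]; } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(fd, &msg, 0) != 1)
        return -1;

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm != NULL;
         cm = CMSG_NXTHDR(&msg, cm)) {
        cred_t cred;
        if (cm->cmsg_level != SOL_SOCKET)
            continue;
#ifdef SCM_CREDS
        if (cm->cmsg_type != SCM_CREDS)
            continue;
        memcpy(&cred, CMSG_DATA(cm), sizeof cred);
        *uid = cred.cmcred_uid;
#else
        if (cm->cmsg_type != SCM_CREDENTIALS)
            continue;
        memcpy(&cred, CMSG_DATA(cm), sizeof cred);
        *uid = cred.uid;
#endif
        return 0;
    }
    return -1; /* no credentials attached */
}
```

For example, calling `enable_passcred(sv[1])` on a `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, then `send_creds(sv[0])` and `recv_creds(sv[1], &uid)`, should yield the caller's own uid. If this already fails or raises SIGLOST on the Hurd, that would suggest the problem lies below open-isns.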
Any ideas on how to debug this? I could of course just ignore SIGLOST in
the code, but that isn't the default on the Hurd, presumably for good
reason. (Also, I haven't tried ignoring it, so the underlying issue
might keep things from working anyway.)
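A possible intermediate step between the default action and ignoring SIGLOST outright is a minimal sketch like the one below, assuming the goal is diagnosis rather than a fix: a handler that only records and logs the signal, so the process survives and one can see what the interrupted call does afterwards. `trap_siglost`, `on_siglost` and `siglost_seen` are hypothetical names; the `#ifdef SIGLOST` guard is there because not every system defines that signal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* make sure SIGLOST and struct sigaction are visible */
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; the main loop can check it and log or exit. */
static volatile sig_atomic_t siglost_seen = 0;

static void on_siglost(int sig)
{
    static const char msg[] = "caught SIGLOST\n";
    ssize_t rc;

    (void)sig;
    siglost_seen = 1;
    rc = write(STDERR_FILENO, msg, sizeof msg - 1); /* async-signal-safe */
    (void)rc;
}

/* Returns 0 if the handler was installed, 1 if this platform has no
   SIGLOST, and -1 if sigaction() failed. */
static int trap_siglost(void)
{
#ifdef SIGLOST
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_siglost;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGLOST, &sa, NULL) == 0 ? 0 : -1;
#else
    return 1;
#endif
}
```

Calling `trap_siglost()` early in `main()` of both server and client, and checking `siglost_seen` after the failing `sendmsg`/`recvmsg`, should show whether that call then returns an error (and with which `errno`) once the signal no longer kills the process.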
If you want, I can of course upload a package to the archive that
builds (but doesn't work) on the Hurd, along with instructions on how to
reproduce the issue. But I thought I'd ask here first; maybe the
solution is really simple.
Thank you!
Regards,
Christian