You could try changing the value of /proc/sys/net/ipv4/tcp_retries2.
The problem is that nbd-client hands control of the device over to the
kernel through an ioctl() call (ioctl(..., NBD_DO_IT)); if the
connection dies after that, it is that kernel code's job to notice and
return an error once it times out.
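Roughly, the handoff looks like this (a simplified sketch, not the actual
nbd-client source; hand_over() is a hypothetical helper, nbd_fd is the open
/dev/nbdX file descriptor and sock the connected TCP socket):

  #include <sys/ioctl.h>
  #include <linux/nbd.h>

  int hand_over(int nbd_fd, int sock)
  {
          /* tell the kernel which socket carries the NBD traffic */
          if (ioctl(nbd_fd, NBD_SET_SOCK, sock) < 0)
                  return -1;
          /* blocks here until the device is disconnected or the kernel
           * decides the connection is dead -- userspace never sees the
           * individual requests, so it can't time them out itself */
          return ioctl(nbd_fd, NBD_DO_IT);
  }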
Because that happens in kernel code rather than in nbd-client, we can't
simply set an alarm in userspace and cancel it whenever a keep-alive
arrives; nbd-client isn't handling any of that traffic once the handoff
has happened.
So the only knobs we can easily tune are the TCP retransmit timeout values.
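Something along these lines should do it (only a sketch: it simply writes a
smaller value into that proc file and needs root; 5 is an arbitrary example,
the stock default is 15, and fewer retries means TCP gives up on a dead peer
sooner):

  #include <stdio.h>

  int main(void)
  {
          /* same effect as writing the value by hand into
           * /proc/sys/net/ipv4/tcp_retries2 */
          FILE *f = fopen("/proc/sys/net/ipv4/tcp_retries2", "w");
          if (!f) {
                  perror("tcp_retries2");
                  return 1;
          }
          fprintf(f, "5\n");   /* example value only */
          return fclose(f) ? 1 : 0;
  }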
Failing that, we can look at patching the kernel NBD code with a tuneable
timeout parameter.
On Thu, 19 Jan 2006, Steven Yelton wrote:
I have a problem with the nbd-client hanging when the network cable is
removed from the server. Here is my setup:
storage1:
exporting raid1a
exporting raid1c
storage2:
exporting raid1b
client machine:
md0 is raid5 with nbd{0,1,2}
The raid builds and runs fine. If I kill the nbd-server on 'storage2' the
raid immediately goes into a 'degraded' state (exactly as I would expect).
However, if I just pull the network connection from 'storage2', md0 just
hangs (even `cat /proc/mdstat` hangs). After several minutes (10, maybe)
the client seems to notice the server is dead (Error: Connect: No route to
host) and the raid is degraded.
What can I do to decrease the time it takes for nbd-client to realize it
can't get to the storage machine anymore?
Thanks in advance,
Steven