Re: [Nbd] nbd hangs when disconnection the network

To: nbd-general@lists.sourceforge.net
Cc: Roy Keene <nbd@...51...>
Subject: Re: [Nbd] nbd hangs when disconnection the network
From: Steven Yelton <steveny@...78...>
Date: Thu, 19 Jan 2006 12:05:46 -0500
Message-id: <43CFC6EA.80003@...78...>
In-reply-to: <Pine.LNX.4.64.0601190937070.25764@...53...>
References: <43CFB036.2000806@...78...> <Pine.LNX.4.64.0601190937070.25764@...53...>

Roy,

Thanks for the quick and insightful reply.

echo "5" > /proc/sys/net/ipv4/tcp_retries2

brought the wait down to just a few seconds.

My only worry is what happens if the network gets busy: will it start dropping connections? I suppose this is extremely unlikely with the two machines connected to the same Gb switch. Are there any other issues with setting the retry value so low?
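To put rough numbers on that trade-off, here is a small Python sketch of the "hypothetical timeout" model that the Linux `ip-sysctl` documentation uses to describe `tcp_retries2`: N retransmissions with exponential backoff starting at TCP_RTO_MIN and capped at TCP_RTO_MAX, with the connection dropped at the (N+1)th RTO. The 200 ms and 120 s constants are assumed Linux defaults, and the model comes from current kernel documentation rather than from the kernels discussed in this thread, so treat the results as approximations, not measurements.

```python
# Sketch of the "hypothetical timeout" model for tcp_retries2.
# Assumptions: TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s (Linux defaults).
# A real connection starts from its measured RTO, which may be larger.

RTO_MIN_S = 0.2    # assumed initial retransmission timeout, in seconds
RTO_MAX_S = 120.0  # assumed ceiling for the backed-off timeout, in seconds


def hypothetical_timeout(retries2: int,
                         rto_min: float = RTO_MIN_S,
                         rto_max: float = RTO_MAX_S) -> float:
    """Seconds until the connection is dropped: the sum of retries2 + 1
    RTO intervals, each double the previous one, capped at rto_max."""
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff with a ceiling
    return total


if __name__ == "__main__":
    for n in (5, 8, 15):
        print(f"tcp_retries2={n}: about {hypothetical_timeout(n):.1f} s")
```

Under these assumptions the default of 15 works out to about 924.6 s, 8 to about 102.2 s, and 5 to about 12.6 s. The same documentation notes that RFC 1122 recommends a timeout of at least 100 s, which corresponds to a value of at least 8. Because `tcp_retries2` is a host-wide setting, a low value also shortens how long every other TCP connection on the client survives a transient outage, not just the nbd sockets.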

I don't know whether it would be generally applicable, but it seems like any time nbd is part of a raid device, a tunable timeout parameter would be valuable. I would be happy to test any patch that might be written!


Steven

Roy Keene wrote:

You could try changing the value of /proc/sys/net/ipv4/tcp_retries2.
The problem is that nbd-client hands over control of the device to the kernel through an ioctl() call (ioctl(..., NBD_DO_IT)), and if the connection dies after that, it's the kernel code's job to notice this and return an error once it times out.
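The hand-off can be sketched in Python with `fcntl.ioctl`. This is a simplified illustration, not nbd-client's actual code: it assumes an already negotiated, connected socket, omits the size and block-size setup a real client performs, and builds the request codes with a hypothetical `_io` helper whose bit layout matches Linux's `_IO()` macro only on architectures such as x86. What it shows is that once `NBD_DO_IT` is issued, the calling process is blocked inside the kernel until the driver itself gives up on the connection.

```python
import fcntl
import os
import socket


def _io(ioc_type: int, nr: int) -> int:
    # Hypothetical stand-in for Linux's _IO() macro. This layout
    # (no direction bits, no size, type in bits 8-15) holds on x86
    # but not on every architecture.
    return (ioc_type << 8) | nr


# Request codes as <linux/nbd.h> defines them via _IO(0xab, n).
NBD_SET_SOCK = _io(0xAB, 0)    # give the connected socket to the driver
NBD_DO_IT = _io(0xAB, 3)       # service block requests inside the kernel
NBD_CLEAR_SOCK = _io(0xAB, 4)  # detach the socket afterwards


def hand_off(device_path: str, sock: socket.socket) -> None:
    """Simplified sketch: no negotiation, no size/block-size setup."""
    fd = os.open(device_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, NBD_SET_SOCK, sock.fileno())
        # Blocks until the kernel decides the connection is finished or
        # dead; no user-space code in this process runs meanwhile.
        fcntl.ioctl(fd, NBD_DO_IT)
    finally:
        try:
            fcntl.ioctl(fd, NBD_CLEAR_SOCK)
        finally:
            os.close(fd)
```

Because `hand_off` sits inside `NBD_DO_IT`, it never gets a chance to run timers of its own; any timeout has to come from the kernel, which in practice means TCP's retransmission limits.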
Because this happens in kernel code rather than in nbd-client, we can't just set an alarm and cancel it whenever a keep-alive arrives; nbd-client isn't handling any of that traffic.
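For contrast, the kind of timer described above looks roughly like this hypothetical Python `KeepaliveWatchdog`: each keep-alive re-arms a deadline, and the peer is declared dead once the deadline passes. It is only a sketch of the general technique, with an injectable clock (an assumption added so it can be exercised without real waiting), and it only works in code that sees the traffic itself, which nbd-client no longer does after calling `NBD_DO_IT`.

```python
import time
from typing import Callable


class KeepaliveWatchdog:
    """Hypothetical user-space watchdog: the peer counts as dead if no
    keep-alive has been seen for `timeout` seconds."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock          # injectable for testing
        self._last_seen = clock()    # start the deadline now

    def keepalive(self) -> None:
        """Re-arm the deadline, like cancelling and restarting an alarm."""
        self._last_seen = self._clock()

    def expired(self) -> bool:
        """True once `timeout` seconds pass without a keep-alive."""
        return self._clock() - self._last_seen >= self.timeout
```

A caller would invoke `keepalive()` from its receive loop and check `expired()` periodically, tearing the connection down as soon as it returns True.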
So the only knob we can easily tune is the TCP retransmit timeout values.
Failing that, we can look at patching the kernel NBD code with a tunable timeout parameter.
On Thu, 19 Jan 2006, Steven Yelton wrote:
I have a problem with the nbd-client hanging when the network cable is removed from the server. Here is my setup:

storage1:
exporting raid1a
exporting raid1c

storage2:
exporting raid1b

client machine:
md0 is raid5 with nbd{0,1,2}
The raid builds and runs fine. If I kill the nbd-server on 'storage2', the raid immediately goes into a 'degraded' state (exactly as I would expect). However, if I just pull the network connection from 'storage2', md0 just hangs (even `cat /proc/mdstat` hangs). After several minutes (10, maybe) the client seems to notice the server is dead (Error: Connect: No route to host) and the raid is degraded.
What can I do to decrease the time it takes for nbd-client to realize it can't get to the storage machine anymore?
Thanks in advance,
Steven




_______________________________________________
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general
