You could try changing the value of /proc/sys/net/ipv4/tcp_retries2.
The problem is that nbd-client hands control of the device over to the
kernel through an ioctl() call (ioctl(..., NBD_DO_IT)); if the
connection dies after that, it is that kernel code's job to notice and
return an error once it times out.
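Roughly, the handoff looks like this (a simplified sketch, not the actual
nbd-client source; hand_over() is a hypothetical helper, nbd_fd is the open
/dev/nbdX file descriptor and sock the connected TCP socket):

  #include <sys/ioctl.h>
  #include <linux/nbd.h>

  int hand_over(int nbd_fd, int sock)
  {
          /* tell the kernel which socket carries the NBD traffic */
          if (ioctl(nbd_fd, NBD_SET_SOCK, sock) < 0)
                  return -1;
          /* blocks here until the device is disconnected or the kernel
           * decides the connection is dead -- userspace never sees the
           * individual requests, so it can't time them out itself */
          return ioctl(nbd_fd, NBD_DO_IT);
  }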
Because that happens in kernel code rather than in nbd-client, we can't
simply set an alarm in userspace and cancel it whenever a keep-alive
arrives; nbd-client isn't handling any of that traffic once the handoff
has happened.
So the only knobs we can easily tune are the TCP retransmit timeout values.
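Something along these lines should do it (only a sketch: it simply writes a
smaller value into that proc file and needs root; 5 is an arbitrary example,
the stock default is 15, and fewer retries means TCP gives up on a dead peer
sooner):

  #include <stdio.h>

  int main(void)
  {
          /* same effect as writing the value by hand into
           * /proc/sys/net/ipv4/tcp_retries2 */
          FILE *f = fopen("/proc/sys/net/ipv4/tcp_retries2", "w");
          if (!f) {
                  perror("tcp_retries2");
                  return 1;
          }
          fprintf(f, "5\n");   /* example value only */
          return fclose(f) ? 1 : 0;
  }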
Failing that, we can look at patching the kernel NBD code with a tuneable
timeout parameter.
On Thu, 19 Jan 2006, Steven Yelton wrote:
I have a problem with the nbd-client hanging when the network cable is
removed from the server. Here is my setup:
storage1:
exporting raid1a
exporting raid1c
storage2:
exporting raid1b
client machine:
md0 is raid5 with nbd{0,1,2}
The raid builds and runs fine. If I kill the nbd-server on 'storage2' the
raid immediately goes into a 'degraded' state (exactly as I would expect).
However, if I just pull the network connection from 'storage2', md0 just
hangs (even `cat /proc/mdstat` hangs). After several minutes (10, maybe)
the client seems to notice the server is dead (Error: Connect: No route to
host) and the raid is degraded.
What can I do to decrease the time it takes for nbd-client to realize it
can't get to the storage machine anymore?
Thanks in advance,
Steven