Re: [Nbd] [PATCH] nbd: improve request timeouts handling
On Mon, Jul 29, 2013 at 12:45:33PM -0400, Paul Clements wrote:
> Michal,
>
> thanks for this...a couple of preliminary questions...
>
> On Mon, Jun 17, 2013 at 4:10 PM, Michal Belczyk <belczyk@...1274...>wrote:
>
> > The main idea behind it is to be able to quickly detect broken replica
> > and switch over to another when used with any sort of mirror type device
> > built on top of any number of nbd devices.
> >
>
> I'm wondering if a simpler change would have the same result. Namely, could
> you just apply the xmit_timeout, as currently implemented, to the recvmsg
> path (ignoring the timeout, of course, when no requests are queued). Is
> there a case that this would not handle that the proposed patch handles?
> They seem to be equivalent.
Oh... no, they would not be equivalent.
It is all my fault -- the commit message lacks very important information:
Timeouts implemented via timers directly on top of kernel_sendmsg() or
kernel_recvmsg() routines apply to a single bio_vec being sent/received at
a time while the proposed patch makes a block queue request the subject to
time out -- timeouts handling was moved from the sock_xmit() way up to
nbd_send_req() and nbd_do_it()...
Currently, assuming for this example max_sectors_kb=512, a single
direct read request from an nbd device:
# dd if=/dev/nbd0 of=/dev/null bs=512K iflag=direct count=1
generates:
[1301353.424080] nbd0: request ffff8802253d22f0: got reply
[1301353.424089] nbd0: request ffff8802253d22f0: got 4096 bytes data
[1301353.424092] nbd0: request ffff8802253d22f0: got 4096 bytes data
[ ... 128 4KB bio_vec-s / 128 sock_xmit() calls? ]
So, assuming a timeout set to 30s, a single 512KB read request could
take up to 128 times 30s which makes... 64 minutes, if the timeouts were
implemented the way you ask...
Well, they actually _are_ implemented this way in the upstream code, so
a single direct write request like this will generate 128 timer
arms/disarms AND a 30s timeout actually means that such request can take
up to 64 minutes...
I think it is wrong and I believe that if the maximum sized (subject to
now configurable max_sectors_kb tunable) request cannot be fulfilled
within a specified timeout then this NBD device queue should die,
hopefully being handled by a DM layer on top... and such request could
be happily requeued to a well-behaving replica, possibly another NBD
device...
> Also move nbd->pid modifications behind nbd->tx_lock wherever possible
> > to avoid races between the concurrent nbd-client invocations.
> >
>
> Is there a bug related to this that you've seen, or is this just a
> precaution?
Precaution but 1) I don't like that the current code checks for nbd->pid
with nbd->tx_lock held and modifies it later with the same lock released
in nbd_do_it() and 2) the proposed patch uses nbd->pid to determine if
the device is in use (nbd-client alive) -- should it do it some other
way?
Thank you for the feedback!
--
Michal Belczyk Sr.
Reply to: