[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH 2/2] nbd: requeue command if the soecket is changed



On Wed, Mar 4, 2020 at 6:21 AM Mike Christie <mchristi@redhat.com> wrote:
>
> On 03/03/2020 04:06 PM, Mike Christie wrote:
> > On 03/03/2020 03:13 PM, Josef Bacik wrote:
> >> On 2/28/20 1:40 AM, Hou Pu wrote:
> >>> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
> >>> it is allowed to reset timer when it fires if tag_set.timeout
> >>> is set to zero. If the server is shutdown and a new socket
> >>> is reconfigured, the request should be requeued to be processed by
> >>> new server instead of waiting for response from the old one.
> >>>
> >>> Signed-off-by: Hou Pu <houpu@bytedance.com>
> >>
> >> I'm confused by this, if we get here we've already timed out and
> >> requeued once right?  Why do we need to requeue again?  Thanks,
> >>

Please see Mike's comments below. Thanks.

> >
> > We may not have timed out already. If the tag_set.timeout=0, then the
> > block timer will fire every 30 seconds. This could be the first time the
> > timer has fired. If it has fired multiple times already then it still
> > would not have been requeued because the num_connections=1 code just
> > does a BLK_EH_RESET_TIMER when timeout=0 and does not have support for
> > detecting reconnects.
> >
> > In this second patch if timeout=0 and num_connections=1 we restart the
> > command when the command timer fires and we detect a new connection
> > (nsock->cookie has incremented).
> >
> > I was saying in the last patch, maybe waiting for reconnect is wrong.
> > Does a cmd timeout=0 mean to wait for a reconnect or in this patch
> > should we do:
> >
> > 1. if timeout=0, num_connections=1, and the cmd timer fires and the
> > conneciton is marked dead then requeue the command.
> > 2. we then rely on the dead_conn_timeout code to decide how long to wait
> > for a reconnect.
> >

With the above 2 steps, we have same timeout action in following three case:
A. connections > 1
B. connections ==1 && tag_set.timeout > 0
C. connections ==1 && tag_set.timeout = 0
In all above case, socket is marked dead if needed. Request is requeued.
This also means that if timeout is set to 0 by user space, it will not
have any effect.

This looks good only except that the behavior of case C is not same as before.
(Before we wait until the request is completed. Now wait at most
dead_conn_timeout)
I am not sure if any user space tools relay on it.

I'd like to say that I prefer this way than reseting timer until the
request is completed.
But for now it might be better to keep the behavior same with before.
So I'd like to suggest that we fix reconnection in case B and C with
current patches if you agree.

Thanks.

>
> Oh yeah, I had thought Hou implemented timeout=0 to wait for a reconnect
> to handle existing apps. However, I am not sure if they exist. When we
> had timeout=0 support the first time then we did not have multi conn and
> reconnect support yet.

I was trying to keep the behavior of timeout=0 same as before, with
the difference
of only requeue the cmd if needed. I currently do not configure timeout to 0
in our environment.

>
> The current timeout=0 and reconnect support does not work since that is
> what Hou is implementing, so we can decide the behavior now.
>

Please see above.

Thanks and Regards
Hou.


Reply to: