
Re: [Nbd] [PATCH 01/10] nbd: Fix timeout detection

On Mon, 2015-08-17 at 08:20 +0200, Markus Pargmann wrote:
> At the moment the nbd timeout just detects hanging tcp operations. This
> is not enough to detect a hanging or bad connection as expected of a
> timeout.
> This patch redesigns the timeout detection to include some more cases.
> The timeout is now in relation to replies from the server. If the server
> does not send replies within the timeout the connection will be shut
> down.
> The patch adds a continuous timer 'timeout_timer' that is set up in one of
> two cases:
>  - The request list is empty and we are sending the first request out to
>    the server. We want to have a reply within the given timeout,
>    otherwise we consider the connection to be dead.
>  - A server response was received. This means the server is still
>    communicating with us. The timer is reset to the timeout value.
> The timer is not stopped if the list becomes empty. It will just trigger
> a timeout which will directly leave the handling routine again as the
> request list is empty.
> The whole patch does not use any additional explicit locking. The
> list_empty() calls are safe to be used concurrently. The timer is locked
> internally as we just use mod_timer and del_timer_sync().

This is crazy.  The timer is locked internally but the tasks are not.
So it is possible for the timeout handler to kill a task after it
exited from nbd_do_it()/nbd_thread_recv(), or after it exited entirely.

> +	task = READ_ONCE(nbd->task_send);
> +	if (task)
> +		force_sig(SIGKILL, nbd->task_send);

And this is just... what?  What is the point of using READ_ONCE() if
you're going to look up nbd->task_send again?
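
To illustrate the second point only: if the pointer is snapshotted with
READ_ONCE(), the snapshot is what should be tested and then used. Below is a
minimal userspace sketch, not the actual driver code. The READ_ONCE() macro is
a simplified stand-in for the kernel's, and struct task, struct nbd_dev,
kill_task() and timeout_kill_sender() are hypothetical names standing in for
task_struct, nbd_device, force_sig() and the timeout handler. It assumes GCC or
Clang for __typeof__.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE():
 * the volatile access forces exactly one load of x. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for task_struct and nbd_device. */
struct task {
	int killed;
};

struct nbd_dev {
	struct task *task_send;
};

/* Hypothetical stand-in for force_sig(SIGKILL, task). */
static void kill_task(struct task *t)
{
	t->killed = 1;
}

/* Returns 1 if a sender task was signalled, 0 if there was none. */
static int timeout_kill_sender(struct nbd_dev *nbd)
{
	/* Load the pointer once ... */
	struct task *task = READ_ONCE(nbd->task_send);

	if (!task)
		return 0;

	/* ... and act on that same snapshot, not on nbd->task_send,
	 * which may have changed (or become NULL) in the meantime. */
	kill_task(task);
	return 1;
}
```

This addresses only the double lookup. It does nothing about the first
problem: without locking or a reference on the task, the snapshot can still
point at a task that has already left nbd_do_it()/nbd_thread_recv() or exited.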


Ben Hutchings
All extremists should be taken out and shot.

Attachment: signature.asc
Description: This is a digitally signed message part
