Re: [Nbd] [PATCH 01/10] nbd: Fix timeout detection

To: Markus Pargmann <mpa@...1897...>, Jens Axboe <axboe@...161...>
Cc: nbd-general@lists.sourceforge.net, linux-kernel@...25..., kernel@...1897..., Hermann, Lauer <Hermann.Lauer@...1489...>
Subject: Re: [Nbd] [PATCH 01/10] nbd: Fix timeout detection
From: Ben Hutchings <ben@...1505...>
Date: Mon, 28 Sep 2015 01:27:44 +0100
Message-id: <1443400064.2517.16.camel@...1505...>
In-reply-to: <1439792409-28543-2-git-send-email-mpa@...1897...>
References: <1439792409-28543-1-git-send-email-mpa@...1897...> <1439792409-28543-2-git-send-email-mpa@...1897...>

On Mon, 2015-08-17 at 08:20 +0200, Markus Pargmann wrote:
> At the moment the nbd timeout just detects hanging tcp operations. This
> is not enough to detect a hanging or bad connection as expected of a
> timeout.
> 
> This patch redesigns the timeout detection to include some more cases.
> The timeout is now in relation to replies from the server. If the server
> does not send replies within the timeout the connection will be shut
> down.
> 
> The patch adds a continous timer 'timeout_timer' that is setup in one of
> two cases:
>  - The request list is empty and we are sending the first request out to
>    the server. We want to have a reply within the given timeout,
>    otherwise we consider the connection to be dead.
>  - A server response was received. This means the server is still
>    communicating with us. The timer is reset to the timeout value.
> 
> The timer is not stopped if the list becomes empty. It will just trigger
> a timeout which will directly leave the handling routine again as the
> request list is empty.
> 
> The whole patch does not use any additional explicit locking. The
> list_empty() calls are safe to be used concurrently. The timer is locked
> internally as we just use mod_timer and del_timer_sync().

This is crazy.  The timer is locked internally but the tasks are not.
So it is possible for the timeout handler to kill a task after it
exited from nbd_do_it()/nbd_thread_recv(), or after it exited entirely
(use-after-free).
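
To illustrate the race described above, here is a minimal userspace sketch of
the kind of serialization that is missing. It is not the nbd code: `struct
task`, `struct dev`, the `lock` field and all three functions are hypothetical
stand-ins, and setting `killed` stands in for force_sig(). The point is only
that the thread clears its pointer under the same lock the timeout handler
holds while it signals, so the handler can never act on a task that has
already left.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct task {
	int killed;
};

/* Hypothetical stand-in for struct nbd_device. */
struct dev {
	pthread_mutex_t lock;	/* assumption: protects task_recv */
	struct task *task_recv;
};

/* Called by the receive thread when it starts handling the device. */
static void recv_thread_enter(struct dev *d, struct task *self)
{
	assert(self != NULL);
	pthread_mutex_lock(&d->lock);
	d->task_recv = self;
	pthread_mutex_unlock(&d->lock);
}

/* Called by the receive thread before it returns or exits. */
static void recv_thread_exit(struct dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->task_recv = NULL;
	pthread_mutex_unlock(&d->lock);
}

/* Timeout path: returns 1 if a task was signalled, 0 otherwise. */
static int timeout_handler(struct dev *d)
{
	int signalled = 0;

	pthread_mutex_lock(&d->lock);
	if (d->task_recv) {
		/* Stand-in for force_sig(SIGKILL, task). */
		d->task_recv->killed = 1;
		signalled = 1;
	}
	pthread_mutex_unlock(&d->lock);
	return signalled;
}
```

In the kernel the timer callback runs in softirq context, so a sleeping lock
like the mutex used here would not be usable; a spinlock, or holding a
reference on the task, would be the likely equivalents. That choice is an
assumption of this sketch, not something the patch does.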

[...]
> +	task = READ_ONCE(nbd->task_send);
> +	if (task)
> +		force_sig(SIGKILL, nbd->task_send);
[...]

And this is just... what?  What is the point of using READ_ONCE() if
you're going to look up nbd->task_send again?
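
For comparison, a minimal userspace sketch of the pattern READ_ONCE() is meant
for: take one snapshot of the pointer, then both test and use that snapshot.
The `READ_ONCE` macro below is a simplified stand-in for the kernel's (it
relies on the GCC/Clang `__typeof__` extension), and `struct task`, `struct
dev`, `kill_task()` and `timeout_kill_sender()` are hypothetical names for
illustration only. This addresses just the double read; it does nothing about
the task lifetime problem raised earlier in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct task_struct. */
struct task {
	int killed;
};

/* Hypothetical stand-in for struct nbd_device. */
struct dev {
	struct task *task_send;
};

/* Stand-in for force_sig(SIGKILL, task). */
static void kill_task(struct task *t)
{
	t->killed = 1;
}

/* Returns 1 if a task was signalled, 0 if there was none. */
static int timeout_kill_sender(struct dev *d)
{
	struct task *task;

	assert(d != NULL);
	task = READ_ONCE(d->task_send);	/* single load of the pointer */
	if (!task)
		return 0;
	kill_task(task);	/* use the snapshot, not d->task_send */
	return 1;
}
```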

Ben.

-- 
Ben Hutchings
All extremists should be taken out and shot.
