Re: [Nbd] write_zeroes/trim on the whole disk

To: Vladimir Sementsov-Ogievskiy <vsementsov@...2319...>, qemu-devel <qemu-devel@...530...>, qemu-block@...530..., nbd-general@lists.sourceforge.net
Cc: kwolf@...696..., "Denis V. Lunev" <den@...2317...>, Wouter Verhelst <w@...112...>, Stefan Hajnoczi <stefanha@...696...>, Paolo Bonzini <pbonzini@...696...>
Subject: Re: [Nbd] write_zeroes/trim on the whole disk
From: Eric Blake <eblake@...696...>
Date: Fri, 23 Sep 2016 14:00:06 -0500
Message-id: <a3d525e9-a66e-d086-55a4-5def3824964d@...696...>
In-reply-to: <57E5752C.3080407@...2319...>
References: <57E5752C.3080407@...2319...>

On 09/23/2016 01:32 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> There is the following problem. When we need to write_zeroes or trim the
> whole disk, we have to do it iteratively, because of the 32-bit restriction
> on request length.
> For example, the current implementation of mirror (see mirror_dirty_init())
> does this in chunks of 2147418112 bytes (with the default granularity of
> 65536). So, to zero a 16 TB disk we will make about 8192 requests instead of one.
> 
> Incremental zeroing of a 1 TB qcow2 disk takes > 80 seconds for me (see below).
> This means ~20 minutes for copying an empty 16 TB qcow2 disk, which is
> obviously a waste of time.
> 
> We see the following solutions for nbd:
>
> 1. Add a command NBD_MAKE_EMPTY, with a flag saying what should be done:
> trim or write_zeroes.

Presumably spelled NBD_CMD_MAKE_EMPTY.

> 2. Add a flag NBD_CMD_FLAG_WHOLE for the commands NBD_TRIM and
> NBD_WRITE_ZEROES, which will say (with the offset and length of the
> request both set to zero) that the whole disk should be discarded/zeroed.

Both of these are possible.  As it is, NBD_CMD_WRITE_ZEROES is not even
formally part of the NBD spec yet, although NBD_CMD_TRIM is (I'm still
sitting on my qemu proof-of-concept patches for WRITE_ZEROES, and need
to resubmit them now that the qemu 2.8 development window is open).
Either way, the server would have to advertise whether the new command and/or
new flags to existing commands are available for a whole-disk trim/zero
before a client could use them, and clients must be prepared to fall back
to incremental approaches otherwise.
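A minimal Python sketch of the advertise-then-fall-back logic described above. NBD_CMD_TRIM (4) and NBD_FLAG_SEND_TRIM (bit 5) are the values from the NBD spec; NBD_FLAG_SEND_WHOLE and NBD_CMD_FLAG_WHOLE are hypothetical names with arbitrarily chosen bits, standing in for whatever a server would advertise under proposal 2. plan_trim is an illustrative helper, not part of any existing client.

```python
from typing import List, Tuple

# Values from the NBD protocol spec.
NBD_CMD_TRIM = 4
NBD_FLAG_SEND_TRIM = 1 << 5          # transmission flag: server accepts TRIM
# HYPOTHETICAL: not in the spec; bit values chosen arbitrarily for illustration.
NBD_FLAG_SEND_WHOLE = 1 << 15        # server accepts whole-device requests
NBD_CMD_FLAG_WHOLE = 1 << 15         # per-request "whole device" flag (proposal 2)

MAX_NBD_LENGTH = 2**32 - 1           # largest value the 32-bit length field holds

Request = Tuple[int, int, int, int]  # (command, command flags, offset, length)


def plan_trim(export_size: int, transmission_flags: int,
              max_len: int = MAX_NBD_LENGTH) -> List[Request]:
    """Return the TRIM requests needed to cover the whole export (hypothetical helper)."""
    if not transmission_flags & NBD_FLAG_SEND_TRIM:
        raise ValueError("server does not advertise TRIM")
    if not 0 < max_len <= MAX_NBD_LENGTH:
        raise ValueError("max_len must fit in the 32-bit length field")
    if transmission_flags & NBD_FLAG_SEND_WHOLE:
        # Whole-device mode: a single request with offset 0 and length 0.
        return [(NBD_CMD_TRIM, NBD_CMD_FLAG_WHOLE, 0, 0)]
    # Fallback: walk the export in chunks no longer than max_len.
    requests: List[Request] = []
    offset = 0
    while offset < export_size:
        length = min(max_len, export_size - offset)
        requests.append((NBD_CMD_TRIM, 0, offset, length))
        offset += length
    return requests


print(len(plan_trim(16 * 2**40, NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WHOLE)))  # 1
print(len(plan_trim(16 * 2**40, NBD_FLAG_SEND_TRIM, max_len=2147418112)))    # 8193
```

The point of the design is that the whole-device path is purely opportunistic: a client that never sees the advertisement bit still produces a correct, if longer, sequence of ordinary requests.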

My preference would be a new flag to the existing commands, with
explicit documentation that an offset of 0 and a length of 0 must be used
with that flag when requesting a full-device wipe.
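To make the "zero offset, zero length" convention concrete, here is a sketch that packs an NBD request header using the spec's layout (magic 0x25609513, 16-bit command flags, 16-bit type, 64-bit handle, 64-bit offset, 32-bit length, all big-endian). NBD_CMD_FLAG_WHOLE is hypothetical and its bit value arbitrary; pack_request is an illustrative helper that simply enforces the documentation rule as a check.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_TRIM = 4
# HYPOTHETICAL flag from proposal 2; the bit value is arbitrary.
NBD_CMD_FLAG_WHOLE = 1 << 15

# magic (32), command flags (16), type (16), handle (64), offset (64),
# length (32); all fields big-endian, 28 bytes in total.
REQUEST_FMT = ">IHHQQI"


def pack_request(cmd: int, flags: int, handle: int,
                 offset: int, length: int) -> bytes:
    """Encode one NBD request header (illustrative helper)."""
    if flags & NBD_CMD_FLAG_WHOLE and (offset, length) != (0, 0):
        # Whole-device requests must carry offset 0 and length 0.
        raise ValueError("NBD_CMD_FLAG_WHOLE requires offset 0 and length 0")
    return struct.pack(REQUEST_FMT, NBD_REQUEST_MAGIC, flags, cmd,
                       handle, offset, length)


wipe = pack_request(NBD_CMD_TRIM, NBD_CMD_FLAG_WHOLE, handle=1, offset=0, length=0)
print(len(wipe))  # 28
```

Because the flag reuses the existing header unchanged, a server that does not understand it can reject the request without any change to how it parses the wire format.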

> 3. Increase length field of the request to 64bit.

No; that won't work.  It would be a fundamental change to the NBD
protocol, and require both new servers and new clients to talk a
different wire protocol with different size length parameters.
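A short sketch of why option 3 is a wire-format break: widening the length field grows the request header from 28 to 32 bytes, so a peer still parsing the old layout would misread every request, and the current 32-bit field simply cannot carry a 16 TiB length. WIDE_FMT is a hypothetical layout for illustration; fits_in_current_header is an illustrative helper.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_TRIM = 4

CURRENT_FMT = ">IHHQQI"   # today's header: 32-bit length, 28 bytes
WIDE_FMT = ">IHHQQQ"      # HYPOTHETICAL option 3: 64-bit length, 32 bytes


def fits_in_current_header(length: int) -> bool:
    """True if `length` can be encoded in the existing 32-bit length field."""
    try:
        struct.pack(CURRENT_FMT, NBD_REQUEST_MAGIC, 0, NBD_CMD_TRIM, 1, 0, length)
        return True
    except struct.error:
        return False


print(struct.calcsize(CURRENT_FMT), struct.calcsize(WIDE_FMT))  # 28 32
print(fits_in_current_header(2**32 - 1))                        # True
print(fits_in_current_header(16 * 2**40))                       # False
```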

> 
> As soon as we have some way to empty a disk in nbd, we can use
> qcow2_make_empty to trim the whole disk (and something similar should
> be done for zeroing).
> 
> What do you think about all this, and which approach has a chance of
> getting into the nbd protocol?

It's not necessarily obvious that the ability to bulk-trim or bulk-zero
a device should be fundamentally faster than doing it incrementally in
2G chunks; but I concede that there may indeed be scenarios such as
qemu's qcow2 file where that is true.  So it does sound like a useful
option and/or command to be proposed for addition to the NBD protocol,
from that point of view.
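The 2G chunk size and the request count quoted earlier can be reproduced with a little arithmetic. This sketch assumes (an assumption, not checked against qemu's mirror code) that the per-request cap is the largest multiple of the 65536-byte granularity that fits in a signed 32-bit int. Under that assumption each chunk is 64 KiB short of 2 GiB, so a 16 TiB disk needs 8193 requests rather than exactly 8192.

```python
# Assumption: per-request cap = largest granularity-aligned value <= INT32_MAX.
INT32_MAX = 2**31 - 1
GRANULARITY = 65536  # default mirror granularity quoted in the thread


def max_chunk(granularity: int) -> int:
    """Largest multiple of `granularity` not exceeding INT32_MAX."""
    return (INT32_MAX // granularity) * granularity


def requests_needed(disk_size: int, chunk: int) -> int:
    """Requests needed to cover disk_size bytes in chunks of `chunk` bytes."""
    return -(-disk_size // chunk)  # ceiling division


chunk = max_chunk(GRANULARITY)
print(chunk)                               # 2147418112
print(requests_needed(16 * 2**40, chunk))  # 8193
```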

As with other extensions to NBD, the best way is to write up a proposal
for how the documentation should change, submit that as patches to the
nbd list, and accompany it with a proof-of-concept implementation
(qemu's NBD server and client work well for this), so that we can iron out
the details of the documentation before making it a formal part of the
spec.  It's important to remember that such a proposal should still be
optional (a server need not implement the new mode, and a client should
be prepared to fall back to other means if the server does not support a
whole-device action).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
