[dropping other lists for now]

On 8/24/19 1:44 AM, Wouter Verhelst wrote:
>>> One way of fulfilling the letter of this requirement but not its spirit
>>> could be to have background writes; that is, the server makes a note
>>> that the zeroed region should contain zeroes, makes an error-free reply
>>> to the client, and then starts updating things in the background (with
>>> proper layering so that an NBD_CMD_READ would see zeroes).
>>
>> For writes, this should still be viable IF the server can also cancel
>> that background write of zeroes in favor of a foreground request for
>> actual data to be written to the same offset. In other words, as long
>> as the behavior to the client is "as if" there is no duplicated I/O
>> cost, the zero appears fast, even if it kicked off a long-running async
>> process to actually accomplish it.
>
> That's kind of what I was thinking of, yeah.
>
> A background write would cause disk I/O, which *will* cause any write
> that happens concurrently with it to slow down. If we need to write
> several orders of magnitude of zeroes, then the "fast zero" will
> actually cause the following writes to slow down, which could impact
> performance.
>
> The cancelling should indeed happen (otherwise ordering of writes will
> be wrong, as per the spec), but that doesn't negate the performance
> impact.
>
>>> This could negatively impact performance after that command to the
>>> effect that syncing the device would be slower rather than faster, if
>>> not done right.
>>
>> Oh. I see - for flush requests, you're worried about the cost of the
>> flush forcing the I/O for the background zero to complete before flush
>> can return.
>>
>> Perhaps that merely means that a client using fast zero requests as a
>> means of probing whether it can do a bulk pre-zero pass even though it
>> will be rewriting part of that image with data later SHOULD NOT attempt
>> to flush the disk until all other interesting write requests are also
>> ready to queue.
>> In the 'qemu-img convert' case which spawned this
>> discussion, that's certainly the case (qemu-img does not call flush
>> after the pre-zeroing, but only after all data is copied - and then it
>> really DOES want to wait for any remaining backgrounded zeroing to land
>> on the disk along with any normal writes when it does its final flush).
>
> Not what I meant, but also a good point, thanks :)
>
>>> Do we want to keep that in consideration?
>>
>> Ideas on how best to add what I mentioned above into the specification?
>
> Perhaps clarify that the "fast zero" flag is meant to *improve*
> performance, and that it therefore should either be implemented in a way
> that does in fact improve performance, or not at all?

Here are the wording changes I'm considering (ragged lines to minimize
churn; I can reflow the existing paragraph if we like it):

diff --git i/doc/proto.md w/doc/proto.md
index 914910f..b98a455 100644
--- i/doc/proto.md
+++ w/doc/proto.md
@@ -2054,7 +2054,7 @@ The following request types exist:
     `NBD_ENOTSUP` unless the request can be serviced in less time than
     a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents
     of the export when returning this failure.  The server's
-    determination of a fast request MAY depend on a number of factors,
+    determination on whether to fail a fast request MAY depend on a number of factors,
     such as whether the request was suitably aligned, on whether the
     `NBD_CMD_FLAG_NO_HOLE` flag was present, or even on whether a
     previous `NBD_CMD_TRIM` had been performed on the region.  If the
@@ -2062,12 +2062,30 @@ The following request types exist:
     NOT fail with `NBD_ENOTSUP`, regardless of the speed of servicing
     a request, and SHOULD fail with `NBD_EINVAL` if the
     `NBD_CMD_FLAG_FAST_ZERO` flag was set.  A server MAY advertise
-    `NBD_FLAG_SEND_FAST_ZERO` whether or not it can perform fast
-    zeroing; similarly, a server SHOULD fail with `NBD_ENOTSUP` when
-    the flag is set if the server cannot quickly determine in advance
-    whether that request would have been fast, even if it turns out
+    `NBD_FLAG_SEND_FAST_ZERO` whether or not it will actually succeed
+    on a fast zero request (a fast failure of `NBD_ENOTSUP` still
+    counts as a fast response); similarly, a server SHOULD fail a fast
+    zero request with `NBD_ENOTSUP` if the server cannot quickly determine in advance
+    whether the request would be fast, even if it turns out
     that the same request without the flag would be fast after all.
+
+    One intended use of a fast zero request is optimizing the copying
+    of a sparse image source into the export: a client can request
+    fast zeroing of the entire export, and if it succeeds, follow that
+    with write requests to just the data portions before a single
+    flush of the entire image, for fewer transactions overall.  On the
+    other hand, if the fast zero request fails, the fast failure lets
+    the client know that it must manually write zeroes corresponding
+    to the holes of the source image before a final flush, for more
+    transactions but with no time lost to duplicated I/O to the data
+    portions.  Knowing this usage pattern can help decide whether a
+    server's implementation for writing zeroes counts as fast (for
+    example, a successful fast zero request may start a background
+    operation that would cause the next flush request to take longer,
+    but that is okay as long as intermediate writes before that flush
+    do not further lengthen the time spent on the overall sequence of
+    operations).

     If an error occurs, the server MUST set the appropriate error
     code in the error field.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
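[Editor's note: the qemu-img convert usage pattern described in the proposed wording can be sketched as toy client-side code. This is a minimal simulation under stated assumptions, not a real NBD client: `FakeServer`, `copy_sparse_image`, and the string `ENOTSUP` are invented stand-ins, with `fast_zero=True` modeling `NBD_CMD_FLAG_FAST_ZERO` on `NBD_CMD_WRITE_ZEROES` and the fast `NBD_ENOTSUP` failure leaving the export unchanged.]

```python
# Hypothetical sketch of the client usage pattern discussed above.
# FakeServer is an invented in-memory stand-in for an NBD export.

ENOTSUP = "ENOTSUP"


class FakeServer:
    """Toy export: a byte buffer plus a flag for fast-zero support."""

    def __init__(self, size, fast_zero_ok):
        self.disk = bytearray(size)    # starts zeroed, like a fresh image
        self.fast_zero_ok = fast_zero_ok

    def write_zeroes(self, offset, length, fast_zero=False):
        if fast_zero and not self.fast_zero_ok:
            return ENOTSUP             # fast failure; export unchanged
        self.disk[offset:offset + length] = bytes(length)
        return None

    def write(self, offset, data):
        self.disk[offset:offset + len(data)] = data

    def flush(self):
        pass                           # the client flushes exactly once, at the end


def copy_sparse_image(server, size, extents):
    """Copy a sparse image; extents is a list of (offset, data) data runs."""
    if server.write_zeroes(0, size, fast_zero=True) == ENOTSUP:
        # Fast zeroing refused: zero only the holes, so no data region
        # suffers duplicated I/O (zeroed and then overwritten).
        pos = 0
        for offset, data in sorted(extents):
            if offset > pos:
                server.write_zeroes(pos, offset - pos)
            pos = offset + len(data)
        if pos < size:
            server.write_zeroes(pos, size - pos)
    # Either way, each data portion is written exactly once.
    for offset, data in extents:
        server.write(offset, data)
    server.flush()
```

Run against either flavor of server, the final export contents are identical; only the number and kind of requests differ, which is exactly the trade-off the proposed spec paragraph asks implementers to weigh.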