[dropping other lists for now]

On 8/24/19 1:44 AM, Wouter Verhelst wrote:
>>> One way of fulfilling the letter of this requirement but not its spirit
>>> could be to have background writes; that is, the server makes a note
>>> that the zeroed region should contain zeroes, makes an error-free reply
>>> to the client, and then starts updating things in the background (with
>>> proper layering so that an NBD_CMD_READ would see zeroes).
>>
>> For writes, this should still be viable IF the server can also cancel
>> that background write of zeroes in favor of a foreground request for
>> actual data to be written to the same offset. In other words, as long
>> as the behavior to the client is "as if" there is no duplicated I/O
>> cost, the zero appears fast, even if it kicked off a long-running async
>> process to actually accomplish it.
>
> That's kind of what I was thinking of, yeah.
>
> A background write would cause disk I/O, which *will* cause any write
> that happens concurrently with it to slow down. If we need to write
> several orders of magnitude of zeroes, then the "fast zero" will
> actually cause the following writes to slow down, which could impact
> performance.
>
> The cancelling should indeed happen (otherwise ordering of writes will
> be wrong, as per the spec), but that doesn't negate the performance
> impact.
>
>>> This could negatively impact performance after that command to the
>>> effect that syncing the device would be slower rather than faster, if
>>> not done right.
>>
>> Oh. I see - for flush requests, you're worried about the cost of the
>> flush forcing the I/O for the background zero to complete before flush
>> can return.
>>
>> Perhaps that merely means that a client using fast zero requests as a
>> means of probing whether it can do a bulk pre-zero pass even though it
>> will be rewriting part of that image with data later SHOULD NOT attempt
>> to flush the disk until all other interesting write requests are also
>> ready to queue.
>> In the 'qemu-img convert' case which spawned this
>> discussion, that's certainly the case (qemu-img does not call flush
>> after the pre-zeroing, but only after all data is copied - and then it
>> really DOES want to wait for any remaining backgrounded zeroing to land
>> on the disk along with any normal writes when it does its final flush).
>
> Not what I meant, but also a good point, thanks :)
>
>>> Do we want to keep that in consideration?
>>
>> Ideas on how best to add what I mentioned above into the specification?
>
> Perhaps clarify that the "fast zero" flag is meant to *improve*
> performance, and that it therefore should either be implemented in a way
> that does in fact improve performance, or not at all?

Here are the wording changes I'm considering (ragged lines to minimize
churn; I can reflow the existing paragraph if we like it):

diff --git i/doc/proto.md w/doc/proto.md
index 914910f..b98a455 100644
--- i/doc/proto.md
+++ w/doc/proto.md
@@ -2054,7 +2054,7 @@ The following request types exist:
     `NBD_ENOTSUP` unless the request can be serviced in less time than
     a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents
     of the export when returning this failure.  The server's
-    determination of a fast request MAY depend on a number of factors,
+    determination on whether to fail a fast request MAY depend on a number of factors,
     such as whether the request was suitably aligned, on whether the
     `NBD_CMD_FLAG_NO_HOLE` flag was present, or even on whether a
     previous `NBD_CMD_TRIM` had been performed on the region.  If the
@@ -2062,12 +2062,30 @@ The following request types exist:
     NOT fail with `NBD_ENOTSUP`, regardless of the speed of servicing
     a request, and SHOULD fail with `NBD_EINVAL` if the
     `NBD_CMD_FLAG_FAST_ZERO` flag was set.  A server MAY advertise
-    `NBD_FLAG_SEND_FAST_ZERO` whether or not it can perform fast
-    zeroing; similarly, a server SHOULD fail with `NBD_ENOTSUP` when
-    the flag is set if the server cannot quickly determine in advance
-    whether that request would have been fast, even if it turns out
+    `NBD_FLAG_SEND_FAST_ZERO` whether or not it will actually succeed
+    on a fast zero request (a fast failure of `NBD_ENOTSUP` still
+    counts as a fast response); similarly, a server SHOULD fail a fast
+    zero request with `NBD_ENOTSUP` if the server cannot quickly determine in advance
+    whether the request would be fast, even if it turns out
     that the same request without the flag would be fast after all.
+
+    One intended use of a fast zero request is optimizing the copying
+    of a sparse image source into the export: a client can request
+    fast zeroing of the entire export, and if it succeeds, follow that
+    with write requests to just the data portions before a single
+    flush of the entire image, for fewer transactions overall.  On the
+    other hand, if the fast zero request fails, the fast failure lets
+    the client know that it must manually write zeroes corresponding
+    to the holes of the source image before a final flush, for more
+    transactions but with no time lost to duplicated I/O to the data
+    portions.  Knowing this usage pattern can help decide whether a
+    server's implementation for writing zeroes counts as fast (for
+    example, a successful fast zero request may start a background
+    operation that would cause the next flush request to take longer,
+    but that is okay as long as intermediate writes before that flush
+    do not further lengthen the time spent on the overall sequence of
+    operations).

     If an error occurs, the server MUST set the appropriate error
     code in the error field.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
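[Editor's note: the qemu-img convert usage pattern described in the proposed wording can be sketched as toy client-side code. This is a minimal simulation under stated assumptions, not a real NBD client: `FakeServer`, `copy_sparse_image`, and the string `ENOTSUP` are invented stand-ins, with `fast_zero=True` modeling `NBD_CMD_FLAG_FAST_ZERO` on `NBD_CMD_WRITE_ZEROES` and the fast `NBD_ENOTSUP` failure leaving the export unchanged.]

```python
# Hypothetical sketch of the client usage pattern discussed above.
# FakeServer is an invented in-memory stand-in for an NBD export.

ENOTSUP = "ENOTSUP"


class FakeServer:
    """Toy export: a byte buffer plus a flag for fast-zero support."""

    def __init__(self, size, fast_zero_ok):
        self.disk = bytearray(size)    # starts zeroed, like a fresh image
        self.fast_zero_ok = fast_zero_ok

    def write_zeroes(self, offset, length, fast_zero=False):
        if fast_zero and not self.fast_zero_ok:
            return ENOTSUP             # fast failure; export unchanged
        self.disk[offset:offset + length] = bytes(length)
        return None

    def write(self, offset, data):
        self.disk[offset:offset + len(data)] = data

    def flush(self):
        pass                           # the client flushes exactly once, at the end


def copy_sparse_image(server, size, extents):
    """Copy a sparse image; extents is a list of (offset, data) data runs."""
    if server.write_zeroes(0, size, fast_zero=True) == ENOTSUP:
        # Fast zeroing refused: zero only the holes, so no data region
        # suffers duplicated I/O (zeroed and then overwritten).
        pos = 0
        for offset, data in sorted(extents):
            if offset > pos:
                server.write_zeroes(pos, offset - pos)
            pos = offset + len(data)
        if pos < size:
            server.write_zeroes(pos, size - pos)
    # Either way, each data portion is written exactly once.
    for offset, data in extents:
        server.write(offset, data)
    server.flush()
```

Run against either flavor of server, the final export contents are identical; only the number and kind of requests differ, which is exactly the trade-off the proposed spec paragraph asks implementers to weigh.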