[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v4] doc: Add alternate trim/zero limits



On 2/3/20 1:18 PM, Vladimir Sementsov-Ogievskiy wrote:

Investigating our heap of patches in virtuozzo qemu above rhel qemu, I now look at two patches which actually drop these restrictions in client for WRITE_ZERO, TRIM and BLOCK_STATUS. So actually we just live with a bit non-compliant client for more than year due to
these restrictions..

So far this is working well enough that my idea of an extension still hasn't percolated to the top of my todo queue; but it is is getting closer.


I didn't answer your question about BLOCK_STATUS: yes, we need large BLOCK_STASTUS requests, as qemu-img convert does additional loop of block status querying before actual converting, and this loop is slowed down because of restrictions. About CACHE I'm unsure, seems
we didn't face such problems with it.

Do you have plans or ideas on this topic?

I think that for BLOCK_STATUS and TRIM we can safely drop max_block restriction at all, documenting that max_block is unrelated to these commands, as actually, for BLOCK_STATUS server may return less then required anyway, and TRIM should never lead to some big
allocations or calculations..

WRITE_ZERO is a bit trickier, as if it is backed by just writing zeroes, but we can at least drop max_block restriction if FAST_ZERO
flag is specified, than client may implement write zero as

write_zero(FAST_ZERO)
if failed:
    writing zero is slow anyway, do it in a loop.


So, in other words, can we do something like this:

  diff --git a/doc/proto.md b/doc/proto.md
  index fc7baf6..4b067f5 100644
  --- a/doc/proto.md
  +++ b/doc/proto.md
  @@ -815,9 +815,12 @@ Where a transmission request can have a nonzero *offset* and/or
   the client MUST ensure that *offset* and *length* are integer
   multiples of any advertised minimum block size, and SHOULD use integer
   multiples of any advertised preferred block size where possible.  For
  -those requests, the client MUST NOT use a *length* larger than any
  -advertised maximum block size or which, when added to *offset*, would
  -exceed the export size.  The server SHOULD report an `NBD_EINVAL` error if   +those requests, the client MUST NOT use a *length* which, when added to
  +*offset*, would exceed the export size. Also for NBD_CMD_READ,
  +NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for
  +when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length*
  +larger than any advertised maximum block size.

Meanwhile, this doc tweak makes sense to me. Would you like to submit it as a proper patch against nbd.git to make it easier for me to apply the patch correctly?

  +The server SHOULD report an `NBD_EINVAL` error if
   the client's request is not aligned to advertised minimum block size
   boundaries, or is larger than the advertised maximum block size.
   Notwithstanding any maximum block size advertised, either the server

?

Or it will make existent nbd servers non-compliant? At least seems qemu nbd server never forced these restrictions
in specified cases.


And, additional idea: can we in qemu just ignore these restrictions up to first EINVAL returned? So, for example, we start with bs->bl.max_pwrite_zeroes = INT_MAX, we send WRITE_ZEROES with length exceeding max_block, if server replies with EINVAL we retry current request using limited length (we have to do it in a loop) and set
bs->bl.max_pwrite_zeroes = max_block.



Eric? Now, I'm investigating the heap again, and remember of this talk:) What do you think?


Any ideas?


I still hope to revisit my idea of extending NBD_INFO during NBD_OPT_GO to expose actual server limits for trim, write zeroes, and block status. But I'm also about to post a different extension adding NBD_INFO_INIT_STATE which would let a server advertise to the client when it is already known that the export reads as all zeroes, so you don't even have to TRY to use large trim or write zero requests, nor iterate over the image with block status, but can immediately proceed straight to writing just the non-zero portions of the export.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


Reply to: