Re: [PATCH v4] doc: Add alternate trim/zero limits

To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, "nbd@other.debian.org" <nbd@other.debian.org>
Cc: Denis Lunev <den@virtuozzo.com>
Subject: Re: [PATCH v4] doc: Add alternate trim/zero limits
From: Eric Blake <eblake@redhat.com>
Date: Mon, 3 Feb 2020 13:34:56 -0600
Message-id: <ef0bfebc-b222-8e19-8828-d1cb4d32972d@redhat.com>
In-reply-to: <e4ffe034-b2fa-f7e6-aa5c-e7db55f546b6@virtuozzo.com>
References: <20180501212242.986796-1-eblake@redhat.com> <bbf265a8-3380-e153-1fa3-3a7c9048692a@virtuozzo.com> <272b17ce-9247-e751-b85f-9eda492c3853@redhat.com> <5de902a3-bbe3-4415-8238-57d0f8e75371@virtuozzo.com> <08f2380d-10c6-ef3c-4361-2c484cd90c81@virtuozzo.com> <e4ffe034-b2fa-f7e6-aa5c-e7db55f546b6@virtuozzo.com>

On 2/3/20 1:18 PM, Vladimir Sementsov-Ogievskiy wrote:

Investigating our heap of patches in virtuozzo qemu above rhel qemu,
I now look at two patches which actually drop these restrictions
in the client for WRITE_ZERO, TRIM and BLOCK_STATUS. So actually we have
just lived with a slightly non-compliant client for more than a year
due to these restrictions.

So far this is working well enough that my idea of an extension still
hasn't percolated to the top of my todo queue; but it is getting closer.

I didn't answer your question about BLOCK_STATUS: yes, we need large
BLOCK_STATUS requests, as qemu-img convert does an additional loop
of block status querying before the actual converting, and this loop is
slowed down because of the restrictions. About CACHE I'm unsure, it
seems we didn't face such problems with it.

Do you have plans or ideas on this topic?
I think that for BLOCK_STATUS and TRIM we can safely drop the max_block
restriction altogether, documenting that max_block is unrelated to
these commands, as actually, for BLOCK_STATUS the server may return less
than requested anyway, and TRIM should never lead to big
allocations or calculations.
WRITE_ZERO is a bit trickier, as it may be backed by just writing
zeroes, but we can at least drop the max_block restriction if the
FAST_ZERO flag is specified; then the client may implement write zero as

write_zero(FAST_ZERO)
if failed:
    writing zero is slow anyway, do it in a loop.
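
The pseudocode above could be fleshed out roughly as follows. This is
only a sketch: `FakeExport`, `NbdError` and `zero_range` are hypothetical
names invented for illustration, not qemu or libnbd APIs. It assumes the
server rejects a non-fast NBD_CMD_FLAG_FAST_ZERO request with
NBD_ENOTSUP (value 95 in proto.md, as far as I recall) and that the slow
path must still honour the advertised maximum block size.

```python
NBD_ENOTSUP = 95  # assumed value of NBD_ENOTSUP from proto.md


class NbdError(Exception):
    """Hypothetical exception carrying an NBD error code."""

    def __init__(self, code):
        super().__init__(f"NBD error {code}")
        self.code = code


class FakeExport:
    """Hypothetical stand-in for an NBD connection; it only records requests."""

    def __init__(self, fast_zero_works):
        self.fast_zero_works = fast_zero_works
        self.requests = []

    def write_zeroes(self, offset, length, fast_zero=False):
        self.requests.append((offset, length, fast_zero))
        if fast_zero and not self.fast_zero_works:
            raise NbdError(NBD_ENOTSUP)


def zero_range(conn, offset, length, max_block):
    """Try one large FAST_ZERO request, else loop in max_block-sized chunks."""
    try:
        # Fast path: a single request that ignores max_block.
        conn.write_zeroes(offset, length, fast_zero=True)
        return
    except NbdError as err:
        if err.code != NBD_ENOTSUP:
            raise  # a real failure, not just "zeroing would be slow"
    # Slow path: zeroing is slow anyway, so split the range into chunks.
    end = offset + length
    while offset < end:
        chunk = min(max_block, end - offset)
        conn.write_zeroes(offset, chunk, fast_zero=False)
        offset += chunk
```

With a server that cannot zero quickly, a 100-byte range and a 32-byte
limit produce one failed fast attempt followed by four chunked writes.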


So, in other words, can we do something like this:

  diff --git a/doc/proto.md b/doc/proto.md
  index fc7baf6..4b067f5 100644
  --- a/doc/proto.md
  +++ b/doc/proto.md
  @@ -815,9 +815,12 @@ Where a transmission request can have a nonzero *offset* and/or
   the client MUST ensure that *offset* and *length* are integer
   multiples of any advertised minimum block size, and SHOULD use integer
   multiples of any advertised preferred block size where possible.  For
  -those requests, the client MUST NOT use a *length* larger than any
  -advertised maximum block size or which, when added to *offset*, would
  -exceed the export size. The server SHOULD report an `NBD_EINVAL` error if
  +those requests, the client MUST NOT use a *length* which, when added to
  +*offset*, would exceed the export size. Also for NBD_CMD_READ,
  +NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for
  +when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length*
  +larger than any advertised maximum block size.
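
As a rough illustration of the client-side rule in the proposed wording
above, the check could look something like this. It is a sketch only:
`Cmd` and `length_allowed` are hypothetical names, and the enum values
are plain labels rather than the wire-level command numbers.

```python
from enum import Enum


class Cmd(Enum):
    """Hypothetical labels for the commands named in the proposal."""
    READ = "NBD_CMD_READ"
    WRITE = "NBD_CMD_WRITE"
    TRIM = "NBD_CMD_TRIM"
    CACHE = "NBD_CMD_CACHE"
    WRITE_ZEROES = "NBD_CMD_WRITE_ZEROES"
    BLOCK_STATUS = "NBD_CMD_BLOCK_STATUS"


# Commands the proposed wording keeps under the maximum block size.
SIZE_LIMITED = {Cmd.READ, Cmd.WRITE, Cmd.CACHE, Cmd.WRITE_ZEROES}


def length_allowed(cmd, offset, length, export_size,
                   min_block, max_block, fast_zero=False):
    """Return True if the request satisfies the proposed client rules."""
    # MUST: offset and length are multiples of the minimum block size.
    if offset % min_block or length % min_block:
        return False
    # MUST NOT: offset + length exceeds the export size (all commands).
    if offset + length > export_size:
        return False
    # MUST NOT: exceed max_block, but only for the listed commands, and
    # not for WRITE_ZEROES when NBD_CMD_FLAG_FAST_ZERO is set.
    if cmd in SIZE_LIMITED and not (cmd is Cmd.WRITE_ZEROES and fast_zero):
        return length <= max_block
    return True
```

Under this reading, a 1 GiB TRIM or BLOCK_STATUS against a 32 MiB
maximum block size passes the check, while a 1 GiB WRITE does not.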

Meanwhile, this doc tweak makes sense to me. Would you like to submit it
as a proper patch against nbd.git to make it easier for me to apply the
patch correctly?

  +The server SHOULD report an `NBD_EINVAL` error if
   the client's request is not aligned to advertised minimum block size
   boundaries, or is larger than the advertised maximum block size.
   Notwithstanding any maximum block size advertised, either the server

?
Or will it make existing nbd servers non-compliant? At least it seems
the qemu nbd server never enforced these restrictions
in the specified cases.
And, additional idea: can we in qemu just ignore these restrictions
up to the first EINVAL returned? So, for example,
we start with bs->bl.max_pwrite_zeroes = INT_MAX, we send
WRITE_ZEROES with length exceeding max_block, if the server
replies with EINVAL we retry the current request using limited length
(we have to do it in a loop) and set
bs->bl.max_pwrite_zeroes = max_block.
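
The optimistic approach described above might be sketched as follows.
This is not qemu code: `ZeroWriter`, `StrictServer` and `NbdError` are
hypothetical names, `max_pwrite_zeroes` merely mirrors the field name
mentioned above, and NBD_EINVAL is assumed to be 22 as in proto.md.

```python
NBD_EINVAL = 22        # assumed value of NBD_EINVAL from proto.md
INT_MAX = 2**31 - 1


class NbdError(Exception):
    """Hypothetical exception carrying an NBD error code."""

    def __init__(self, code):
        super().__init__(f"NBD error {code}")
        self.code = code


class StrictServer:
    """Hypothetical server that rejects requests larger than its limit."""

    def __init__(self, limit):
        self.limit = limit
        self.accepted = []

    def write_zeroes(self, offset, length):
        if length > self.limit:
            raise NbdError(NBD_EINVAL)
        self.accepted.append((offset, length))


class ZeroWriter:
    """Ignore max_block until the server first answers with EINVAL."""

    def __init__(self, conn, max_block):
        self.conn = conn
        self.max_block = max_block
        self.max_pwrite_zeroes = INT_MAX  # start optimistic

    def write_zeroes(self, offset, length):
        end = offset + length
        while offset < end:
            chunk = min(self.max_pwrite_zeroes, end - offset)
            try:
                self.conn.write_zeroes(offset, chunk)
            except NbdError as err:
                already_limited = self.max_pwrite_zeroes <= self.max_block
                if err.code != NBD_EINVAL or already_limited:
                    raise  # not a size problem, or limiting did not help
                # Remember the limit and retry the same offset, now chunked.
                self.max_pwrite_zeroes = self.max_block
                continue
            offset += chunk
```

Against a server that enforces a 32-byte limit, a 100-byte request costs
one rejected attempt and then proceeds in four chunks; against a server
that never enforces the limit, it goes through in a single request.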
Eric? Now I'm investigating the heap again, and remembered this
talk :) What do you think?
Any ideas?

I still hope to revisit my idea of extending NBD_INFO during NBD_OPT_GO
to expose actual server limits for trim, write zeroes, and block status.
But I'm also about to post a different extension adding
NBD_INFO_INIT_STATE which would let a server advertise to the client
when it is already known that the export reads as all zeroes, so you
don't even have to TRY to use large trim or write zero requests, nor
iterate over the image with block status, but can immediately proceed
straight to writing just the non-zero portions of the export.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
