Re: [PATCH v4] doc: Add alternate trim/zero limits

To: Eric Blake <eblake@redhat.com>, "nbd@other.debian.org" <nbd@other.debian.org>
Cc: Denis Lunev <den@virtuozzo.com>
Subject: Re: [PATCH v4] doc: Add alternate trim/zero limits
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date: Mon, 3 Feb 2020 22:18:43 +0300
Message-id: <[🔎] e4ffe034-b2fa-f7e6-aa5c-e7db55f546b6@virtuozzo.com>
In-reply-to: <08f2380d-10c6-ef3c-4361-2c484cd90c81@virtuozzo.com>
References: <20180501212242.986796-1-eblake@redhat.com> <bbf265a8-3380-e153-1fa3-3a7c9048692a@virtuozzo.com> <272b17ce-9247-e751-b85f-9eda492c3853@redhat.com> <5de902a3-bbe3-4415-8238-57d0f8e75371@virtuozzo.com> <08f2380d-10c6-ef3c-4361-2c484cd90c81@virtuozzo.com>

25.12.2019 19:23, Vladimir Sementsov-Ogievskiy wrote:

07.09.2019 20:26, Vladimir Sementsov-Ogievskiy wrote:

02.10.2018 18:50, Eric Blake wrote:

On 10/2/18 10:31 AM, Vladimir Sementsov-Ogievskiy wrote:

Hmm, Eric, what are the plans about this?

Now in upstream discard, write-zeros and block-status are limited in
client to 32m by default which is too slow.
Can we make some minimal specification change to drop these strict
limitations?


Elsewhere in the thread, back in May:

So, now we have two very similar options, and in realization - two almost identical code paths.
Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I have an idea: what about some kind of universal option for it? Something like

option INFO_CMD_SIZE: specific limits for command:
   uint16_t command
   uint32_t min_block
   uint32_t max_block

and server can send several such options, client can request them in some generic way. Most of semantics for these additional limits are common, so it will simplify both documentation and realization.
We already document "A particular information type SHOULD only appear once for a given export unless documented otherwise." - but this would indeed be a case where we could document otherwise.
Do you envision a server that needs to have independent limits for CACHE and/or BLOCK_STATUS beyond the usual limits for READ, where advertising alternative limits is worthwhile? But the idea of having a single reusable info that documents per-command limits may be worth the effort, so I'll try a v5 of both this patch and of the qemu proof-of-concept implementation along those lines, so that we can compare the two approaches.


It's still on my todo list to propose an alternative wording along with a qemu implementation, for the v5 posting of this topic.  However, it likely won't happen until after KVM Forum later this month.


Hi!

Investigating our heap of patches in virtuozzo qemu above rhel qemu, I now look at two patches which actually drop these restrictions
in client for WRITE_ZERO, TRIM and BLOCK_STATUS. So actually we just live with a bit non-compliant client for more than year due to
these restrictions..

I didn't answer your question about BLOCK_STATUS: yes, we need large BLOCK_STASTUS requests, as qemu-img convert does additional loop
of block status querying before actual converting, and this loop is slowed down because of restrictions. About CACHE I'm unsure, seems
we didn't face such problems with it.

Do you have plans or ideas on this topic?

I think that for BLOCK_STATUS and TRIM we can safely drop max_block restriction at all, documenting that max_block is unrelated to
these commands, as actually, for BLOCK_STATUS server may return less then required anyway, and TRIM should never lead to some big
allocations or calculations..

WRITE_ZERO is a bit trickier, as if it is backed by just writing zeroes, but we can at least drop max_block restriction if FAST_ZERO
flag is specified, than client may implement write zero as

write_zero(FAST_ZERO)
if failed:
    writing zero is slow anyway, do it in a loop.


So, in other words, can we do something like this:

  diff --git a/doc/proto.md b/doc/proto.md
  index fc7baf6..4b067f5 100644
  --- a/doc/proto.md
  +++ b/doc/proto.md
  @@ -815,9 +815,12 @@ Where a transmission request can have a nonzero *offset* and/or
   the client MUST ensure that *offset* and *length* are integer
   multiples of any advertised minimum block size, and SHOULD use integer
   multiples of any advertised preferred block size where possible.  For
  -those requests, the client MUST NOT use a *length* larger than any
  -advertised maximum block size or which, when added to *offset*, would
  -exceed the export size.  The server SHOULD report an `NBD_EINVAL` error if
  +those requests, the client MUST NOT use a *length* which, when added to
  +*offset*, would exceed the export size. Also for NBD_CMD_READ,
  +NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for
  +when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length*
  +larger than any advertised maximum block size.
  +The server SHOULD report an `NBD_EINVAL` error if
   the client's request is not aligned to advertised minimum block size
   boundaries, or is larger than the advertised maximum block size.
   Notwithstanding any maximum block size advertised, either the server

?

Or it will make existent nbd servers non-compliant? At least seems qemu nbd server never forced these restrictions
in specified cases.


And, additional idea: can we in qemu just ignore these restrictions up to first EINVAL returned? So, for example,
we start with bs->bl.max_pwrite_zeroes = INT_MAX, we send WRITE_ZEROES with length exceeding max_block, if server
replies with EINVAL we retry current request using limited length (we have to do it in a loop) and set
bs->bl.max_pwrite_zeroes = max_block.


Eric? Now, I'm investigating the heap again, and remember of this talk:) What do you think?


Any ideas?

--
Best regards,
Vladimir

Reply to:

Follow-Ups:
- Re: [PATCH v4] doc: Add alternate trim/zero limits
  - From: Eric Blake <eblake@redhat.com>

Prev by Date: Re: [PATCH v4] doc: Add alternate trim/zero limits
Next by Date: Re: [v2] nbd: add a flush_workqueue in nbd_start_device
Next by thread: Re: [PATCH v4] doc: Add alternate trim/zero limits
Index(es):
- Date
- Thread