Re: [PATCH v4] doc: Add alternate trim/zero limits

To: Eric Blake <eblake@redhat.com>, "nbd@other.debian.org" <nbd@other.debian.org>
Cc: Denis Lunev <den@virtuozzo.com>
Subject: Re: [PATCH v4] doc: Add alternate trim/zero limits
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date: Wed, 25 Dec 2019 16:23:13 +0000
Message-id: <[🔎] 08f2380d-10c6-ef3c-4361-2c484cd90c81@virtuozzo.com>
In-reply-to: <5de902a3-bbe3-4415-8238-57d0f8e75371@virtuozzo.com>
References: <20180501212242.986796-1-eblake@redhat.com> <bbf265a8-3380-e153-1fa3-3a7c9048692a@virtuozzo.com> <272b17ce-9247-e751-b85f-9eda492c3853@redhat.com> <5de902a3-bbe3-4415-8238-57d0f8e75371@virtuozzo.com>

07.09.2019 20:26, Vladimir Sementsov-Ogievskiy wrote:
> 02.10.2018 18:50, Eric Blake wrote:
>> On 10/2/18 10:31 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hmm, Eric, what are the plans about this?
>>>
>>> Now in upstream discard, write-zeros and block-status are limited in
>>> client to 32m by default which is too slow.
>>> Can we make some minimal specification change to drop these strict
>>> limitations?
>>
>> Elsewhere in the thread, back in May:
>>
>>>>> So, now we have two very similar options, and in realization - two almost identical code paths.
>>>>> Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I have an idea: what about some kind of universal option for it? Something like
>>>>>
>>>>> option INFO_CMD_SIZE: specific limits for command:
>>>>>    uint16_t command
>>>>>    uint32_t min_block
>>>>>    uint32_t max_block
>>>>>
>>>>> and server can send several such options, client can request them in some generic way. Most of semantics for these additional limits are common, so it will simplify both documentation and realization.
>>>>>
>>>>
>>>> We already document "A particular information type SHOULD only appear once for a given export unless documented otherwise." - but this would indeed be a case where we could document otherwise.
>>>>
>>>> Do you envision a server that needs to have independent limits for CACHE and/or BLOCK_STATUS beyond the usual limits for READ, where advertising alternative limits is worthwhile?  But the idea of having a single reusable info that documents per-command limits may be worth the effort, so I'll try a v5 of both this patch and of the qemu proof-of-concept implementation along those lines, so that we can compare the two approaches. 
>>
>> It's still on my todo list to propose an alternative wording along with a qemu implementation, for the v5 posting of this topic.  However, it likely won't happen until after KVM Forum later this month.
>>
> 
> Hi!
> 
> Investigating our heap of patches in virtuozzo qemu above rhel qemu, I now look at two patches which actually drop these restrictions
> in client for WRITE_ZERO, TRIM and BLOCK_STATUS. So actually we just live with a bit non-compliant client for more than year due to
> these restrictions..
> 
> I didn't answer your question about BLOCK_STATUS: yes, we need large BLOCK_STASTUS requests, as qemu-img convert does additional loop
> of block status querying before actual converting, and this loop is slowed down because of restrictions. About CACHE I'm unsure, seems
> we didn't face such problems with it.
> 
> Do you have plans or ideas on this topic?
> 
> I think that for BLOCK_STATUS and TRIM we can safely drop max_block restriction at all, documenting that max_block is unrelated to
> these commands, as actually, for BLOCK_STATUS server may return less then required anyway, and TRIM should never lead to some big
> allocations or calculations..
> 
> WRITE_ZERO is a bit trickier, as if it is backed by just writing zeroes, but we can at least drop max_block restriction if FAST_ZERO
> flag is specified, than client may implement write zero as
> 
> write_zero(FAST_ZERO)
> if failed:
>     writing zero is slow anyway, do it in a loop.
> 
> 
> So, in other words, can we do something like this:
> 
>   diff --git a/doc/proto.md b/doc/proto.md
>   index fc7baf6..4b067f5 100644
>   --- a/doc/proto.md
>   +++ b/doc/proto.md
>   @@ -815,9 +815,12 @@ Where a transmission request can have a nonzero *offset* and/or
>    the client MUST ensure that *offset* and *length* are integer
>    multiples of any advertised minimum block size, and SHOULD use integer
>    multiples of any advertised preferred block size where possible.  For
>   -those requests, the client MUST NOT use a *length* larger than any
>   -advertised maximum block size or which, when added to *offset*, would
>   -exceed the export size.  The server SHOULD report an `NBD_EINVAL` error if
>   +those requests, the client MUST NOT use a *length* which, when added to
>   +*offset*, would exceed the export size. Also for NBD_CMD_READ,
>   +NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for
>   +when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length*
>   +larger than any advertised maximum block size.
>   +The server SHOULD report an `NBD_EINVAL` error if
>    the client's request is not aligned to advertised minimum block size
>    boundaries, or is larger than the advertised maximum block size.
>    Notwithstanding any maximum block size advertised, either the server
> 
> ?
> 
> Or it will make existent nbd servers non-compliant? At least seems qemu nbd server never forced these restrictions
> in specified cases.
> 
> 
> And, additional idea: can we in qemu just ignore these restrictions up to first EINVAL returned? So, for example,
> we start with bs->bl.max_pwrite_zeroes = INT_MAX, we send WRITE_ZEROES with length exceeding max_block, if server
> replies with EINVAL we retry current request using limited length (we have to do it in a loop) and set
> bs->bl.max_pwrite_zeroes = max_block.
> 
> 

Eric? Now, I'm investigating the heap again, and remember of this talk:) What do you think?

-- 
Best regards,
Vladimir

Reply to:

Prev by Date: Re: [PATCH] nbd: fix shutdown and recv work deadlock v2
Next by Date: Re: Selecting meta context when calling NBD_CMD_BLOCK_STATUS
Previous by thread: Re: [PATCH] nbd: fix shutdown and recv work deadlock v2
Next by thread: Selecting meta context when calling NBD_CMD_BLOCK_STATUS
Index(es):
- Date
- Thread