Re: [Qemu-devel] [PATCH 1/1] nbd: increase maximum size of the PWRITE_ZERO request
- To: Eric Blake <email@example.com>
- Cc: Alex Bligh <firstname.lastname@example.org>, Edgar Kaziakhmedov <email@example.com>, firstname.lastname@example.org, Kevin Wolf <email@example.com>, nbd list <firstname.lastname@example.org>, email@example.com, "firstname.lastname@example.org" <email@example.com>, firstname.lastname@example.org, email@example.com, Paolo Bonzini <firstname.lastname@example.org>
- Subject: Re: [Qemu-devel] [PATCH 1/1] nbd: increase maximum size of the PWRITE_ZERO request
- From: Alex Bligh <email@example.com>
- Date: Sat, 10 Feb 2018 18:43:39 +0100
- Message-id: <[🔎] B7504AC7-F974-4C7A-8FAA-C4D803C941E4@alex.org.uk>
- In-reply-to: <[🔎] firstname.lastname@example.org>
- References: <email@example.com> <[🔎] firstname.lastname@example.org>
> On 8 Feb 2018, at 16:28, Eric Blake <email@example.com> wrote:
> On 02/08/2018 07:23 AM, Edgar Kaziakhmedov wrote:
>> Upstream NBD protocol implementation supports an efficient zero out
>> mechanism over the wire, along with the ability to check whether a
>> client allows using a hole.
>> Accordingly, since PWRITE_ZERO doesn't involve any payload on the wire,
>> increase a maximum size of the PWRITE_ZERO request up to 1Gb (aligned).
>> Moreover, such change will decrease the number of PWRITE_ZERO NBD commands
>> in comparison with the current 32M limit. The benefits of
>> the larger constraint can be examined in a block mirroring over NBD.
> We've got a potential problem. Unless you have out-of-band communication of the maximum NBD_CMD_WRITE_ZEROES sizing (or if the NBD protocol is enhanced to advertise that as an additional piece of block size information during NBD_OPT_GO), then a client CANNOT assume that the server will accept a request this large. We MIGHT get lucky if all existing servers that accept WRITE_ZEROES requests either act on large requests or reply with EINVAL but do not outright drop the connection (which is different from servers that DO outright drop the connection for an NBD_CMD_WRITE larger than 32M). But I don't know if that's how all servers behave, so sending a too-large WRITE_ZEROES request may have the unintended consequence of killing the connection.
> I'm adding the NBD list; perhaps before accepting this into qemu, I should revive my earlier attempt at codifying an NBD_OPT_GO info advertisement for maximum trim/zero sizing, which would let clients have a guarantee that their choice of sizing won't cause unexpected failures.
A couple of comments:
1. The length field is only 32 bits, so no writes more than 0xffffffff in length are going to work anyway :-)
2. I'm not sure the situation is as bad as you make out Eric. I think you've forgotten the NBD_OPT_INFO work and the conversation around that we had where we determined that servers not supporting NBD_OPT_INFO were already meant to support 'unlimited' size writes. Per the spec:
"If block size constraints have not been advertised or agreed on externally, then a client SHOULD assume a default minimum block size of 1, a preferred block size of 2^12 (4,096), and a maximum block size of the smaller of the export size or 0xffffffff (effectively unlimited)."
I read these to apply to all uses of 'length', but even if one argues it doesn't apply to NBD_CMD_WRITE_ZEROES because it doesn't have a payload, I think the rebuttal is that a server which supports NBD_CMD_WRITE of a given length must also support NBD_CMD_WRITE_ZEROES of that length.
So I think a reasonable logic for Qemu would be to try NBD_CMD_INFO and find the maximum write size, and if that's unsupported use 0xffffffff (capping at export size, or export size minus write offset).