Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

To: Eric Blake <eblake@redhat.com>, "nbd@other.debian.org" <nbd@other.debian.org>, QEMU <qemu-devel@nongnu.org>, "qemu-block@nongnu.org" <qemu-block@nongnu.org>, "libguestfs@redhat.com" <libguestfs@redhat.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Alberto Garcia <berto@igalia.com>
Subject: Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
From: Max Reitz <mreitz@redhat.com>
Date: Wed, 19 Feb 2020 12:10:10 +0100
Message-id: <[🔎] 4cd75cc7-eff2-70f0-9d07-2dbb1165afeb@redhat.com>
In-reply-to: <[🔎] c47d277e-35f8-9837-1f1d-eab4bb6d5840@redhat.com>
References: <[🔎] a4394fde-f459-dcb5-1698-013e1e24c388@redhat.com> <[🔎] 1b3741aa-7841-9062-ecca-73c38e599e05@redhat.com> <[🔎] c47d277e-35f8-9837-1f1d-eab4bb6d5840@redhat.com>

On 18.02.20 21:55, Eric Blake wrote:
> On 2/17/20 9:13 AM, Max Reitz wrote:
>> Hi,
>>
>> It’s my understanding that without some is_zero infrastructure for QEMU,
>> it’s impossible to implement this flag in qemu’s NBD server.
> 
> You're right that we may need some more infrastructure before being able
> to decide when to report this bit in all cases.  But for raw files, that
> infrastructure already exists: does block_status at offset 0 and the
> entire image as length return status that the entire file is a hole.

Hm, except that only works if the file is just completely unallocated.
Calling that existing infrastructure is a bit of a stretch, I think.

Or are you saying that bdrv_co_block_status(..., 0, bdrv_getlength(),
...) is our existing infrastructure?  Actually, why not.  Can we make
block drivers catch that special case?  (Or might the generic block code
somehow truncate such requests?)

> And
> for qcow2 files, it would not be that hard to teach a similar
> block_status request to report the entire image as a hole based on my
> proposed qcow2 autoclear bit tracking that the image still reads as zero.
> 
>>
>> At the same time, I still haven’t understood what we need the flag for.
>>
>> As far as I understood in our discussion on your qemu series, there is
>> no case where anyone would need to know whether an image is zero.  All
>> > practical cases involve someone having to ensure that some image is
>> zero.  Knowing whether an image is zero can help with that, but that can
>> be an implementation detail.
>>
>> For qcow2, the idea would be that there is some flag that remains true
>> as long as the image is guaranteed to be zero.  Then we’d have some
>> bdrv_make_zero function, and qcow2’s implementation would use this
>> information to gauge whether there’s something to do as all.
>>
>> For NBD, we cannot use this idea directly because to implement such a
>> flag (as you’re describing in this mail), we’d need separate is_zero
>> infrastructure, and that kind of makes the point of “drivers’
>> bdrv_make_zero() implementations do the right thing by themselves” moot.
> 
> We don't necessarily need a separate is_zero infrastructure if we can
> instead teach the existing block_status infrastructure to report that
> the entire image reads as zero.  You're right that clients that need to
> force an entire image to be zero won't need to directly call
> block_status (they can just call bdrv_make_zero, and let that worry
> about whether a block status call makes sense among its list of steps to
> try).  But since block_status can report all-zero status for some cases,
> it's not hard to use that for feeding the NBD bit.

OK.  I’m not 100% sure there’s nothing that would bite us in the butt
here, but I seem to remember you made all block_status things 64-bit, so
I suppose you know. :)

> However, there's a difference between qemu's block status (which is
> already typed correctly to return a 64-bit answer, even if it may need a
> few tweaks for clients that currently don't expect it to request more
> than 32 bits) and NBD's block status (which can only report 32 bits
> barring a new extension to the protocol), and where a single all-zero
> bit at NBD_OPT_GO is just as easy of an extension as a way to report a
> 64-bit all-zero response to NBD_CMD_BLOCK_STATUS.

Agreed.

>> OTOH, we wouldn’t need such a flag for the implementation, because we
>> could just send a 64-bit discard/make_zero over the whole block device
>> length to the NBD server, and then the server internally does the right
>> thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
>> length fields, but there were plans for adding support for 64 bit
>> versions anyway.  From my naïve outsider perspective, doing that doesn’t
>> seem a more complicated protocol addition than adding some way to tell
>> whether an NBD export is zero.
> 
> Adding 64-bit commands to NBD is more invasive than adding a single
> startup status bit.

True.  But if we/you want 64-bit commands anyway, then it doesn’t really
matter what’s more invasive.

> Both ideas can be done - doing one does not
> preclude the other.

Absolutely.  It’s just that if you do one anyway and it supersedes the
other, than we don’t have to do both.  Hence me wondering whether one
does supersede the other.

> But at the same time, not all servers will
> implement both ideas - if one is easy to implement while the other is
> hard, it is not unlikely that qemu will still encounter NBD servers that
> advertise startup state but not support 64-bit make_zero (even if qemu
> as NBD server starts supporting 64-bit make zero) or even 64-bit block
> status results.

Hm.  You know better than me whether that’s a good argument, because it
mostly depends on how many NBD server implementations there are;
specifically whether there are any that are decidedly not feature-complete.

> Another thing to think about here is timing.  With the proposed NBD
> addition, it is the server telling the client that "the image you are
> connecting to started zero", prior to the point that the client even has
> a chance to request "can you make the image all zero in a quick manner
> (and if not, I'll fall back to writing zeroes as I go)".  And even if
> NBD gains a 64-bit block status and/or make zero command, it is still
> less network traffic for the server to advertise up-front that the image
> is all zero than it is for the client to have to issue command requests
> of the server (network traffic is not always the bottleneck, but it can
> be a consideration).

I suppose one 64-bit discard/write_zeroes to the whole image wouldn’t be
too bad, regardless of the network speed.

>> So I’m still wondering whether there are actually cases where we need to
>> tell whether some image or NBD export is zero that do not involve making
>> it zero if it isn’t.
> 
> Just because we don't think that qemu-img has such a case does not mean
> that other NBD clients will not be able to come up with some use for
> knowing if an image starts all zero.

Sure, but implementing a feature on the basis of “Somebody may come up
with a use for it” sounds fishy to me.

OTOH, you completely convinced me with the fact that we can start by
letting qemu’s NBD server just invoke a block-status call over the whole
image, and then (potentially later) letting various block drivers
recognize that case.  But I suppose that means we no longer need a
dedicated has_zero_open() function, right?

>> (I keep asking because it seems to me that if all we ever really want to
>> do is to ensure that some images/exports are zero, we should implement
>> that.)
> 
> The problem is WHERE do you implement it.  Is it more efficient to
> implement make_zero in the NBD server (the client merely requests to
> make zero, but lets the server do all the work) or in the NBD client
> (the client learns whether the server is already zero, and not hearing
> yes, the client proceeds to do all the work to write zeroes).  From the
> qemu perspective, qemu-img convert needs the image to be zero, and
> bdrv_make_zero will report back either "yes I quickly made it zero,
> possibly by doing nothing" or "no, making it zero now is no more
> efficient than you just writing zeroes as you go".  But although the
> code in qemu-img is the same whether bdrv_make_zero is able to request
> the work be done in the server or whether the work has to be done in the
> client, the code in the block layer that implements bdrv_make_zero may
> itself care about knowing whether the NBD server started all zero.

If we have both 64-bit write_zeroes and a zero flag in the NBD protocol,
then I don’t think there’s much difference in terms of efficiency.

However, if we had to decide between which of the features is more
important for efficiency, then the difference that appears to me is that:
- With just a flag but no 64-bit write_zeroes, zeroing a non-zero image
will be inefficient.
- With just a 64-bit write_zeroes but no flag, the client has to issue a
NOP whole-image write_zeroes call for images that are zero already, but
I don’t really see that as an issue.

So *if* we had to decide, it appears to me that 64-bit write_zeroes
would be more important.

Max

Attachment: signature.asc
Description: OpenPGP digital signature

Reply to:

References:
- Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  - From: Eric Blake <eblake@redhat.com>
- Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  - From: Max Reitz <mreitz@redhat.com>
- Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  - From: Eric Blake <eblake@redhat.com>

Prev by Date: [PATCH 2/2] nbd: requeue command if the soecket is changed
Next by Date: Re: [PATCH 1/2] nbd: enable replace socket if only one connection is configured
Previous by thread: Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
Next by thread: [PATCH AUTOSEL 5.5 121/542] nbd: add a flush_workqueue in nbd_start_device
Index(es):
- Date
- Thread