Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers
- To: Eric Blake <eblake@redhat.com>
- Cc: Wouter Verhelst <w@uter.be>, libguestfs@redhat.com, qemu-block@nongnu.org, nbd@other.debian.org
- Subject: Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers
- From: Laszlo Ersek <lersek@redhat.com>
- Date: Thu, 1 Jun 2023 05:33:20 +0200
- Message-id: <f640bebb-28fc-24ae-5fc6-8d474f719b34@redhat.com>
- In-reply-to: <gdrdshhhjqzmhwdwvum6vahnex4d4ei5rgvxxucbwwrwidvmuy@zg2ceuzzqmah>
- References: <20230525130108.757242-1-eblake@redhat.com> <20230525130108.757242-4-eblake@redhat.com> <2b98a2ca-62d5-c87b-2a37-1a49af89b4b4@redhat.com> <ZHYOgQAL3ELxr1S9@pc220518.home.grep.be> <7f186cd0-b42e-7a20-2946-39ffecd23383@redhat.com> <5w3fbetyz62qb7rdiqu5xxpfbdhezlmkk24nvuxe6p4sem2j4w@c5lxwuc5yukh> <3dba1488-9b41-bd30-dd9d-f8b0402769a1@redhat.com> <gdrdshhhjqzmhwdwvum6vahnex4d4ei5rgvxxucbwwrwidvmuy@zg2ceuzzqmah>
On 5/31/23 18:04, Eric Blake wrote:
> On Wed, May 31, 2023 at 01:29:30PM +0200, Laszlo Ersek wrote:
>>>> Putting aside alignment even, I don't understand why reducing "count" to
>>>> uint16_t would be reasonable. With the current 32-bit-only block
>>>> descriptor, we already need to write loops in libnbd clients, because we
>>>> can't cover the entire remote image in one API call [*]. If I understood
>>>> Eric right earlier, the 64-bit extensions were supposed to remedy that
>>>> -- but as it stands, clients will still need loops ("chunking") around
>>>> block status fetching; is that right?
>>>
> While the larger extents reduce the need for looping, they do not
>>> entirely eliminate it. For example, just because the server can now
>>> tell you that an image is entirely data in just one reply does not
>>> mean that it will actually do so - qemu in particular limits block
>>> status of a qcow2 file to reporting just one cluster at a time for
>>> consistency reasons, where even if you use the maximum size of 2M
>>> clusters, you can never get more than (2M/16)*2M = 256G status
>>> reported in a single request.
>>
>> I don't understand the calculation. I can imagine the following
>> interpretation:
>>
>> - QEMU never sends more than 128K block descriptors, and each descriptor
>> covers one 2MB sized cluster --> 256 GB of the disk covered in one go.
>>
>> But I don't understand where the (2M/16) division comes from, even
>> though the quotient is 128K.
>
> Ah, I need to provide more backstory on the qcow2 format. A qcow2
> image has a fixed cluster size, chosen between 512 and 2M
> bytes. A smaller cluster size has less wasted space for small images,
> but uses more overhead. Each cluster has to be mapped in an L1 map,
> where pages of the map are also a cluster in length, with 16 bytes per
> map entry. So if you pick a cluster size of 512, you get 512/16 or 32
> entries per L1 page; if you pick a cluster size of 2M, you get 2M/16
> or 128k entries per L1 page. When reporting block status, qemu reads
> at most one L1 page to then say how each cluster referenced from that
> page is mapped.
>
> https://gitlab.com/qemu-project/qemu/-/blob/master/docs/interop/qcow2.txt#L491
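
Just to restate the arithmetic for myself, here is a quick sketch of my own
(made-up names, not anything from the qcow2 code; assuming the 16-byte map
entries you describe): one map page is one cluster long, so it holds
cluster_size/16 entries, and each entry covers one cluster of guest data.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* The smallest (512) and largest (2M) qcow2 cluster sizes. */
  static const uint64_t cluster_sizes[] = { 512, 2 * 1024 * 1024 };

  for (size_t i = 0; i < 2; i++) {
    uint64_t cs = cluster_sizes[i];
    uint64_t entries_per_page = cs / 16;             /* 16-byte map entries */
    uint64_t bytes_per_page = entries_per_page * cs; /* guest data covered */

    printf ("cluster size %7" PRIu64 ": %6" PRIu64 " entries/page, "
            "%" PRIu64 " bytes of guest data per page\n",
            cs, entries_per_page, bytes_per_page);
  }
  return 0;
}

For the 2M cluster case this prints 131072 entries per page and 274877906944
bytes covered per page, i.e. the 256G figure above.
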
>
>>
>> I can connect the constant "128K", and
>> <https://github.com/NetworkBlockDevice/nbd/commit/926a51df>, to your
>> paragraph [*] above, but not the division.
>
> In this case, the qemu limit on reporting block status of at most one
> L1 map page at a time happens to have no relationship to the NBD
> constant of limiting block status reports to no more than 1M extents
> (8M bytes) in a single reply, nor the fact that qemu picked a cap of
> 1M bytes (128k extents) on its NBD reply regardless of whether the
> underlying image is qcow2 or some other format.
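
And to double-check those two caps with a sketch of my own (the macro names
below are made up for illustration, not constants from the spec or from qemu;
assuming 8 bytes on the wire per 32-bit extent descriptor, i.e. 4-byte length
plus 4-byte flags):

#define EXTENT32_WIRE_SIZE   8u          /* 4-byte length + 4-byte flags */
#define SPEC_MAX_EXTENTS     (1u << 20)  /* 1M extents per reply (spec cap) */
#define QEMU_REPLY_BYTE_CAP  (1u << 20)  /* 1M bytes per reply (qemu's cap) */

_Static_assert (SPEC_MAX_EXTENTS * EXTENT32_WIRE_SIZE == (8u << 20),
                "the spec cap corresponds to 8M bytes on the wire");
_Static_assert (QEMU_REPLY_BYTE_CAP / EXTENT32_WIRE_SIZE == (128u << 10),
                "qemu's 1M-byte cap corresponds to 128k extents");

So neither cap has anything to do with the 256G qcow2 figure, as you say.
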
Thanks!
[...]
Laszlo