Re: [Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
Apologies; I somehow misread Eric's mail as saying that the implementation wasn't ready yet. Not sure what happened there.
If there is an implementation (and clearly there is a need) then I have no objection to merging this on master.
Reviewed-By: Wouter Verhelst <w@uter.be>
"Richard W.M. Jones" <rjones@redhat.com> wrote on 18 April 2023 16:13:11 CEST:
>On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
>> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
>> > Rather than requiring all servers and clients to have a 32-bit limit
>> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
>> > support for a 64-bit single I/O transaction now.
>> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
>> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
>> >
>> > By standardizing this, all clients must be prepared to support both
>> > types of hole type replies, even though most server implementations of
>> > extended replies are likely to only send one hole type.
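For concreteness, this is how I read the two payload layouts: the existing reply carries a 64-bit offset plus a 32-bit hole size, and the _EXT variant widens the size field to 64 bits. A sketch only; the helper name and the size-based selection policy below are mine, not part of the spec:

```python
import struct

# NBD wire fields are big-endian.
HOLE_FMT     = ">QI"  # NBD_REPLY_TYPE_OFFSET_HOLE:     64-bit offset, 32-bit size
HOLE_EXT_FMT = ">QQ"  # NBD_REPLY_TYPE_OFFSET_HOLE_EXT: 64-bit offset, 64-bit size

def pack_hole_payload(offset, hole_size):
    """Pack a hole-reply payload, using the compact form when the size
    fits in 32 bits (one plausible server policy; a client must be able
    to parse both forms regardless of which the server prefers)."""
    if hole_size <= 0xFFFFFFFF:
        return struct.pack(HOLE_FMT, offset, hole_size)   # 12-byte payload
    return struct.pack(HOLE_EXT_FMT, offset, hole_size)   # 16-byte payload
```

A server that only ever reads 32-bit requests can get away with sending only the compact form; the point of the patch is that clients negotiating extended replies must accept both.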
>>
>> I think it's probably a better idea to push this patch to a separate
>> "extension-*" branch, and link to that from proto.md on master. Those
>> are documented as "we standardized this, but no first implementor exists
>> yet".
>>
>> If someone actually comes up with a reason for 64-bit transactions, we
>> can then see if the spec matches the need and merge it to master.
>>
>> Otherwise this feels too much like a solution in search of a problem to
>> me.
>
>I'd like to push back a bit on this.  Firstly, Eric does have two
>complete implementations.  It is true, however, that they are not
>upstream in either case.
>
>But we also need this because we have observed relatively serious
>issues, particularly around trimming/zeroing and extents.  In fact the
>trimming problem is very easy to demonstrate:
>
> $ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout \
>     --run ' time guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw : run : mkfs xfs /dev/sda '
>
> real 4m17.531s
> user 0m0.032s
> sys 0m0.040s
> total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
> read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
> Request size and alignment breakdown:
> 12 bits: 50.8% (2215 reqs, 8.65 MiB total)
> 12 bit aligned: 100.0% (2215)
> 13 bit aligned: 51.6% (1143)
> 14 bit aligned: 26.9% (595)
> 15 bit aligned: 14.6% (323)
> 16 bit aligned: 8.4% (185)
> 17+ bit-aligned: 4.9% (109)
> 9 bits: 47.4% (2064 reqs, 1.01 MiB total)
> 9 bit aligned: 100.0% (2064)
> 10+ bit-aligned: 0.6% (13)
> other sizes: 1.8% (77 reqs, 14.61 MiB total)
>
> write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
> Request size and alignment breakdown:
> 12 bits: 53.8% (7170 reqs, 28.01 MiB total)
> 12 bit aligned: 100.0% (7170)
> 13 bit aligned: 50.0% (3585)
> 14 bit aligned: 25.0% (1793)
> 15 bit aligned: 12.5% (896)
> 16 bit aligned: 6.2% (448)
> 17+ bit-aligned: 3.1% (224)
> 9 bits: 46.2% (6150 reqs, 3.00 MiB total)
> 9 bit aligned: 100.0% (6150)
> 10 bit aligned: 33.4% (2054)
> 12 bit aligned: 16.7% (1029)
> 13 bit aligned: 8.4% (515)
> 14+ bit-aligned: 4.2% (259)
> other sizes: 0.0% (5 reqs, 31.29 MiB total)
>
> trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 GiB/s total
> Request size and alignment breakdown:
> 30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
> 30 bit aligned: 100.0% (1048576)
> 31 bit aligned: 50.0% (524288)
> 32 bit aligned: 25.0% (262144)
> 33 bit aligned: 12.5% (131072)
> 34 bit aligned: 6.2% (65536)
> 35+ bit-aligned: 3.1% (32768)
>
> zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
> Request size and alignment breakdown:
> 25 bits: 98.4% (63 reqs, 1.97 GiB total)
> 13 bit aligned: 100.0% (63)
> other sizes: 1.6% (1 reqs, 1.99 GiB total)
>
> flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total
>
>Note how trim takes a million operations and accounts for most of the
>run time.  That should be a single operation.  If you stop advertising
>discard support on the disk ("discard:disable") it takes only a
>fraction of the time.
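The million-op figure falls straight out of the arithmetic: the stats show every trim request covered 2^30 bytes (the "30 bits" line), and a 1 PiB disk discarded in 1 GiB requests needs 2^20 of them. A throwaway sketch (the function name is mine, not any real API):

```python
def trim_requests(disk_size, max_request_len):
    """Number of NBD_CMD_TRIM requests needed to discard disk_size
    bytes when each request's length is capped at max_request_len
    (ceiling division)."""
    return -(-disk_size // max_request_len)

PIB = 2 ** 50  # the 1P disk from the nbdkit command above
GIB = 2 ** 30  # the per-request trim size seen in the stats

print(trim_requests(PIB, GIB))  # 1048576 requests with 1 GiB trims
print(trim_requests(PIB, PIB))  # 1 request if 64-bit lengths allow it
```

With extended headers a 64-bit NBD_CMD_TRIM length lets the client collapse that whole pass into a single request.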
>
>The extents issue is harder to demonstrate, but it makes our code
>considerably more complex, because we cannot grab the extent map for
>a whole disk larger than 4GB in a single command.  (The complexity
>won't go away, but with this change querying will be faster, needing
>fewer round trips.)
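To make the round-trip cost concrete, a client restricted to 32-bit lengths has to window its way across the disk when querying block status, roughly like the loop below. A sketch with hypothetical names; a real client must additionally cope with the server returning status for less than the requested length:

```python
def status_windows(disk_size, max_len=0xFFFFFFFF):
    """Yield (offset, length) query windows covering the disk, each
    capped at max_len -- the best a 32-bit length field allows.  With
    64-bit extended headers max_len can be the whole disk size and the
    loop degenerates to a single query."""
    offset = 0
    while offset < disk_size:
        length = min(max_len, disk_size - offset)
        yield (offset, length)
        offset += length
```

At the 32-bit cap, an 8 GiB disk already needs three queries, and the 1P disk from the example above needs 262145; with a 64-bit length, one.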
>
>Nevertheless I'm not opposed to keeping this as an extension until the
>implementations are upstream and bedded in.
>
>Rich.
>
>
>> With that said, for the things I didn't reply to, you can add:
>>
>> Reviewed-By: Wouter Verhelst <w@uter.be>
>>
>> --
>> w@uter.{be,co.za}
>> wouter@{grep.be,fosdem.org,debian.org}
>>
>> I will have a Tin-Actinium-Potassium mixture, thanks.
>>
>> _______________________________________________
>> Libguestfs mailing list
>> Libguestfs@redhat.com
>> https://listman.redhat.com/mailman/listinfo/libguestfs
>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.