Re: [Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
Apologies; I somehow misread Eric's mail as saying that the implementation wasn't ready yet. Not sure what happened there.
If there is an implementation (and clearly there is a need) then I have no objection to merging this on master.
Reviewed-By: Wouter Verhelst <w@uter.be>
"Richard W.M. Jones" <rjones@redhat.com> wrote on 18 April 2023 16:13:11 CEST:
>On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
>> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
>> > Rather than requiring all servers and clients to have a 32-bit limit
>> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
>> > support for a 64-bit single I/O transaction now.
>> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
>> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
>> >
>> > By standardizing this, all clients must be prepared to support both
>> > types of hole type replies, even though most server implementations of
>> > extended replies are likely to only send one hole type.
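For concreteness, this is how I read the two payload layouts: the existing reply carries a 64-bit offset plus a 32-bit hole size, and the _EXT variant widens the size field to 64 bits. A sketch only; the helper name and the size-based selection policy below are mine, not part of the spec:

```python
import struct

# NBD wire fields are big-endian.
HOLE_FMT     = ">QI"  # NBD_REPLY_TYPE_OFFSET_HOLE:     64-bit offset, 32-bit size
HOLE_EXT_FMT = ">QQ"  # NBD_REPLY_TYPE_OFFSET_HOLE_EXT: 64-bit offset, 64-bit size

def pack_hole_payload(offset, hole_size):
    """Pack a hole-reply payload, using the compact form when the size
    fits in 32 bits (one plausible server policy; a client must be able
    to parse both forms regardless of which the server prefers)."""
    if hole_size <= 0xFFFFFFFF:
        return struct.pack(HOLE_FMT, offset, hole_size)   # 12-byte payload
    return struct.pack(HOLE_EXT_FMT, offset, hole_size)   # 16-byte payload
```

A server that only ever reads 32-bit requests can get away with sending only the compact form; the point of the patch is that clients negotiating extended replies must accept both.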
>>
>> I think it's probably a better idea to push this patch to a separate
>> "extension-*" branch, and link to that from proto.md on master. Those
>> are documented as "we standardized this, but no first implementor exists
>> yet".
>>
>> If someone actually comes up with a reason for 64-bit transactions, we
>> can then see if the spec matches the need and merge it to master.
>>
>> Otherwise this feels too much like a solution in search of a problem to
>> me.
>
>I'd like to push back a bit on this.  Firstly, Eric does have two
>complete implementations.  It is true, however, that they are not
>upstream in either case.
>
>But we also need this because we have observed relatively serious
>issues, particularly around trimming/zeroing and extents.  In fact the
>trimming problem is very easy to demonstrate:
>
> $ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout \
>     --run ' time guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw : run : mkfs xfs /dev/sda '
>
> real 4m17.531s
> user 0m0.032s
> sys 0m0.040s
> total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
> read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
> Request size and alignment breakdown:
> 12 bits: 50.8% (2215 reqs, 8.65 MiB total)
> 12 bit aligned: 100.0% (2215)
> 13 bit aligned: 51.6% (1143)
> 14 bit aligned: 26.9% (595)
> 15 bit aligned: 14.6% (323)
> 16 bit aligned: 8.4% (185)
> 17+ bit-aligned: 4.9% (109)
> 9 bits: 47.4% (2064 reqs, 1.01 MiB total)
> 9 bit aligned: 100.0% (2064)
> 10+ bit-aligned: 0.6% (13)
> other sizes: 1.8% (77 reqs, 14.61 MiB total)
>
> write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
> Request size and alignment breakdown:
> 12 bits: 53.8% (7170 reqs, 28.01 MiB total)
> 12 bit aligned: 100.0% (7170)
> 13 bit aligned: 50.0% (3585)
> 14 bit aligned: 25.0% (1793)
> 15 bit aligned: 12.5% (896)
> 16 bit aligned: 6.2% (448)
> 17+ bit-aligned: 3.1% (224)
> 9 bits: 46.2% (6150 reqs, 3.00 MiB total)
> 9 bit aligned: 100.0% (6150)
> 10 bit aligned: 33.4% (2054)
> 12 bit aligned: 16.7% (1029)
> 13 bit aligned: 8.4% (515)
> 14+ bit-aligned: 4.2% (259)
> other sizes: 0.0% (5 reqs, 31.29 MiB total)
>
> trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 GiB/s total
> Request size and alignment breakdown:
> 30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
> 30 bit aligned: 100.0% (1048576)
> 31 bit aligned: 50.0% (524288)
> 32 bit aligned: 25.0% (262144)
> 33 bit aligned: 12.5% (131072)
> 34 bit aligned: 6.2% (65536)
> 35+ bit-aligned: 3.1% (32768)
>
> zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
> Request size and alignment breakdown:
> 25 bits: 98.4% (63 reqs, 1.97 GiB total)
> 13 bit aligned: 100.0% (63)
> other sizes: 1.6% (1 reqs, 1.99 GiB total)
>
> flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total
>
>Note how trim takes a million operations and accounts for most of the
>run time.  That should be a single operation.  If you stop advertising
>discard support on the disk ("discard:disable") it takes only a
>fraction of the time.
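The million-op figure falls straight out of the arithmetic: the stats show every trim request covered 2^30 bytes (the "30 bits" line), and a 1 PiB disk discarded in 1 GiB requests needs 2^20 of them. A throwaway sketch (the function name is mine, not any real API):

```python
def trim_requests(disk_size, max_request_len):
    """Number of NBD_CMD_TRIM requests needed to discard disk_size
    bytes when each request's length is capped at max_request_len
    (ceiling division)."""
    return -(-disk_size // max_request_len)

PIB = 2 ** 50  # the 1P disk from the nbdkit command above
GIB = 2 ** 30  # the per-request trim size seen in the stats

print(trim_requests(PIB, GIB))  # 1048576 requests with 1 GiB trims
print(trim_requests(PIB, PIB))  # 1 request if 64-bit lengths allow it
```

With extended headers a 64-bit NBD_CMD_TRIM length lets the client collapse that whole pass into a single request.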
>
>The extents issue is harder to demonstrate, but it makes our code
>considerably more complex, because we cannot grab the extent map for
>a whole disk larger than 4GB in a single command.  (The complexity
>won't go away, but with this change querying will be faster, needing
>fewer round trips.)
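To make the round-trip cost concrete, a client restricted to 32-bit lengths has to window its way across the disk when querying block status, roughly like the loop below. A sketch with hypothetical names; a real client must additionally cope with the server returning status for less than the requested length:

```python
def status_windows(disk_size, max_len=0xFFFFFFFF):
    """Yield (offset, length) query windows covering the disk, each
    capped at max_len -- the best a 32-bit length field allows.  With
    64-bit extended headers max_len can be the whole disk size and the
    loop degenerates to a single query."""
    offset = 0
    while offset < disk_size:
        length = min(max_len, disk_size - offset)
        yield (offset, length)
        offset += length
```

At the 32-bit cap, an 8 GiB disk already needs three queries, and the 1P disk from the example above needs 262145; with a 64-bit length, one.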
>
>Nevertheless I'm not opposed to keeping this as an extension until the
>implementations are upstream and bedded in.
>
>Rich.
>
>
>> With that said, for the things I didn't reply to, you can add:
>>
>> Reviewed-By: Wouter Verhelst <w@uter.be>
>>
>> --
>> w@uter.{be,co.za}
>> wouter@{grep.be,fosdem.org,debian.org}
>>
>> I will have a Tin-Actinium-Potassium mixture, thanks.
>>
>> _______________________________________________
>> Libguestfs mailing list
>> Libguestfs@redhat.com
>> https://listman.redhat.com/mailman/listinfo/libguestfs
>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.