Re: [Libguestfs] nbdcpy: from scratch nbdcopy using io_uring
- To: Abhay Raj Singh <rathod.sahaab@gmail.com>
- Cc: "Richard W.M. Jones" <rjones@redhat.com>, libguestfs@redhat.com, nbd@other.debian.org
- Subject: Re: [Libguestfs] nbdcpy: from scratch nbdcopy using io_uring
- From: Eric Blake <eblake@redhat.com>
- Date: Mon, 23 Aug 2021 12:20:26 -0500
- Message-id: <20210823172026.dxunchzbofjg27wk@redhat.com>
- In-reply-to: <CAAXt=1AJdzeMCdnBo4fqPQjetKJN-73CS7_Ln60Axr3EAfn3+w@mail.gmail.com>
- References: <20210624182714.GG30099@redhat.com> <CAAXt=1C9tBwc6wkrtFz=Fdgp0CccMSvj-BGVbncXJgLXWRwbjA@mail.gmail.com> <20210625085904.GI26415@redhat.com> <CAAXt=1AU0QH=PZ+Bx=_1=GZ7rTDMfr0e4M53rHnyw=mV2o_rQg@mail.gmail.com> <20210710075756.GX26415@redhat.com> <CAAXt=1CK3LHrsmC_nK8yqqufYBZiGgcjVXOakDjVULH+QhLa4A@mail.gmail.com> <20210731183900.GU26415@redhat.com> <CAAXt=1AHofuOUHLjpJ8BkfkXp2qxZWtwP76ireaXvg_gu-By_Q@mail.gmail.com> <20210807180805.GO26415@redhat.com> <CAAXt=1AJdzeMCdnBo4fqPQjetKJN-73CS7_Ln60Axr3EAfn3+w@mail.gmail.com>
[adding the NBD list into cc]
On Mon, Aug 23, 2021 at 09:26:34PM +0530, Abhay Raj Singh wrote:
> I had an idea for optimizing my current approach; it's good in some
> ways, but it could be faster with some breaking changes to the protocol.
>
> Currently, we read (from the socket connected to the source) one
> request at a time. The simple flow looks like `read_header(io_uring)
> ---- success ---> recv(data) --- success ---> send(data) & queue
> another read header`, but it's not as efficient as it could be; at
> best it's a hack.
>
> Another approach I am thinking about is a large buffer into which we
> read all of the socket's available data, then process packets out of
> that buffer. This minimizes the number of read requests to the
> kernel, as we do one read for multiple NBD packets.
>
> Further optimization requires changing the NBD protocol a bit
> Current protocol
> 1. Memory representation of a response (20-byte header + data)
> 2. Memory representation of a request (28-byte header + data)
>
> HHHHH_DDDDDDDDD...
> HHHHHHH_DDDDDDDDD...
>
> H and D each represent 4 bytes; the _ is a zero-width separator between header and data
You are correct that requests are currently 28 bytes header plus any
payload (where payload is currently only in NBD_CMD_WRITE). But
responses are two different lengths: simple responses are 16 bytes +
payload (payload only for NBD_CMD_READ, and only if structured replies
not negotiated), while structured responses are 20 bytes + payload
(but while NBD_CMD_READ and NBD_CMD_BLOCK_STATUS require structured
replies, a compliant server can still send simple replies to other
commands). So it's even trickier than you represent here: always
reading a 20-byte reply header is not going to do the right thing.
>
> With the large buffer approach, we read data into a large buffer, then
> copy the NBD packet's data to a new buffer, strap a new header to it
> and send it.
> This copying is what we wanted to avoid in the first place.
>
> If the response header were 28 bytes, or the first 8 bytes of data
> could be clobbered, we could have just overwritten the header part
> and sent the data directly from the large buffer, thereby avoiding
> the copy.
>
> What are your thoughts?
There are already discussions about what it would take to extend the
NBD protocol to support 64-bit requests (not that we'd want to go
beyond current server restrictions of 32M or 64M maximum NBD_CMD_READ
and NBD_CMD_WRITE, but more so that we can permit quick image zeroing
via a 64-bit NBD_CMD_WRITE_ZEROES).  Your observation that equally
sized request and response headers would allow more efficient handling
is worth considering when making such a protocol extension.  Of
necessity, it would have to be via an NBD_OPT_* option requested by
the client during negotiation and responded to affirmatively by the
server, before both sides then use the new-size packets in both
directions after NBD_OPT_GO (and a client would still have to be
prepared to fall back to the unequal-sized headers if the server
doesn't understand the option).
For that matter, is there a benefit to having cache-line-optimized
sizing, where all headers are exactly 32 bytes (both requests and
responses, and both simple and structured replies)? I'm thinking
maybe NBD_OPT_FIXED_SIZE_HEADER might be a sane name for such an
option.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org