Re: [Nbd] [RFC] Proposal: NBD_CMD_READ2



On Thu, Aug 20, 2015 at 03:44:41PM +0200, Wouter Verhelst wrote:
> On Thu, Aug 20, 2015 at 03:17:54PM +0200, Markus Pargmann wrote:
> > Hi,
> > 
> > On Thu, Aug 20, 2015 at 01:45:58PM +0100, Alex Bligh wrote:
> > > Wouter,
> > > 
> > > On 20 Aug 2015, at 12:05, Wouter Verhelst <w@...112...> wrote:
> > > > 
> > > > One of the problems with the NBD protocol is that the read command sends
> > > > out the reply header before the data. As such, if handling of a read
> > > > request encounters a problem after the header has been sent out, there
> > > > is no way currently to communicate this fact to the client.
> > > > 
> > > > This is a problem, because it forces the server to choose between a
> > > > number of equally unattractive options:
> > > > - The server could ignore read errors. This would mean the client would
> > > >  get incorrect data.
> > > > - The server could drop the connection on receiving a read error. This
> > > >  would mean the client would see a lost connection without really
> > > >  knowing what's happening.
> > > > - The server could be required to read all data into memory before
> > > >  sending out the reply header. This is problematic for busy servers
> > > >  and/or large read requests.
> > > > 
> > > > I would therefore want to add another message to the protocol,
> > > > NBD_CMD_READ2. The semantics of this message would be similar to
> > > > NBD_CMD_READ, except that an nbd_reply structure is sent both before and
> > > > after the read data.
> > > > 
> > > > If the first reply has a nonzero error field, then no data is to be
> > > > expected by the client (this is different from the current semantics of
> > > > NBD_CMD_READ as described in the protocol document).
> > > > 
> > > > If the second reply has a nonzero error field, the client should
> > > > consider the received data to be (possibly partially) invalid.
> > > > 
> > > > The server should send "invalid request" error replies in the first
> > > > reply header; it should send "medium error" replies in the second.
> > > > 
> > > > Thoughts?
> > > 
> > > This is something I've been banging on about occasionally for a while.
> > > 
> > > I think it's a good idea.
> > > 
> > > However, I would suggest an amendment.
> > > 
> > > If you do a large read, and an error occurs many megabytes into the read
> > > but many megabytes before the end, you still have much the same problem.
> > > Such reads *do* happen, e.g. when using qemu-img convert with an nbd device.
> > > 
> > > Perhaps better would be to specify that the read would be produced in blocks
> > > of a size defined by the server, and each block would be followed by a header
> > > that could contain the error. This would add a few extra bytes of overhead
> > > that the client could discard, and allow the server to break the reply up
> > > conveniently. Each block header would specify the size of the next block OR
> > > an error in respect of the previous block.
> > 
> > I like the idea of blocks. However I think it would be better to have
> > some kind of negotiation for the maximum block size for reads and
> > writes.
> > 
> > As the server can choose the size of a block itself I think it wouldn't
> > be a problem to have a header for each data block which has the
> > error state and so on. Then we wouldn't have this strange semantic of
> > reporting the error state of the data we already transmitted.
> 
> That would make doing things like using sendfile() to send out the
> actual data impossible, as when you use sendfile(), you don't know that
> an error occurred until you've put some of the data on the wire already.
> 
> (unless I'm missing things, which of course is possible)

Ah, I see.

I still like the idea of fragmented replies. They may be useful for NBD
on top of UDP at some point (no real plans right now, but who knows...
it would be perfect for bootloader NBD implementations ;) ).
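
To make the fragmented-reply idea concrete, here is a minimal sketch in
Python. Everything about the wire layout is an assumption made up for
illustration, not part of any NBD specification: a hypothetical 16-byte
chunk header (error, offset, length), a zero-length header as terminator,
and, following Alex's suggestion, an error field that refers to the chunk
sent just before the header. That last choice is what would keep
sendfile() usable on the server side.

```python
import struct

# Hypothetical chunk header, big-endian: error (u32), offset (u64), length (u32).
CHUNK = struct.Struct(">IQI")


def encode_chunks(offset, blocks):
    """Serialize (data, error) pairs; an error is reported after its data."""
    out = bytearray()
    for data, error in blocks:
        out += CHUNK.pack(0, offset, len(data)) + data
        offset += len(data)
        if error:
            # The data is already on the wire; flag it in the closing header.
            out += CHUNK.pack(error, offset, 0)
            return bytes(out)
    out += CHUNK.pack(0, offset, 0)  # zero length ends the reply
    return bytes(out)


def decode_chunks(buf):
    """Return (error, data); the chunk flagged as bad is dropped."""
    pos, good = 0, []
    while True:
        error, _offset, length = CHUNK.unpack_from(buf, pos)
        pos += CHUNK.size
        if error:
            return error, b"".join(good[:-1])
        if length == 0:
            return 0, b"".join(good)
        good.append(buf[pos:pos + length])
        pos += length
```

With this layout a client keeps every chunk that arrived before the
failing one, instead of having to throw away the whole read.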

> 
> > Also when using blocks with block headers an offset would be good
> > together with the length of the data.
> > 
> > However this whole block thing would probably not be backwards
> > compatible.
> 
> None of this would be, hence the idea of adding a READ2 command, and
> retaining READ for backwards compatibility.

Yes, just for naming I would prefer something like READ_END instead of
READ2. I am still thinking about this and other possibilities.
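
For reference, a rough client-side sketch of the READ2/READ_END semantics
as proposed above: one reply header, the data, then a second reply header.
It reuses the existing 16-byte reply layout (magic 0x67446698, error,
8-byte handle). The function names are hypothetical, and discarding the
data on a trailing error is just one possible client policy.

```python
import io
import struct

NBD_REPLY_MAGIC = 0x67446698        # magic of the existing reply header
REPLY = struct.Struct(">II8s")      # magic, error, handle (big-endian)


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            raise EOFError("connection closed mid-reply")
        buf += part
    return bytes(buf)


def read_reply(stream, handle):
    """Read one reply header and return its error field."""
    magic, error, got = REPLY.unpack(read_exact(stream, REPLY.size))
    if magic != NBD_REPLY_MAGIC or got != handle:
        raise ValueError("unexpected reply header")
    return error


def recv_read2(stream, handle, length):
    """Return (error, data) for one hypothetical READ2 response."""
    error = read_reply(stream, handle)
    if error:
        return error, b""           # first reply failed: no data follows
    data = read_exact(stream, length)
    error = read_reply(stream, handle)
    if error:
        return error, b""           # trailing error: treat data as invalid
    return 0, data


# Demo with an in-memory stream standing in for the socket.
handle = b"\0" * 8
ok = REPLY.pack(NBD_REPLY_MAGIC, 0, handle)
eio = REPLY.pack(NBD_REPLY_MAGIC, 5, handle)    # 5 = EIO
demo = recv_read2(io.BytesIO(ok + b"abcd" + eio), handle, 4)
```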

Best regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
