
Re: [Nbd] [RFC] Proposal: NBD_CMD_READ2



Hi,

On Thu, Aug 20, 2015 at 01:45:58PM +0100, Alex Bligh wrote:
> Wouter,
> 
> On 20 Aug 2015, at 12:05, Wouter Verhelst <w@...112...> wrote:
> > 
> > One of the problems with the NBD protocol is that the read command sends
> > out the reply header before the data. As such, if handling of a read
> > request encounters a problem after the header has been sent out, there
> > is no way currently to communicate this fact to the client.
> > 
> > This is a problem, because it forces the server to choose between a
> > number of equally unattractive options:
> > - The server could ignore read errors. This would mean the client would
> >  get incorrect data.
> > - The server could drop the connection on receiving a read error. This
> >  would mean the client would see a lost connection without really
> >  knowing what's happening.
> > - The server could be required to read all data into memory before
> >  sending out the reply header. This is problematic for busy servers
> >  and/or large read requests.
> > 
> > I would therefore want to add another message to the protocol,
> > NBD_CMD_READ2. The semantics of this message would be similar to
> > NBD_CMD_READ, except that an nbd_reply structure is sent both before and
> > after the read data.
> > 
> > If the first reply has a nonzero error message, then no data is to be
> > expected by the client (this is different from the current semantics of
> > NBD_CMD_READ as described in the protocol document).
> > 
> > If the second reply has a nonzero error message, the client should
> > consider the received data to be (possibly partially) invalid.
> > 
> > The server should send "invalid request" error replies in the first
> > reply header; it should send "medium error" replies in the second.
> > 
> > Thoughts?
> 
> This is something I've been banging on about occasionally for a while.
> 
> I think it's a good idea.
> 
> However, I would suggest an amendment.
> 
> If you do a large read and an error occurs many megabytes into the read,
> but many megabytes before the end, you still have much the same problem.
> Such reads *do* happen, e.g. when using qemu-img convert with an nbd device.
> 
> Perhaps better would be to specify that the read would be produced in blocks
> of a size defined by the server, and each block would be followed by a header
> that could contain the error. This would add a few extra bytes of overhead
> that the client could discard, and allow the server to break the reply up
> conveniently. Each block header would specify the size of the next block OR
> an error in respect of the previous block.

I like the idea of blocks. However, I think it would be better to
negotiate the maximum block size for reads and writes between client
and server.

Since the server can choose the block size itself, I don't think it
would be a problem to put a header in front of each data block, carrying
the error state and so on for that block. Then we wouldn't have the
strange semantics of reporting the error state of data we have already
transmitted.

Also, when using blocks with block headers, each header should carry an
offset together with the length of the data.
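
To make this a bit more concrete, here is a rough sketch of what such a
per-block header could look like. Everything in it is an assumption for
illustration only: the names (blk_hdr, blk_stream_scan), the field
widths and the 16-byte layout are made up and are not taken from the
protocol document. The only thing borrowed from NBD is network byte
order. The scan function shows the client side: because the error state
comes before the data, the client can stop at the first bad block
without ever having received invalid data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block header: not part of the NBD protocol. */
struct blk_hdr {
    uint32_t error;   /* 0 means the data following this header is valid */
    uint64_t offset;  /* offset of this block on the export */
    uint32_t length;  /* data bytes following the header (0 on error) */
};

#define BLK_HDR_WIRE_SIZE 16  /* 4 + 8 + 4 bytes, assumed layout */

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

static uint32_t get_be32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | p[i];
    return v;
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Serialize a header into BLK_HDR_WIRE_SIZE bytes, big-endian. */
void blk_hdr_encode(const struct blk_hdr *h, uint8_t *out)
{
    assert(h != NULL && out != NULL);
    put_be32(out, h->error);
    put_be64(out + 4, h->offset);
    put_be32(out + 12, h->length);
}

/* Parse BLK_HDR_WIRE_SIZE bytes back into a header. */
void blk_hdr_decode(const uint8_t *in, struct blk_hdr *h)
{
    assert(in != NULL && h != NULL);
    h->error = get_be32(in);
    h->offset = get_be64(in + 4);
    h->length = get_be32(in + 12);
}

/*
 * Walk a buffer holding header+data blocks back to back.
 * Returns the number of valid data bytes seen before the first error
 * header (or the end of the buffer) and stores that error in *err.
 * Returns -1 if the buffer ends in the middle of a header or block.
 */
int64_t blk_stream_scan(const uint8_t *buf, size_t len, uint32_t *err)
{
    size_t pos = 0;
    int64_t valid = 0;

    *err = 0;
    while (pos < len) {
        struct blk_hdr h;

        if (len - pos < BLK_HDR_WIRE_SIZE)
            return -1;
        blk_hdr_decode(buf + pos, &h);
        pos += BLK_HDR_WIRE_SIZE;
        if (h.error != 0) {
            *err = h.error;   /* error is known before any bad data */
            return valid;
        }
        if (len - pos < h.length)
            return -1;
        pos += h.length;
        valid += h.length;
    }
    return valid;
}
```

With the offset in every header, the server would in principle also be
free to pick block boundaries as it likes, and the client would still
know where each block belongs.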

However, this whole block approach would probably not be backwards
compatible.

Best regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
