
Re: [Nbd] [PATCH] docs/proto: clarify NBD_CMD_FLUSH




On 14/05/2015 12:33, Wouter Verhelst wrote:
> On Fri, May 08, 2015 at 12:16:16PM +0200, Paolo Bonzini wrote:
> > There are two problems:
> > 
> > 1) A literal reading of the specification could imply that the server could
> > not send a reply if fsync() fails, because in that case previous writes
> > have not reached the disk.  Of course, this part of the specification only
> > applies to successful replies.
> > 
> > 2) Flush does not apply to outstanding writes.  It applies to _completed_
> > writes, ensuring that they also hit the disk.
>
> Er, I always thought it was supposed to imply ordering as well. If you
> send write request A, then write request B, then receive a reply message
> for B, and then (before receiving reply for A) send a flush request,
> the flush reply message should not be sent before A *and* B are
> finished; that was my understanding.

I think the best thing to do here is to defer to what the SCSI standard
does.  My reading of the SCSI standard is that:

- writes are not guaranteed to have reached volatile cache until they
return.

- flushes are described simply as ensuring that the volatile cache has
been written to the medium.
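
To make that reading concrete, here is a toy model (purely illustrative;
the class and method names are made up for this sketch and come from
neither SCSI nor NBD).  A write only enters the volatile cache when it
completes, and a flush only covers what is in the cache at that moment:

```python
class ToyDisk:
    """Toy device with a volatile write cache (hypothetical model)."""

    def __init__(self):
        self.in_flight = {}  # request id -> (offset, data): issued, not completed
        self.cache = {}      # offset -> data: completed writes, still volatile
        self.medium = {}     # offset -> data: stable storage

    def submit_write(self, req_id, offset, data):
        # The request has been sent; nothing is guaranteed about it yet.
        self.in_flight[req_id] = (offset, data)

    def complete_write(self, req_id):
        # Only on completion is the data known to be in the volatile cache.
        offset, data = self.in_flight.pop(req_id)
        self.cache[offset] = data

    def flush(self):
        # Covers completed writes only; in-flight requests are untouched.
        self.medium.update(self.cache)
        self.cache.clear()


disk = ToyDisk()
disk.submit_write("A", 0, b"a")
disk.submit_write("B", 512, b"b")
disk.complete_write("B")   # reply for B arrives first
disk.flush()               # flush sent before the reply for A
print(sorted(disk.medium)) # only B's offset is on the medium
```

Under this model, the flush in the A/B scenario above makes no promise
about A; a client that wants A on disk has to wait for A's reply and
then flush.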

And for the implementation we can look at what the Linux kernel SCSI
target does:

- writes are vfs_write

- syncs are vfs_fsync_range
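
A rough userspace analogue of that split, as a sketch only: os.pwrite()
and os.fsync() stand in for the kernel calls, and the two handler names
are hypothetical, not taken from any existing server.  It also shows the
first point of the patch: when the sync fails, the server still replies,
just with an error.

```python
import os


def handle_write(fd, offset, data):
    """Hypothetical write handler: on return, the data is at best in the
    page cache, not necessarily on disk."""
    written = 0
    while written < len(data):
        # os.pwrite may write fewer bytes than requested, so loop.
        written += os.pwrite(fd, data[written:], offset + written)
    return 0  # success reply


def handle_flush(fd):
    """Hypothetical flush handler: push previously completed writes to
    stable storage, and report failure instead of staying silent."""
    try:
        os.fsync(fd)
    except OSError as e:
        return e.errno  # error reply: no durability guarantee is made
    return 0  # success reply: completed writes have reached the disk
```

Nothing here orders the flush against writes that are still in flight;
it simply syncs whatever has already been written to the file.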

In fact, QEMU also uses these semantics for all its backends and device
models, as well as for the qemu-nbd server (which supports asynchronous
I/O).  The choice predates me, but a lot of research went into which
semantics would let virtual machines flush data to disk safely.

Paolo

> Of course, this nbd server doesn't actually care either way, because it
> "just" does fsync(), without disordering writes. So I suppose what
> matters is the Linux kernel's semantics in this regard for flush
> messages; does it care? If not, an asynchronous implementation of
> nbd-server would be much simpler (because no synchronisation is
> necessary); but if it does, we shouldn't change that part of the
> protocol--and if we're not sure we should probably not change it either,
> just to be on the safe side.


