
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops





--On 29 May 2011 20:11:51 +0200 Wouter Verhelst <w@...112...> wrote:

> No, we need to define how nbd behaves. This may or may not be the same
> thing as how the Linux block layer behaves, and it may change at some
> point in the future if we add additional messages to the protocol.
> I'm *not* going to be adding negotiation messages of the likes of "use
> 2.6.42-style semantics".

Your challenge, should you choose to accept it, is to define the
current requirements /without/ reference to behaviour of a specific
group of kernels...

See also my message to Goswin re us already having problems here
in theory.

> I'm also not going to care about marking the particular write that
> failed. If you're writing to a broken disk, your filesystem is going to
> lock up anyway, and then all writes will fail. If write() returns an
> error condition, I'd return that to the client. If fsync() returns an
> error condition, I'd return that to the client too. Beyond that, I'm not
> going to care much about assigning failures to the correct write.

I agree 100%. There is no need to care where the error occurs. Indeed
I think the kernel does a good job of losing the info anyway.

All I was saying is that if you are disordering writes, caching stuff
etc., and you get an underlying error, you might consider erroring all
subsequent writes/flushes - I'd rather tell the block layer too much
failed than too little. Writes do not have to fail atomically anyway.
A good example is as follows: suppose you operate a huge writeback cache.
The client runs ext3 without -o barrier=1, so it issues requests without
FUA and issues no flushes. It then goes idle. Your watchdog timer
expires and you decide to flush some data out to disk of your
own volition. As it happens, the hard disk has died, and the fsync()
errors. You have no command to report this to the client. Are you
really going to let future writes to your huge RAM-based cache
succeed, and never report the fsync() error? I'd be erroring every
write after the fsync() error because I'd be saying "this nbd device is
broken - there is a high probability your writes will never reach
a disk". Given that in normal operation (mount, unmount, disconnect)
you NEVER receive a REQ_FLUSH from an ext3 device with no barriers,
and you can't error the disconnect, not erroring subsequent writes
would lead the user to think his device had been successfully unmounted
and disconnected. I'd consider that bad.
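
To make the scenario above concrete, here is a minimal sketch of the
"latch the error" behaviour I am describing. It is not nbd-server code:
StickyErrorCache, DeadDisk and background_flush() are hypothetical names
invented for illustration, and returning 0 or an errno value just stands
in for whatever reply the server would send to the client.

```python
import errno


class StickyErrorCache:
    """Toy write-back cache that latches the first backing-store error."""

    def __init__(self, backing):
        # Hypothetical backing object providing write(offset, data) and flush().
        self.backing = backing
        self.dirty = {}            # offset -> bytes not yet written back
        self.failed_errno = None   # set on first failure, never cleared

    def write(self, offset, data):
        # Once write-back has failed, refuse every later write.
        if self.failed_errno is not None:
            return self.failed_errno
        self.dirty[offset] = bytes(data)
        return 0

    def flush(self):
        # Client-requested flush: report a latched error, else write back now.
        if self.failed_errno is not None:
            return self.failed_errno
        return self._writeback()

    def background_flush(self):
        # Watchdog-driven flush: there is no client request to reply to,
        # so a failure can only be remembered for later requests.
        if self.failed_errno is None:
            self._writeback()

    def _writeback(self):
        try:
            for offset in sorted(self.dirty):
                self.backing.write(offset, self.dirty[offset])
            self.backing.flush()
        except OSError as exc:
            self.failed_errno = exc.errno or errno.EIO
            return self.failed_errno
        self.dirty.clear()
        return 0


class DeadDisk:
    """Simulated disk that accepts writes but fails when flushed."""

    def write(self, offset, data):
        pass

    def flush(self):
        raise OSError(errno.EIO, "simulated fsync failure")


cache = StickyErrorCache(DeadDisk())
first = cache.write(0, b"cached while the disk looked healthy")
cache.background_flush()              # watchdog fires, fsync fails silently
second = cache.write(512, b"written after the failure")
```

With this, the first write succeeds because it only touches RAM, while
the second is refused with EIO even though the client never sent a
flush. That is the "tell the block layer too much failed rather than
too little" trade-off; the unwritten data stays in the dirty map, since
there is nowhere safe to put it.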

--
Alex Bligh


