[Nbd] NBD: FLUSH & FUA - userspace pull request
Wouter,
Please pull from my tree
http://git.alex.org.uk/nbd.git
from and including commit
39e2020730daefaa2414ff97795edd40ad28a967
(I think you took the others earlier)
Since my last email I've added a note to the NBD_CMD_FLUSH
documentation saying that non-zero offsets/lengths are reserved, per
the thread with Goswin von Brederlow <goswin-v-b@...186...>.
The commits are listed below.
--
Alex Bligh
commit 777461042a908d9fee5daa9a72bd9df25bc3c7cd
Author: Alex Bligh <alex@...872...>
Date: Sun May 22 11:07:12 2011 +0100
Use nbd-tester-client.tr data with conformant NBD_CMD_FLUSH
offset/length fields
commit f65d7a3b11542177b856f6a621e12d6ec072995b
Author: Alex Bligh <alex@...872...>
Date: Sun May 22 10:58:47 2011 +0100
Change documentation for NBD_CMD_FLUSH again.
Use offset = 0, length = 0 for normal flushes. Reserve other values
for later (idea from Goswin von Brederlow <goswin-v-b@...186...>)
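To make the flush convention concrete, here is a minimal C sketch of a client building a "normal" NBD_CMD_FLUSH request with offset and length both zero, plus a check that rejects the reserved non-zero forms. The wire layout and the NBD_REQUEST_MAGIC / NBD_CMD_FLUSH values are taken from nbd's headers as I understand them; build_flush_request() and flush_request_is_conformant() are hypothetical helper names, not functions in the tree.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Values as understood from nbd's headers (assumption). */
#define NBD_REQUEST_MAGIC 0x25609513
#define NBD_CMD_FLUSH     3

/* Request header as sent on the wire: 28 bytes, big-endian integers. */
struct nbd_request {
	uint32_t magic;
	uint32_t type;
	char     handle[8];
	uint64_t from;
	uint32_t len;
} __attribute__((packed));

/* Hypothetical helper: fill in a normal flush (offset = 0, length = 0). */
static void build_flush_request(struct nbd_request *req, const char handle[8])
{
	memset(req, 0, sizeof(*req));   /* zero is zero in either byte order */
	req->magic = htonl(NBD_REQUEST_MAGIC);
	req->type  = htonl(NBD_CMD_FLUSH);
	memcpy(req->handle, handle, sizeof(req->handle));
}

/* Hypothetical helper: non-zero offset/length on a flush is reserved. */
static int flush_request_is_conformant(const struct nbd_request *req)
{
	return ntohl(req->type) == NBD_CMD_FLUSH &&
	       req->from == 0 && req->len == 0;
}
```

A server following this reading would treat any flush with a non-zero offset or length as using a reserved form, rather than guessing at a partial-flush meaning.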
commit 8965fe7e2d7dc7a13065c0b00ee335068e7b9752
Author: Alex Bligh <alex@...872...>
Date: Sat May 21 08:47:21 2011 +0100
Add transaction log support and integrity test
commit ef44e8ac42197b607646a711761c060f8fa159e5
Author: Alex Bligh <alex@...872...>
Date: Wed May 18 19:34:35 2011 +0100
fix documentation of NBD_CMD_FLUSH
commit 065a1b241dde2ab6fe333ccf5bc057e7e327f8d4
Author: Alex Bligh <alex@...872...>
Date: Wed May 18 19:02:48 2011 +0100
copy handle on all requests
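The one-line message above is terse. Reading it as "every reply, including one for the new FLUSH command, must echo the handle of the request it answers, so the client can match them up" (an interpretation, not confirmed by the commit text), a minimal C sketch looks like this. NBD_REPLY_MAGIC and the 16-byte reply layout are as I understand nbd's headers; build_reply() is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Value as understood from nbd's headers (assumption). */
#define NBD_REPLY_MAGIC 0x67446698

/* Reply header as sent on the wire: 16 bytes, big-endian integers. */
struct nbd_reply {
	uint32_t magic;
	uint32_t error;      /* 0 on success, otherwise an error code */
	char     handle[8];  /* opaque: copied verbatim from the request */
} __attribute__((packed));

/* Hypothetical helper: build the reply for any command type. */
static void build_reply(struct nbd_reply *rep, const char req_handle[8],
                        uint32_t error)
{
	rep->magic = htonl(NBD_REPLY_MAGIC);
	rep->error = htonl(error);
	/* Copy the handle unconditionally, whatever the command was. */
	memcpy(rep->handle, req_handle, sizeof(rep->handle));
}
```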
commit 8d4b9f4e3d10fdfa319b17f385eeaae775d0071c
Author: Alex Bligh <alex@...872...>
Date: Wed May 18 17:34:09 2011 +0100
nbd-server: don't check length and offset on flush
commit 39e2020730daefaa2414ff97795edd40ad28a967
Author: Alex Bligh <alex@...872...>
Date: Tue May 17 19:35:41 2011 +0100
Implement support for flush, fua and rotational.
This commit implements support for the flush, fua, and rotational
directives within the configuration file.
FUA means "force the current write to hit the media" and FLUSH
means "empty the current write queue to disk". Broadly, they have the
same semantics as the Linux kernel's REQ_FLUSH and REQ_FUA. FUA is
implemented through sync_file_range() (or, if that doesn't exist,
fdatasync() on the file handle concerned) and FLUSH through fsync() on
all files. FUA and FLUSH are selected in the config file and set new
flag bits which cause the client to send FUA and FLUSH requests. The
way these are implemented is further explained in doc/proto.txt.
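As a rough illustration of the mapping described above, here is a minimal C sketch: FUA syncs just the written range with sync_file_range() when the platform has it and falls back to fdatasync() otherwise, while FLUSH calls fsync() on every backing file. do_fua() and do_flush() are hypothetical names, not the functions in nbd-server. One caveat worth knowing: the sync_file_range(2) man page warns that it does not write out file metadata or flush the disk's own write cache, so it is a weaker guarantee than fdatasync().

```c
#define _GNU_SOURCE          /* exposes sync_file_range() on glibc */
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: FUA for one written range of one backing file. */
static int do_fua(int fd, off_t offset, off_t len)
{
#ifdef SYNC_FILE_RANGE_WRITE
	/* Wait for earlier writeback, start writeback, wait for completion. */
	return sync_file_range(fd, offset, len,
	                       SYNC_FILE_RANGE_WAIT_BEFORE |
	                       SYNC_FILE_RANGE_WRITE |
	                       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	(void)offset;
	(void)len;
	return fdatasync(fd);   /* no range variant: sync the whole file */
#endif
}

/* Hypothetical helper: FLUSH fsync()s every file backing the export. */
static int do_flush(const int *fds, size_t nfds)
{
	int ret = 0;
	for (size_t i = 0; i < nfds; i++)
		if (fsync(fds[i]) < 0)
			ret = -1;   /* keep syncing the rest, but report failure */
	return ret;
}
```

Syncing all files on FLUSH matters for multi-file exports, where a single request may have touched any of the backing files.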
The purpose of this is reasonably obvious: without supporting either FUA
or FLUSH (and it's relatively easy to support both), filesystems on the
client have no way to ensure the relevant sectors have hit the disk.
The patch is implemented such that the default behaviour is unchanged.
Additionally, it introduces an F_ROTATIONAL flag. This will turn off
the use of QUEUE_FLAG_NONROT in the client. QUEUE_FLAG_NONROT
effectively disables the elevator algorithm, making it merge-only.
That is unhelpful where the server does not have its own elevator
algorithm or where the client elevator algorithm is neutered (e.g.
writing to a raw partition with F_SYNC with nbd-server). It's not
going to be used often where the backing store is a file.
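A minimal C sketch of how a client might turn the export flags into behaviour: send FLUSH and FUA only when advertised, and mark the queue non-rotational unless F_ROTATIONAL is set, which keeps today's default. The bit values follow nbd's NBD_FLAG_* definitions as I understand them; struct client_caps and caps_from_export_flags() are hypothetical, and the real queue handling lives in the kernel, not in userspace code like this.

```c
#include <stdint.h>
#include <stdbool.h>

/* Export flag bits as understood from nbd's headers (assumption). */
#define NBD_FLAG_HAS_FLAGS   (1 << 0)
#define NBD_FLAG_READ_ONLY   (1 << 1)
#define NBD_FLAG_SEND_FLUSH  (1 << 2)
#define NBD_FLAG_SEND_FUA    (1 << 3)
#define NBD_FLAG_ROTATIONAL  (1 << 4)

/* Hypothetical summary of what the client should do. */
struct client_caps {
	bool read_only;
	bool send_flush;
	bool send_fua;
	bool nonrot;     /* true: set QUEUE_FLAG_NONROT (merge-only) */
};

static struct client_caps caps_from_export_flags(uint16_t flags)
{
	bool has = (flags & NBD_FLAG_HAS_FLAGS) != 0;
	struct client_caps c;

	c.read_only  = has && (flags & NBD_FLAG_READ_ONLY);
	c.send_flush = has && (flags & NBD_FLAG_SEND_FLUSH);
	c.send_fua   = has && (flags & NBD_FLAG_SEND_FUA);
	/* Default stays non-rotational; F_ROTATIONAL re-enables the elevator. */
	c.nonrot     = !(has && (flags & NBD_FLAG_ROTATIONAL));
	return c;
}
```

For example, a server configured with flush and fua but not rotational would advertise HAS_FLAGS | SEND_FLUSH | SEND_FUA, giving send_flush and send_fua true with nonrot left true.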
It also incidentally fixes a bug where F_SYNC is ignored if
F_COPYONWRITE is set (not that this currently has much utility).
Note the following:
* The top 16 bits of the command type have been reserved (see
  NBD_CMD_MASK_COMMAND) for passing flags to be attached to commands.
  NBD_CMD_FLAG_FUA is the first of these.
* simple_test has been modified so it does not stomp over nbd.conf and
  pid files in the current directory.
* A new ioctl has been added which passes the export flags to the
  kernel. There is currently no support for this in the kernel (I will
  submit a patch in due course).
* "make check" now performs a flush and fua test. These have been
  verified by strace to "do the right thing".
* nbd-client incorrectly shifted the flags value left by 16 after
  reading it. This caused the test of the read-only bit to always
  fail. I suspect this may have been an attempt to combine server
  flags and export flags into a single 32-bit word. However, the low
  16 bits were never set, and only the low 16 bits are tested. As the
  only thing being tested for was the read-only flag, which is
  supplied in the flags, and as this allows transparent passing to the
  ioctl, I suggest it is changed as per this commit. If anything else
  needs adding, it can go in the upper 16 bits.
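Two of the notes above are about bit layout, so here is a minimal C sketch of both: splitting a request's 32-bit type field into the command (low 16 bits, via NBD_CMD_MASK_COMMAND) and per-command flags such as NBD_CMD_FLAG_FUA (high 16 bits), and showing why shifting the export flags left by 16 made the read-only test always fail. The constant values are as I understand them from nbd.h; every function name here is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values as understood from nbd.h (assumption). */
#define NBD_CMD_MASK_COMMAND 0x0000ffff
#define NBD_CMD_FLAG_FUA     (1 << 16)
#define NBD_FLAG_READ_ONLY   (1 << 1)

/* Hypothetical helpers: low 16 bits are the command, high 16 are flags. */
static uint32_t cmd_of(uint32_t type)
{
	return type & NBD_CMD_MASK_COMMAND;
}

static bool has_fua(uint32_t type)
{
	return (type & NBD_CMD_FLAG_FUA) != 0;
}

/* Old behaviour: READ_ONLY (bit 1) moves to bit 17, so this is never true. */
static bool read_only_shifted(uint16_t wire_flags)
{
	uint32_t flags = (uint32_t)wire_flags << 16;
	return (flags & NBD_FLAG_READ_ONLY) != 0;
}

/* Behaviour after this commit: use the flags as read. */
static bool read_only_unshifted(uint16_t wire_flags)
{
	uint32_t flags = wire_flags;
	return (flags & NBD_FLAG_READ_ONLY) != 0;
}
```

For instance, a write (command 1) carrying FUA has type 0x00010001: cmd_of() gives 1 and has_fua() is true, so a server that compares the unmasked type against NBD_CMD_WRITE would not recognise it.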
I have tested this lightly with nbd-client (obviously the kernel does
not yet send FLUSH or FUA) and with the test suite. It's probably
ready to go in, marked "experimental".