
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



Goswin,

Isn't there a way to get the drive to tell you when it has actually
committed the data to physical storage and to flush specific
requests only?

There may be with certain drive technologies, but remember you aren't
writing to a drive. You aren't even writing to the Linux block layer.
You are writing to a VFS filesystem.

Let the client send FLUSH
requests instead. Same effect in the end.

You can't control what the client sends (save for disabling stuff

The server tells the client whether it supports FUA. If it doesn't and
the client sends one anyway, that is a protocol violation and the server
should probably abort the connection.
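
For illustration, the check described above could be sketched as follows.
This is a minimal Python sketch and not nbd-server code: the flag values
are what I understand the NBD protocol document to specify and should be
treated as assumptions, and check_command_flags and ProtocolViolation are
hypothetical names.

```python
# Transmission flags the server advertises during negotiation.
# Values as understood from the NBD protocol document (assumption).
NBD_FLAG_SEND_FLUSH = 1 << 2
NBD_FLAG_SEND_FUA = 1 << 3

# Per-command flag, modelled as bit 0 of the command-flags field (assumption).
NBD_CMD_FLAG_FUA = 1 << 0


class ProtocolViolation(Exception):
    """Hypothetical error: the caller is expected to drop the connection."""


def check_command_flags(advertised: int, command_flags: int) -> None:
    """Reject FUA on a request when the server never advertised FUA."""
    wants_fua = bool(command_flags & NBD_CMD_FLAG_FUA)
    offers_fua = bool(advertised & NBD_FLAG_SEND_FUA)
    if wants_fua and not offers_fua:
        raise ProtocolViolation("FUA sent but not advertised")
```

A server that advertised only FLUSH would raise on any request carrying
the FUA bit and accept everything else.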

I don't understand your point. As the server operator, you can
control whether or not you get sent FUA independently from FLUSH.
If you don't want it, don't set FUA in the config file for the disk.
It's that easy.

There are good reasons why you might want it (even if it is more
expensive than it needs be), including the fact that as I have explained
some filing systems are starting to use FUA without a flush.

Note that the option isn't even on by default!

My understanding is that a FUA request from the upper layers gets turned
into a FLUSH automatically when the driver doesn't support FUA. So if
the nbd-client doesn't enable FUA for the kernel then any FUA request
from a filesystem should send a FLUSH over the socket. Right?

Sure, and you will get an even more expensive operation as a result.
See the multiple file case. Why would you want that? (Note that if
you do want it, it's available to you by setting flush, and not
FUA in the per-disk config file).
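
To make the cost difference concrete, here is a rough Python sketch. It
assumes that the "multiple file case" means an export split across several
equally sized backing files, which is my reading and not something stated
above. The function names, the file size and the file count are all
hypothetical. The idea is that a FUA write only has to sync the files it
touched, whereas a FLUSH has to sync every file.

```python
def files_touched(offset: int, length: int, file_size: int) -> list[int]:
    """Indices of the backing files covered by [offset, offset + length)."""
    first = offset // file_size
    last = (offset + length - 1) // file_size
    return list(range(first, last + 1))


def files_to_sync(kind: str, offset: int, length: int,
                  file_size: int, n_files: int) -> list[int]:
    """Which backing files need syncing for a FLUSH or a FUA write."""
    if kind == "flush":
        # A flush covers all earlier writes, so every file is synced.
        return list(range(n_files))
    if kind == "fua_write":
        # FUA only concerns the data carried by this one request.
        return files_touched(offset, length, file_size)
    raise ValueError(f"unknown request kind: {kind}")
```

With eight 1MB files, a 4KB FUA write at offset 4096 syncs one file, while
the FLUSH it would otherwise be turned into syncs all eight.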

c) Requests should be ACKed as soon as possible to minimize the delay
   until a client can safely issue a FLUSH.

That's probably true performance wise as a general point, but there is
a complexity / safety / memory use tradeoff. If you ACK every request
as soon as it comes in, you will use a lot of memory.

How do you figure that? For me a write request (all others can be freed
once they send their reply) always uses the same amount of memory from
the time it gets read from the socket till the time it is written to
disk (cache). The memory needed doesn't change whether you ACK it once it
is read from the socket, when the write is issued or when the write
has returned.

If you ACK a write request before you've written it somewhere, you
need to keep it in memory so you can write it later.
...
That should make no difference to the client. If the kernel has 1000
dirty pages it can legally send 1000 write requests to the nbd-server
without waiting for a single ACK. As long as the filesystem (or whatever
uses the nbd device) doesn't run into a barrier and need to drain its
queue (e.g. for fsync()) there is no limit on the number of in-flight
requests the kernel could have in parallel. Obviously in practice there
will be some limits on the client side regarding the number of in-flight
requests and filesystems usually hit a flush/fua all too quickly. The
maximum of in-flight data can probably be seen with a simple dd.

I agree that the server should have some limits on how much in-flight
data it will allow before it pauses to parse more requests. There should
probably be a config option to set this limit to prevent a client from
causing an OOM situation, say 100MB by default. I don't think filesystems,
or other normal use, will hit that limit though.

No, this is something for the server to deal with, not the client. Only
the server knows whether it is running on a 512MB Intel Atom or a 128GB
multiprocessor machine. The server needs to consider whether it should
ACK write requests before dealing with them. Sometimes (for maximum speed)
it may want to ACK them immediately. Sometimes (for simplicity - see current
code) it will want to deal with them before ACKing them. Sometimes (for
memory reasons) it will want to hold off ACKing them until it has room
to buffer them. Sometimes (for maximum safety) it will want not to ACK
them until they have been dealt with (current server in sync mode
for instance). It is just not true to say that requests should always
be ACK'ed as soon as possible.
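
As a rough illustration of the memory case, a server could keep a byte
budget for writes it has ACKed but not yet written, and fall back to
"write first, then ACK" once the budget is used up. This is a Python
sketch only: the current server does nothing of the kind, and WriteBudget,
its method names and the limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteBudget:
    """Tracks bytes that were ACKed early and are still held in memory."""
    limit_bytes: int
    buffered_bytes: int = 0

    def try_early_ack(self, length: int) -> bool:
        """Return True if this write may be ACKed before it is written."""
        if self.buffered_bytes + length > self.limit_bytes:
            # No room to buffer: the caller must write before ACKing.
            return False
        self.buffered_bytes += length
        return True

    def written(self, length: int) -> None:
        """Release budget once an early-ACKed write has reached the file."""
        self.buffered_bytes -= length
```

With a limit of 0 this degenerates to always writing before ACKing, and
with a very large limit it approaches ACKing everything immediately, so
the operator rather than the client picks the tradeoff.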

--
Alex Bligh


