Re: [Nbd] Design concept for async/multithreaded nbd-server
- To: Goswin von Brederlow <goswin-v-b@...186...>, email@example.com
- Subject: Re: [Nbd] Design concept for async/multithreaded nbd-server
- From: Alex Bligh <alex@...872...>
- Date: Fri, 09 Mar 2012 12:41:56 +0000
- Message-id: <404EB86FAE00884CFDF740C0@...873...>
- Reply-to: Alex Bligh <alex@...872...>
- In-reply-to: <87ipieyyjq.fsf@...860...>
- References: <87pqcutvko.fsf@...860...> <62B93D3E62545AECF3FC70FC@...873...> <87ipieyyjq.fsf@...860...>
Goswin von Brederlow <goswin-v-b@...186...> wrote:

>> If the only client is linux (that's a big 'if'), or if the only specified
>> level of synchronous behaviour of the client is 'as per linux kernel'
>> (a rather smaller 'if'), then (3) is the way to go, as the linux block
>> ordering semantic is very simple. In essence, if multiple requests are
>> in flight, you can process and complete them in whatever order you want.
> Does that hold true with multiple clients using gfs or ocfs or similar?
> Are the filesystems written in such a way to preserve that ordering
> semantic with multiple clients?
So say the fs developers, yes.
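Concretely, the contract described above is small. Below is a rough
sketch of how a server could honour it under option (3); the helper
names are made up (send_reply is not nbd-server's actual code) and the
constants follow the classic protocol headers of the time. Plain writes
may be processed and acknowledged in any order; a FUA write must be
durable before its ack; a FLUSH must not be acked until everything
already acked is durable.

    /* Sketch only: error handling elided, helper names hypothetical. */
    #include <stdint.h>
    #include <unistd.h>

    enum { NBD_CMD_WRITE = 1, NBD_CMD_FLUSH = 3 };
    #define NBD_CMD_FLAG_FUA (1 << 16)  /* FUA bit in the 32-bit type word */

    struct req {
        uint32_t type;      /* command in the low 16 bits, flags above */
        uint64_t offset;
        uint32_t len;
        void    *data;
    };

    void send_reply(struct req *r);     /* hypothetical helper */

    /* Called for each in-flight request, possibly concurrently. */
    void handle_request(int export_fd, struct req *r)
    {
        switch (r->type & 0xffff) {
        case NBD_CMD_WRITE:
            pwrite(export_fd, r->data, r->len, r->offset);
            if (r->type & NBD_CMD_FLAG_FUA)
                fdatasync(export_fd);   /* FUA: stable before the ack */
            send_reply(r);
            break;
        case NBD_CMD_FLUSH:
            /* fdatasync() covers every write issued so far on this fd,
             * a superset of the writes we have already acked. */
            fdatasync(export_fd);
            send_reply(r);
            break;
        }
    }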
>> Your 'risks' in (3) do not exist with a linux client, because the fs
>> layer (of a flush/fua compliant fs) will not issue requests that would
>> cause data loss without waiting for a reply to their flush/fua. Broken
>> filing systems (e.g. ext2) are inherently unsafe anyway: if you pull
>> the power cord, data may still be sitting in your HD cache. Errored
>> writes are lost.
> The problem there is that the client won't be able to resume operations
> safely after a crash (persist option). The reconnect done when using
> -persist is transparent to the filesystem, right?
I don't see this problem as any different to a SATA drive being yanked
out and replaced. It's inherently dangerous, and there is always the
possibility of data loss, unless you ensure no write is ever returned
until you know it's on the metal.
Frankly, reconnecting without signalling the block layer (and it
signalling the file system) is inherently dangerous unless the volume is
read-only.
>> I have done stats on this (admittedly with a rather different backend),
>> and each of your proposals 1 and 2 is significantly slower than (3).
>> However, I'd suggest you code whatever you are doing with a flag to
>> implement this stuff (I did) so you can measure performance.
> Actually, why would that be? The linux kernel keeps multiple requests
> in flight and does not wait for each reply. Only a FUA/FLUSH will
> block. The difference between 2 and 3 should only be how many requests
> are in flight, not how long the FUA/FLUSH takes. So unless the client
> hits some limit on in-flight requests, or runs out of memory to buffer
> requests (and their data), there should be no difference in speed.
Because it's not as parallel as you think. Some stuff /is/ waiting
for replies. We see relatively little parallelism. But don't believe
me: measure it yourself.
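It is only a few lines of instrumentation to check. A sketch, assuming
a single receive thread (the hook names are hypothetical; with multiple
threads the counter would need to be atomic):

    /* Histogram the queue depth seen by each arriving request.
     * A histogram clustered at depth 1 means the client is mostly
     * waiting for replies, i.e. little real parallelism. */
    #include <stdio.h>

    #define MAX_DEPTH 64
    static unsigned in_flight;
    static unsigned long depth_hist[MAX_DEPTH + 1];

    void on_request_received(void)
    {
        unsigned d = ++in_flight;
        depth_hist[d < MAX_DEPTH ? d : MAX_DEPTH]++;
    }

    void on_reply_sent(void)
    {
        in_flight--;
    }

    void dump_depth_histogram(void)
    {
        for (unsigned d = 1; d <= MAX_DEPTH; d++)
            if (depth_hist[d])
                printf("depth %2u: %lu requests\n", d, depth_hist[d]);
    }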
>> For completeness, there is an option (4): do everything in parallel
>> and ignore FLUSH and FUA completely. This goes even faster, but
>> is clearly unsafe.
> That is so unsafe I don't even consider testing it. No, if the server
> enters a contract to honor FUA/FLUSH, then it needs to do so.
We found it instructive to see how much FUA/FLUSH was slowing things down.
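The switch itself can be tiny. A sketch (the --no-sync option name is
made up; use whatever flag mechanism the server already has):

    /* With honour_sync cleared, FLUSH becomes a no-op and FUA is
     * ignored: option (4). Fast, unsafe, for measurement only. */
    #include <unistd.h>

    static int honour_sync = 1;   /* cleared by a hypothetical --no-sync */

    static void sync_point(int export_fd)
    {
        if (honour_sync)
            fdatasync(export_fd);
        /* else: ack immediately; data may be lost on power failure */
    }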