Re: [Nbd] Design concept for async/multithreaded nbd-server
- To: Alex Bligh <alex@...872...>
- Cc: email@example.com
- Subject: Re: [Nbd] Design concept for async/multithreaded nbd-server
- From: Goswin von Brederlow <goswin-v-b@...186...>
- Date: Sat, 10 Mar 2012 13:57:07 +0100
- Message-id: <87zkbor4v0.fsf@...860...>
- In-reply-to: <404EB86FAE00884CFDF740C0@...873...> (Alex Bligh's message of "Fri, 09 Mar 2012 12:41:56 +0000")
- References: <87pqcutvko.fsf@...860...> <62B93D3E62545AECF3FC70FC@...873...> <87ipieyyjq.fsf@...860...> <404EB86FAE00884CFDF740C0@...873...>
Alex Bligh <alex@...872...> writes:
>>> If the only client is linux (that's a big 'if'), or if the only specified
>>> level of synchronous behaviour of the client is 'as per linux kernel'
>>> (a rather smaller 'if'), then (3) is the way to go, as the linux block
>>> ordering semantic is very simple. In essence, if multiple requests are
>>> in flight, you can process and complete them in whatever order you want
>> Does that hold true with multiple clients using gfs or ocfs or similar?
>> Are the filesystems written in such a way to preserve that ordering
>> semantic with multiple clients?
> So say the fs developers, yes.
And so say we all. :)
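To make that ordering rule concrete, here is a rough sketch of what mode (3) permits (Python just to illustrate the logic; the handles and request kinds are made up, this is not nbd-server code): replies to in-flight requests may go out in any completion order, since each NBD reply carries the request's handle and the client matches them up. The only ordering point is a FLUSH, which must not be answered before every write received ahead of it has completed.

```python
def complete_out_of_order(requests, finish_order):
    """Simulate an NBD server replying in an arbitrary completion order.

    requests: list of (handle, kind) in arrival order, kind "write" or "flush".
    finish_order: indices into `requests` in the order the backend finishes
    them. A flush may only complete after all writes that arrived before it.
    Returns the handles in the order their replies would be sent.
    """
    completed = set()
    replies = []
    for i in finish_order:
        handle, kind = requests[i]
        if kind == "flush":
            earlier_writes = {h for h, k in requests[:i] if k == "write"}
            # The flush barrier: all prior writes must already be done.
            assert earlier_writes <= completed, "flush replied before prior writes"
        completed.add(handle)
        replies.append(handle)
    return replies
```

So two writes followed by a flush may be completed as (2nd write, 1st write, flush), but never with the flush ahead of either write.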
>>> Your 'risks' in (3) do not exist with a linux client because the fs
>>> layer (of a flush/fua compliant fs) will not issue requests that would
>>> cause data loss without waiting for a reply to their flush/fua. Broken
>>> filing systems (e.g. ext2) are inherently unsafe anyway, as if you pull
>>> the power cord data may be in your HD cache. Errored writes are lost.
>> The problem there is that the client won't be able to resume operations
>> safely after a crash (persist option). The reconnect done when using
>> -persist is transparent to the filesystem, right?
> I don't see this problem as any different to a SATA drive being yanked
> out and replaced. It's inherently dangerous, and there is always the
> possibility of data loss, unless you ensure no write is ever returned
> until you know it's on the metal.
> Frankly reconnecting without signaling the block layer (and it signaling
> the file system) is inherently dangerous unless the volume is read only.
Not so with modes (1) or (2). At least I do hope the linux kernel
resends any request that wasn't ACKed after a reconnect. That is just an
assumption though; I haven't tested it or checked the code for that (yet).
If it doesn't resend, then -persist is completely unusable imho.
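For -persist to be safe, the client side would need something like the following (a sketch of the assumed behaviour, not what the kernel actually does; I haven't checked): keep every request until its reply arrives, and replay whatever is still unacknowledged on the new connection.

```python
class PendingTable:
    """Track in-flight requests so they can be replayed after a reconnect.
    Hypothetical client-side bookkeeping for -persist, not kernel code."""

    def __init__(self):
        self.inflight = {}  # handle -> request payload

    def sent(self, handle, request):
        # Remember the request until the server ACKs it.
        self.inflight[handle] = request

    def acked(self, handle):
        # Reply arrived; the request can be forgotten.
        self.inflight.pop(handle, None)

    def resend_after_reconnect(self):
        # Everything still here was never ACKed and must be replayed,
        # otherwise those writes are silently lost across the reconnect.
        return list(self.inflight.values())
```

Without that replay step, any write that was in flight when the connection dropped simply vanishes, which is exactly the data-loss case Alex describes.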
>>> I have done stats on this (admittedly with a rather different backend)
>>> and each of your proposals 1 and 2 is significantly slower than (3).
>>> However, I'd suggest you code whatever you are doing with a flag to
>>> implement this stuff (I did) so you can measure performance.
>> Actually why would that be? The linux kernel handles multiple requests
>> on-the-fly and does not wait for the reply. Only a FUA/FLUSH will block.
>> The difference between 2 and 3 should only be how many requests are
>> on-the-fly, not how long the FUA/FLUSH takes. So unless the client hits
>> some max-requests-on-the-fly limit or runs out of memory to buffer
>> requests (and their data) there should be no difference in speed.
>> Theoretically. :)
> Because it's not as parallel as you think. Some stuff /is/ waiting
> for replies. We see relatively little parallelism. But don't believe me,
> go test!
Well, if it waits then I would say it generally should be sending out a
FUA or FLUSH to hurry it along. Maybe I should add a flag so that write
requests are only replied to if they have FUA set or a FLUSH was
received. That way anything waiting for a reply without FUA/FLUSH would
simply hang and should be easy to trace. :)
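The flag would behave roughly like this (a sketch of the proposed debug mode, not an existing nbd-server option): plain writes get their replies withheld; a write with FUA set is answered immediately, and a FLUSH releases all held replies before being answered itself. A client that secretly waits on a plain write's reply then hangs, which makes the hidden serialization point easy to spot.

```python
def debug_replies(events):
    """events: list of ("write", handle, fua) or ("flush", handle).
    Returns handles in the order their replies would be sent under the
    proposed reply-only-on-FUA/FLUSH debug flag."""
    held = []  # plain writes whose replies are being withheld
    out = []
    for ev in events:
        if ev[0] == "write":
            _, handle, fua = ev
            if fua:
                out.append(handle)   # FUA write: reply immediately
            else:
                held.append(handle)  # plain write: hold the reply
        else:  # flush
            _, handle = ev
            out.extend(held)         # flush releases all held writes...
            held.clear()
            out.append(handle)       # ...then the flush itself is ACKed
    return out
```

A plain write with no later FLUSH never appears in the output, i.e. the client hangs on it, which is the point of the flag.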