
Re: [Nbd] Design concept for async/multithreaded nbd-server



Alex Bligh <alex@...872...> writes:

> Goswin,
>
>>> If the only client is linux (that's a big 'if'), or if the only specified
>>> level of synchronous behaviour of the client is 'as per linux kernel'
>>> (a rather smaller 'if'), then (3) is the way to go, as the linux block
>>> ordering semantic is very simple. In essence, if multiple requests are
>>> in flight, you can process and complete them in whatever order you want
>>> to.
>>
>> Does that hold true with multiple clients using gfs or ocfs or similar?
>> Are the filesystems written in such a way to preserve that ordering
>> semantic with multiple clients?
>
> So say the fs developers, yes.

And so say we all. :)
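
Since that ordering semantic is the whole point of option (3), here is
roughly what it buys a server: every NBD request carries a 64-bit
handle which the reply echoes back verbatim, so workers can complete
in-flight requests in whatever order they finish and the client
matches the replies up by handle. A minimal sketch (the request type
and the queue/I/O helpers are hypothetical, not actual nbd-server
code):

    #include <arpa/inet.h>   /* htonl */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define NBD_REPLY_MAGIC 0x67446698

    struct request {                 /* illustrative, not nbd-server's */
            uint64_t handle;         /* opaque id chosen by the client */
            /* type, offset, length, data ... */
    };

    struct nbd_reply {               /* classic NBD reply header */
            uint32_t magic;
            uint32_t error;
            uint64_t handle;         /* echoed back verbatim */
    } __attribute__((packed));

    extern struct request *request_queue_pop(void);             /* hypothetical */
    extern uint32_t do_io(struct request *req);                 /* hypothetical */
    extern void write_all(int sock, const void *buf, size_t n); /* hypothetical */

    static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
            int sock = *(int *)arg;
            struct request *req;

            while ((req = request_queue_pop()) != NULL) {
                    struct nbd_reply rep;

                    rep.magic  = htonl(NBD_REPLY_MAGIC);
                    rep.error  = htonl(do_io(req));   /* read/write/flush */
                    rep.handle = req->handle;         /* client matches on this */

                    /* Replies go out in whatever order the workers
                     * finish; only the write to the shared socket is
                     * serialized. */
                    pthread_mutex_lock(&sock_lock);
                    write_all(sock, &rep, sizeof(rep));
                    pthread_mutex_unlock(&sock_lock);
                    free(req);
            }
            return NULL;
    }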

>>> Your 'risks' in (3) do not exist with a linux client because the fs
>>> layer (of a flush/fua compliant fs) will not issue requests that would
>>> cause data loss without waiting for a reply to their flush/fua. Broken
>>> filing systems (e.g. ext2) are inherently unsafe anyway: if you pull
>>> the power cord, data may still be in your HD cache. Errored writes are lost.
>>
>> The problem there is that the client won't be able to resume operations
>> safely after a crash (persist option). The reconnect done when using
>> -persist is transparent to the filesystem, right?
>
> I don't see this problem as any different to a SATA drive being yanked
> out and replaced. It's inherently dangerous, and there is always the
> possibility of data loss, unless you ensure no write is ever returned
> until you know it's on the metal.
>
> Frankly reconnecting without signaling the block layer (and it signaling
> the file system) is inherently dangerous unless the volume is read only.

Not so with modes (1) or (2). At least I do hope the linux kernel
resends any request that wasn't ACKed after a reconnect. That is just an
assumption though; I haven't tested it or checked the code for that (yet).

If it doesn't resend, then -persist is completely unusable imho.
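
To spell out what I'm assuming the client would have to do: keep every
request on a pending list until its reply arrives, and replay the
whole list after a -persist reconnect. A sketch (the struct and helper
names are made up for illustration; this is not the kernel driver's
actual code):

    #include <stdint.h>
    #include <stdlib.h>

    struct request {                 /* illustrative request record */
            uint64_t handle;
            /* type, offset, length, data ... */
    };

    struct pending {
            struct request *req;
            struct pending *next;
    };

    static struct pending *pending_head;

    extern void send_request(int sock, struct request *req);  /* hypothetical */

    /* On submit: remember the request until its reply arrives. */
    void submit(int sock, struct request *req)
    {
            struct pending *p = malloc(sizeof(*p));

            p->req = req;
            p->next = pending_head;
            pending_head = p;
            send_request(sock, req);
    }

    /* On reply: the request is ACKed, forget it. */
    void reply_received(uint64_t handle)
    {
            struct pending **pp = &pending_head;

            while (*pp) {
                    if ((*pp)->req->handle == handle) {
                            struct pending *done = *pp;
                            *pp = done->next;
                            free(done->req);
                            free(done);
                            return;
                    }
                    pp = &(*pp)->next;
            }
    }

    /* After a -persist reconnect: nothing still on the list was ever
     * ACKed, so it all has to go out again. */
    void replay_after_reconnect(int sock)
    {
            struct pending *p;

            for (p = pending_head; p; p = p->next)
                    send_request(sock, p->req);
    }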

>>> I have done stats on this (admittedly with a rather different backend)
>>> and each of your proposals 1 and 2 is significantly slower than (3).
>>> However, I'd suggest you code whatever you are doing with a flag to
>>> implement this stuff (I did) so you can measure performance.
>>
>> Actually why would that be? The linux kernel handles multiple requests
>> on-the-fly and does not wait for the reply. Only a FUA/FLUSH will block.
>> The difference between 2 and 3 should only be how many requests are
>> on-the-fly, not how long the FUA/FLUSH takes. So unless the client hits
>> some max-requests-on-the-fly limit or runs out of memory to buffer
>> requests (and their data) there should be no difference in speed.
>> Theoretically. :)
>
> Because it's not as parallel as you think. Some stuff /is/ waiting
> for replies. We see relatively little parallelism. But don't believe me,
> go test!

Well, if it waits then I would say it generally should be sending out a
FUA or FLUSH to hurry things along. Maybe I should add a flag so that
write requests are only replied to if they have FUA set or a FLUSH was
received. That way anything waiting for a reply without FUA/FLUSH would
simply hang and should be easy to trace. :)
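
Roughly along these lines (a sketch only; the pending-list and reply
helpers are invented names, and the fua field stands in for however
the request encodes that flag):

    #include <stddef.h>

    struct request {                 /* illustrative request record */
            int fua;                 /* stands in for the FUA flag bit */
            /* handle, type, offset, length, data ... */
    };

    extern void send_reply(struct request *req, int error);  /* hypothetical */
    extern void pending_add(struct request *req);            /* hypothetical */
    extern struct request *pending_pop(void);                /* hypothetical */
    extern void sync_backing_store(void);  /* e.g. fdatasync(); hypothetical */

    static int deferred_ack = 1;     /* the proposed debugging flag */

    void complete_write(struct request *req)
    {
            if (!deferred_ack || req->fua)
                    send_reply(req, 0);   /* FUA (or flag off): ACK now */
            else
                    pending_add(req);     /* plain write: hold the ACK */
    }

    void handle_flush(struct request *flush)
    {
            struct request *w;

            sync_backing_store();         /* get everything on the metal */
            while ((w = pending_pop()) != NULL)
                    send_reply(w, 0);     /* now safe to ACK the writes */
            send_reply(flush, 0);
    }

With that flag on, anything in the client stack that secretly depends
on plain write replies deadlocks immediately instead of just running
slowly, which should make it easy to find.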

Regards,
        Goswin
