
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



Alex Bligh <alex@...872...> writes:

> Goswin,
>
>> Isn't there a way to get the drive to tell you when it has actually
>> committed the data to physical storage and to flush specific
>> requests only?
>
> There may be with certain drive technologies, but remember you aren't
> writing to a drive. You aren't even writing to the Linux block layer.
> You are writing to a VFS filesystem.

Nah, I was thinking of the drive and bottom IO layer of the
kernel. Passing that information / capability up to the VFS layer and
eventually user space would come later.
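
For what it's worth, the closest thing user space currently has for
flushing a specific range rather than the whole file is
sync_file_range(). A rough sketch, assuming the export is backed by a
regular file; note that this only pushes the pages out to the device
and does NOT flush the drive's volatile write cache, so fdatasync()
would still be needed for a real durability guarantee:

#define _GNU_SOURCE
#include <fcntl.h>

/* Flush just the byte range touched by one write request.  This only
 * pushes the pages to the device; it does not flush the drive's
 * volatile write cache. */
static int flush_request_range(int fd, off_t offset, off_t len)
{
    return sync_file_range(fd, offset, len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}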

>>>>>> c) Requests should be ACKed as soon as possible to minimize the delay
>>>>>>    until a client can safely issue a FLUSH.
>>>>>
>>>>> That's probably true performance wise as a general point, but there is
>>>>> a complexity / safety / memory use tradeoff. If you ACK every request
>>>>> as soon as it comes in, you will use a lot of memory.
>>>>
>>>> How do you figure that? For me a write request (all others can be freed
>>>> once they send their reply) always uses the same amount of memory from
>>>> the time it gets read from the socket till the time it is written to
>>>> disk (cache). The memory needed doesn't change whether you ACK it once
>>>> it is read from the socket, when the write is issued or when the write
>>>> returns.
>>>
>>> If you ACK a write request before you've written it somewhere, you
>>> need to keep it in memory so you can write it later.
> ...
>> That should make no difference to the client. If the kernel has 1000
>> dirty pages it can legally send 1000 write requests to the nbd-server
>> without waiting for a single ACK. As long as the filesystem (or whatever
>> uses the nbd device) doesn't run into a barrier and needs to drain its
>> queue (e.g. for fsync()) there is no limit on the number of in-flight
>> requests the kernel could have in parallel. Obviously in practice there
>> will be some limits on the client side regarding the number of in-flight
>> requests and filesystems usually hit a flush/FUA all too quickly. The
>> maximum amount of in-flight data can probably be seen with a simple dd.
>>
>> I agree that the server should have some limits on how much in-flight
>> data it will allow before it pauses reading more requests. There should
>> probably be a config option to set this limit to prevent a client from
>> causing an OOM situation, say default 100MB. I don't think filesystems,
>> or other normal use, will hit that limit though.
>
> No, this is something for the server to deal with, not the client. Only

Never said the client should.

> the server knows whether it is running on a 512MB Intel Atom or a 128GB
> multiprocessor machine. The server needs to consider whether it should
> ACK write requests before dealing with them. Sometimes (for maximum speed)
> it may want to ACK them immediately. Sometimes (for simplicity - see current
> code) it will want to deal with them before ACKing them. Sometimes (for
> memory reasons) it will want to hold off ACKing them until it has room
> to buffer them. Sometimes (for maximum safety) it will want not to ACK
> them until they have been dealt with (current server in sync mode
> for instance). It is just not true to say that requests should always
> be ACK'ed as soon as possible.

I disagree. Delaying ACKs is not the way to slow down the client when
the server has too much cached data for the moment. It should simply
pause reading more requests until it has buffer space again.
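
Roughly what I mean, as a minimal sketch assuming a single-threaded
server built around poll(); the names and the limit are made up for
illustration, they are not from the current nbd-server code:

#include <poll.h>
#include <stddef.h>

/* Hypothetical bookkeeping: payload bytes read from the socket but not
 * yet committed to the storage backend (illustrative names only). */
static size_t inflight_bytes;
static const size_t max_inflight_bytes = 100u << 20;  /* e.g. a 100MB limit */

/* While the in-flight buffer is full, simply stop asking poll() about
 * readability on the client socket; TCP flow control then throttles
 * the client by itself.  No ACK is delayed for this purpose. */
static void prepare_client_pollfd(struct pollfd *pfd, int client_fd)
{
    pfd->fd = client_fd;
    pfd->events = (inflight_bytes < max_inflight_bytes) ? POLLIN : 0;
    pfd->revents = 0;
}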

By delaying ACKs you only hurt the client's performance, i.e. the time
it takes for a request to complete. That may or may not cause the client
to send data more slowly: it only works as back-pressure if the client
has to wait for the ACK before it can send more requests.

Delaying ACKs for simplicity's or safety's sake makes absolute sense,
but not as an attempt to stop the client from sending more requests.
Just stop reading them if you have no buffer space to spare. Note that
you need the buffer space from the moment you READ a request until the
time you have committed it to the storage backend. The ACK can be sent
at any point in between without altering the memory requirement one
bit. All that matters is how many requests you read and process in
parallel, not when the ACK goes out. Just think of where you would
place the malloc() and free() calls for the buffered data: it won't be
in the function sending the ACK.
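
Something like this; the struct and the send_ack() callback are purely
illustrative, not the real nbd-server code or NBD wire format:

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative only: a stripped-down write request header. */
struct write_req {
    uint64_t offset;
    uint32_t len;
};

static void handle_write(int sock, int backend_fd, const struct write_req *req,
                         void (*send_ack)(int, const struct write_req *))
{
    char *payload = malloc(req->len);     /* buffer is needed from here ...  */
    if (payload == NULL)
        return;

    read(sock, payload, req->len);        /* (a real server loops until the
                                             whole payload has arrived)      */

    /* send_ack(sock, req);                  earliest sensible ACK point     */
    pwrite(backend_fd, payload, req->len, (off_t)req->offset);
    send_ack(sock, req);                  /* ... or the latest, after commit */

    free(payload);                        /* ... until here, no matter where
                                             the ACK went out                */
}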

MfG
        Goswin



