Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated
Ben Hutchings writes ("Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated"):
> On Fri, 2010-06-25 at 11:50 +0100, Ian Jackson wrote:
> > No, I think there are two meanings of the word "barrier". AFAICT md
> > has its own thing which it confusingly calls a "barrier"; it can be
> > "raised" and "lowered".
>
> Oh, great! I wondered whether this was the case but I could only find
> discussion of md vs I/O barriers. Do you have any reference for
> documentation of md barriers?
No. I just stumbled across them in the source. Particularly this in
drivers/md/raid1.c:
/* Barriers....
* Sometimes we need to suspend IO while we do something else,
* either some resync/recovery, or reconfigure the array.
* To do this we raise a 'barrier'.
* The 'barrier' is a counter that can be raised multiple times
* to count how many activities are happening which preclude
* normal IO.
* We can only raise the barrier if there is no pending IO.
* i.e. if nr_pending == 0.
* We choose only to raise the barrier if no-one is waiting for the
* barrier to go down. This means that as soon as an IO request
* is ready, no other operations which require a barrier will start
* until the IO request has had a chance.
*
* So: regular IO calls 'wait_barrier'. When that returns there
* is no background IO happening. It must arrange to call
* allow_barrier when it has finished its IO.
* Background IO calls must call raise_barrier. Once that returns
* there is no normal IO happening. It must arrange to call
* lower_barrier when the particular background IO completes.
*/
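
To make the protocol in that comment concrete, here is a rough userspace
sketch using pthreads. It is not the raid1.c code: the kernel uses its own
locking and wait primitives, and the struct md_barrier and md_barrier_init
names are made up for illustration. It also folds the two conditions for
raising (no pending IO, nobody waiting) into a single wait, which is a
simplification of my own. Only the four function names and the nr_pending
counter come from the comment.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the barrier described in the comment. */
struct md_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int barrier;     /* background activities currently excluding normal IO */
    int nr_pending;  /* normal IO requests in flight */
    int nr_waiting;  /* normal IO requests waiting for the barrier to drop */
};

void md_barrier_init(struct md_barrier *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->barrier = 0;
    b->nr_pending = 0;
    b->nr_waiting = 0;
}

/* Normal IO: block while any barrier is raised, then count as pending. */
void wait_barrier(struct md_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->nr_waiting++;
    while (b->barrier > 0)
        pthread_cond_wait(&b->cond, &b->lock);
    b->nr_waiting--;
    b->nr_pending++;
    pthread_mutex_unlock(&b->lock);
}

/* Normal IO finished: drop the pending count and wake any raiser. */
void allow_barrier(struct md_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    assert(b->nr_pending > 0);
    b->nr_pending--;
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->lock);
}

/* Background IO: raise only when no normal IO is pending or waiting. */
void raise_barrier(struct md_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->nr_pending > 0 || b->nr_waiting > 0)
        pthread_cond_wait(&b->cond, &b->lock);
    b->barrier++;   /* a counter, so it can be raised more than once */
    pthread_mutex_unlock(&b->lock);
}

/* Background IO finished: lower the counter and wake waiting normal IO. */
void lower_barrier(struct md_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    assert(b->barrier > 0);
    b->barrier--;
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->lock);
}
```

In this model a normal IO path brackets its work with wait_barrier and
allow_barrier, and a resync path brackets each chunk with raise_barrier and
lower_barrier. If either side fails to make its closing call, the other side
waits forever, which is the kind of hang one would look for here.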
Ian.