Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

To: Ben Hutchings <ben@decadent.org.uk>
Cc: 584881@bugs.debian.org
Subject: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated
From: Ian Jackson <ijackson@chiark.greenend.org.uk>
Date: Fri, 25 Jun 2010 12:27:11 +0100
Message-id: <19492.37519.253066.945143@chiark.greenend.org.uk>
Reply-to: Ian Jackson <ijackson@chiark.greenend.org.uk>, 584881@bugs.debian.org
In-reply-to: <1277464428.26161.224.camel@localhost>
References: <19468.49549.475813.179092@chiark.greenend.org.uk> <1277075288.14011.1019.camel@localhost> <19487.15030.702626.287407@chiark.greenend.org.uk> <1277345735.26161.142.camel@localhost> <19491.12472.880707.704477@chiark.greenend.org.uk> <1277427210.26161.187.camel@localhost> <19492.35328.840651.181744@chiark.greenend.org.uk> <1277464428.26161.224.camel@localhost>

Ben Hutchings writes ("Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated"):
> On Fri, 2010-06-25 at 11:50 +0100, Ian Jackson wrote:
> > No, I think there are two meanings of the word "barrier".  AFAICT md
> > has its own thing which it confusingly calls a "barrier"; it can be
> > "raised" and "lowered".
> 
> Oh, great!  I wondered whether this was the case but I could only find
> discussion of md vs I/O barriers.  Do you have any reference for
> documentation of md barriers?

No.  I just stumbled across them in the source.  Particularly this in
drivers/md/raid1.c:

/* Barriers....
 * Sometimes we need to suspend IO while we do something else,
 * either some resync/recovery, or reconfigure the array.
 * To do this we raise a 'barrier'.
 * The 'barrier' is a counter that can be raised multiple times
 * to count how many activities are happening which preclude
 * normal IO.
 * We can only raise the barrier if there is no pending IO.
 * i.e. if nr_pending == 0.
 * We choose only to raise the barrier if no-one is waiting for the
 * barrier to go down.  This means that as soon as an IO request
 * is ready, no other operations which require a barrier will start
 * until the IO request has had a chance.
 *
 * So: regular IO calls 'wait_barrier'.  When that returns there
 *    is no background IO happening.  It must arrange to call
 *    allow_barrier when it has finished its IO.
 * background IO calls must call raise_barrier.  Once that returns
 *    there is no normal IO happening.  It must arrange to call
 *    lower_barrier when the particular background IO completes.
 */
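
To make the rules in that comment concrete, here is a toy user-space
model of the counters as I read them.  This is only a sketch, not the
code in raid1.c: the struct and all the function names are made up for
illustration, there is no locking, and where the kernel would sleep
the model just returns false so the caller can retry.  It also folds
the "nobody waiting" and "nothing pending" conditions into one check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters described in the comment above. */
struct md_barrier_model {
    int barrier;    /* background activities currently holding the barrier */
    int nr_pending; /* regular IO requests in flight */
    int nr_waiting; /* regular IO requests waiting for the barrier to drop */
};

/* Regular IO arrives (the 'wait_barrier' step).  Returns true if the IO
 * may start now; otherwise it is counted as waiting and must retry. */
bool io_arrive(struct md_barrier_model *m)
{
    if (m->barrier > 0) {
        m->nr_waiting++;
        return false;
    }
    m->nr_pending++;
    return true;
}

/* A request that was previously told to wait tries again. */
bool io_retry(struct md_barrier_model *m)
{
    if (m->barrier > 0)
        return false;
    m->nr_waiting--;
    m->nr_pending++;
    return true;
}

/* Regular IO has finished (the 'allow_barrier' step). */
void io_done(struct md_barrier_model *m)
{
    assert(m->nr_pending > 0);
    m->nr_pending--;
}

/* Background IO wants the barrier (the 'raise_barrier' step).  It is only
 * raised when no regular IO is pending and nobody is waiting for the
 * barrier to go down.  The barrier is a counter, so it can be raised
 * more than once. */
bool try_raise_barrier(struct md_barrier_model *m)
{
    if (m->nr_pending > 0 || m->nr_waiting > 0)
        return false;
    m->barrier++;
    return true;
}

/* Background IO has finished (the 'lower_barrier' step). */
void lower_barrier_model(struct md_barrier_model *m)
{
    assert(m->barrier > 0);
    m->barrier--;
}
```

The point of the nr_waiting check is that once a regular request has
queued behind the barrier, no new background activity gets in ahead of
it: try_raise_barrier() keeps failing until that request has been let
through and has finished.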

Ian.
