Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

To: Ian Jackson <ijackson@chiark.greenend.org.uk>
Cc: 584881@bugs.debian.org
Subject: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated
From: Ben Hutchings <ben@decadent.org.uk>
Date: Fri, 25 Jun 2010 01:53:30 +0100
Message-id: <1277427210.26161.187.camel@localhost>
Reply-to: Ben Hutchings <ben@decadent.org.uk>, 584881@bugs.debian.org
In-reply-to: <19491.12472.880707.704477@chiark.greenend.org.uk>
References: <19468.49549.475813.179092@chiark.greenend.org.uk> <1277075288.14011.1019.camel@localhost> <19487.15030.702626.287407@chiark.greenend.org.uk> <1277345735.26161.142.camel@localhost> <19491.12472.880707.704477@chiark.greenend.org.uk>

On Thu, 2010-06-24 at 11:17 +0100, Ian Jackson wrote:
> Ben Hutchings writes ("Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated"):
[...]
> > > Searching the web suggests that symptoms very similar to mine are not
> > > uncommon, including instances without soft lockup messages, and none
> > > of the other users seem to have a similar disk layout.
> > > 
> > > I can't easily test this theory but I think the unusual disk layout is
> > > probably simply making a race easier to trigger.
> > 
> > Thinking of some kind of lock-dependency bug?  Blocking on a mutex for a
> > long period should still trigger a soft-lockup message.  Since there are
> > no messages from the kernel it's something of a mystery what's going on.
> 
> The RAID system (md driver) has a separate mechanism for blocking
> writes, which it calls a "barrier".  I think it is quite capable of
> indefinitely blocking all writes to a device without necessarily
> triggering the soft lockup detector.
[...]

I/O barriers are block I/O operations (not specific to md) that inhibit
reordering of read and write operations.  They certainly should not be
blocking operations.  Also, device-mapper did not support barriers until
after 2.6.26, so md will not be using them in your configuration.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

