Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated
Ben Hutchings writes ("Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated"):
> Even if you can't get a process dump, you can get some useful
> information with:
> 'd' - show locks held
> 'l' - show backtrace for active CPUs
> 'w' - show uninterruptible tasks
I'll try these, although I suspect there will be thousands of
uninterruptible tasks.
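
For reference, here is a minimal sketch of requesting the three dumps
above without a keyboard, by writing the keys to /proc/sysrq-trigger.
It assumes SysRq is enabled in the kernel, that userspace is still
responsive enough to run it as root, and that the output is then read
from the kernel log. The function name and the trigger_path parameter
are mine, for illustration only. 'd' may additionally need lock
debugging built into the kernel.

```python
# Descriptions copied from the quoted message above.
SYSRQ_KEYS = {
    "d": "show locks held",
    "l": "show backtrace for active CPUs",
    "w": "show uninterruptible tasks",
}


def trigger_sysrq(keys, trigger_path="/proc/sysrq-trigger"):
    """Write each SysRq key to the trigger file, one write per key.

    Returns the list of keys that were written.
    """
    sent = []
    for key in keys:
        if key not in SYSRQ_KEYS:
            raise ValueError(f"unexpected SysRq key: {key!r}")
        # One key per open/write, the same as `echo w > /proc/sysrq-trigger`.
        with open(trigger_path, "w") as f:
            f.write(key)
        sent.append(key)
    return sent


# Usage (as root): trigger_sysrq("dlw"), then read the result with dmesg.
```

Restricting the keys to the three listed above is deliberate, so that
a typo cannot trigger one of the destructive SysRq actions.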
> > Searching the web suggests that symptoms very similar to mine are not
> > uncommon, including instances without soft lockup messages, and none
> > of the other users seem to have a similar disk layout.
> > I can't easily test this theory but I think the unusual disk layout is
> > probably simply making a race easier to trigger.
> Thinking of some kind of lock-dependency bug? Blocking on a mutex for a
> long period should still trigger a soft-lockup message. Since there are
> no messages from the kernel it's something of a mystery what's going on.
The RAID system (md driver) has a separate mechanism for blocking
writes, which it calls a "barrier". I think it is quite capable of
indefinitely blocking all writes to a device without necessarily
triggering the soft lockup detector.
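
Since a resync/check is implicated, it may help to record what each
array was doing when the machine wedged. A minimal sketch, assuming
the md sysfs attribute sync_action is available under
/sys/block/mdN/md/ (the function name and the sys_block parameter are
hypothetical). This does not show the barrier itself; it only reports
whether a resync or check is in progress.

```python
from pathlib import Path


def md_sync_state(sys_block="/sys/block"):
    """Return {array_name: sync_action} for every md array found."""
    states = {}
    for action_file in sorted(Path(sys_block).glob("md*/md/sync_action")):
        # .../md0/md/sync_action -> "md0"
        array = action_file.parent.parent.name
        # Typical values include "idle", "resync" and "check".
        states[array] = action_file.read_text().strip()
    return states


# Usage: print(md_sync_state()), e.g. from a loop that logs once a minute.
```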
> > I'll see if I can borrow a spare R210 from Jump, in which case I may
> > be able to reproduce the problem in controlled conditions on my coffee
> > table at home (and with access to the VGA console). Which kernel
> > should I test in that case?
> Please try 2.6.34 from experimental.
Will do. I'll get back to you.