
Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated



Ben Hutchings writes ("Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated"):
> Even if you can't get a process dump, you can get some useful
> information with:

Right, thanks.

> 'd' - show locks held
> 'l' - show backtrace for active CPUs
> 'w' - show uninterruptible tasks

I'll try these, although I suspect there will be thousands of
uninterruptible tasks.
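
For reference, when the keyboard is out of reach but a root shell still
responds, the same three commands can usually be issued by writing the
key to /proc/sysrq-trigger. The following is only a minimal sketch: the
helper names are hypothetical, it assumes the kernel was built with
magic SysRq support, and 'd' only produces output if lock debugging is
enabled. The results go to the kernel log, not to the caller.

```python
# Keys suggested in this thread, with the descriptions quoted above.
SYSRQ_KEYS = {
    "d": "show locks held",
    "l": "show backtrace for active CPUs",
    "w": "show uninterruptible tasks",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the trigger file.

    Needs root when pointed at the real /proc file. trigger_path is a
    parameter only so the helper can be tried against an ordinary file.
    """
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unexpected SysRq key: {key!r}")
    with open(trigger_path, "w") as f:
        f.write(key)


def dump_all(trigger_path="/proc/sysrq-trigger"):
    """Issue 'd', 'l' and 'w' in turn; the output appears in the kernel log."""
    for key in SYSRQ_KEYS:
        trigger_sysrq(key, trigger_path)
```

Calling dump_all() as root and then reading the kernel log (dmesg or
the serial console) should show the three dumps, provided the machine
is still able to run a process at all.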

> > Searching the web suggests that symptoms very similar to mine are not
> > uncommon, including instances without soft lockup messages, and none
> > of the other users seem to have a similar disk layout.
> > 
> > I can't easily test this theory but I think the unusual disk layout is
> > probably simply making a race easier to trigger.
> 
> Thinking of some kind of lock-dependency bug?  Blocking on a mutex for a
> long period should still trigger a soft-lockup message.  Since there are
> no messages from the kernel it's something of a mystery what's going on.

The RAID system (md driver) has a separate mechanism for blocking
writes, which it calls a "barrier".  I think it is quite capable of
indefinitely blocking all writes to a device without necessarily
triggering the soft lockup detector.
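
Since a resync or check is implicated, it may help to record what md
was doing at the time of a hang. The following is only a minimal
sketch: the function names are hypothetical, and it assumes the md
sysfs attribute sync_action is present on the kernel under test. It
does not observe the barrier itself, only whether a sync operation is
in progress.

```python
import os

# sync_action values that mean md is actively working through the array.
ACTIVE_STATES = ("resync", "recover", "check", "repair", "reshape")


def md_sync_action(md_name, sysfs_root="/sys/block"):
    """Return the current sync_action of an md array, e.g. 'idle' or 'check'."""
    path = os.path.join(sysfs_root, md_name, "md", "sync_action")
    with open(path) as f:
        return f.read().strip()


def md_is_syncing(md_name, sysfs_root="/sys/block"):
    """True if the array reports a resync, check or similar in progress."""
    return md_sync_action(md_name, sysfs_root) in ACTIVE_STATES
```

Logging md_is_syncing("md0") at intervals (the array name here is just
an example) would show whether the lockups line up with a running
resync or check.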

> > I'll see if I can borrow a spare R210 from Jump, in which case I may
> > be able to reproduce the problem in controlled conditions on my coffee
> > table at home (and with access to the VGA console).  Which kernel
> > should I test in that case?
> 
> Please try 2.6.34 from experimental.

Will do.  I'll get back to you.

Thanks,
Ian.


