Bug#671776: [wheezy] md/raid10 deadlock at 'Failing raid device'

To: George Shuklin <shuklin-selectel.ru@cvt-cm1.obl.selectel.org>
Cc: 671776@bugs.debian.org
Subject: Bug#671776: [wheezy] md/raid10 deadlock at 'Failing raid device'
From: Jonathan Nieder <jrnieder@gmail.com>
Date: Wed, 9 May 2012 18:25:04 -0500
Message-id: <20120509232504.GB7921@burratino>
Reply-to: Jonathan Nieder <jrnieder@gmail.com>, 671776@bugs.debian.org
In-reply-to: <20120506201913.17643.8536.reportbug@cvt-xs11>
References: <20120506201913.17643.8536.reportbug@cvt-xs11>

Hi George,

George Shuklin wrote:

> Got a new raid10 deadlock during laboratory tests.
>
> Setup: three Adaptec controllers with 24 (3x8) directly attached
> SATA drives. Each group of 8 disks is joined as a raid10, and those
> 3 raid10 arrays are used to create a raid0. The system resides on
> disks attached directly to the motherboard SATA controller.
>
> Disks were removed one by one via the Adaptec utility until none
> were left. After that, some IO was generated on the raid0. Two of
> the three raid10 arrays failed normally, but one got stuck:
[...]
> Operations on md100 or md103 just hang and return no error or
> result. dmesg fills at incredible speed with the message
>
> [4474.074462] md/raid10:md103: sdaa: Failing raid device
>
> The rate is so high that syslog cannot keep up with the ring buffer,
> and the subsequent log looks like this:
>
> May 5 21:20:04 server kernel: [ 4507.578492] md/raid10:md103: sdaa: Faaid devi
[...]
> The main problem is not the mess in the log but the stalled IO on
> the raid device, which makes it impossible to detect the error and
> switch nodes in a cluster environment.

Thanks for reporting.  If you can reproduce this with a 3.3.y kernel
from experimental, please do contact upstream at
linux-raid@vger.kernel.org, cc-ing Neil Brown <neilb@suse.de> and
either me or this bug log so we can track it.

Hope that helps,
Jonathan

References:
- Bug#671776: linux-image-3.2.0-2-amd64: md/raid10 deadlock at 'Failing raid device'
  - From: George Shuklin <shuklin-selectel.ru@cvt-cm1.obl.selectel.org>