
Bug#671776: [wheezy] md/raid10 deadlock at 'Failing raid device'



(resending to a different address)
Hi George,

George Shuklin wrote:

> Got a new raid10 deadlock during laboratory tests.
>
> Setup: three Adaptec controllers with 24 (3x8) directly attached
> SATA drives. Each group of 8 disks is joined as a raid10, and those
> 3 raid10 arrays are used to create a raid0. The system resides on
> disks attached directly to the motherboard SATA controller.
>
> Disks were removed one by one via the Adaptec utility until no disks
> were left. After that, some IO was created on the raid0. Two of the
> three raid10 arrays failed normally, but one got stuck:
[...]
> Operations on md100 or md103 just hang and return no error or
> result. dmesg fills at incredible speed with the message
>
> [4474.074462] md/raid10:md103: sdaa: Failing raid device
>
> The rate is so high that syslog cannot keep up with the ring buffer,
> and the subsequent log looks like this:
>
> May 5 21:20:04 server kernel: [ 4507.578492] md/raid10:md103: sdaa: Faaid devi
[...]
> The main problem is not the mess in the log, but the stalled IO on
> the raid device, which makes it impossible to detect the error and
> switch nodes in a cluster environment.

Thanks for reporting.  If you can reproduce this with a 3.3.y kernel
from experimental, please do contact upstream at
linux-raid@vger.kernel.org, cc-ing Neil Brown <neilb@suse.de> and
either me or this bug log so we can track it.
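
When writing to upstream it may help to say how many times the message
was logged for each array and disk. The following is only a rough
sketch for summarizing a saved log, not an established tool: the
function name count_failing is made up for illustration, and the
pattern assumes the message format quoted above
("md/raid10:<array>: <disk>: Failing raid device"). Lines that were
truncated by the ring buffer overrun will not match and are skipped.

```python
import re
from collections import Counter

# Assumed format, taken from the dmesg line quoted in the report:
#   [4474.074462] md/raid10:md103: sdaa: Failing raid device
FAILING_RE = re.compile(r"md/raid10:(md\d+): (\w+): Failing raid device")


def count_failing(lines):
    """Count intact 'Failing raid device' messages per (array, disk)."""
    counts = Counter()
    for line in lines:
        match = FAILING_RE.search(line)
        if match:
            # Key is (array name, member disk), e.g. ("md103", "sdaa").
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

It can be fed any iterable of lines, for example an open file holding
saved dmesg or syslog output.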

Hope that helps,
Jonathan


