
Bug#671776: [George Shuklin: Re: [wheezy] md/raid10 deadlock at 'Failing raid device']



Forwarding with permission.
--- Begin Message ---
Thank you very much for your attention.

This is actually a pre-production hardware set, and every change to the raid requires a very long DRBD resync, so I don't think I'll have enough time to repeat the sync. If I get a time window between hardware shipping and deployment next time, I'll repeat the test.

PS: The most interesting problem I ran into was 'silent device disappearing'. I removed every JBOD disk via the Adaptec utility and they disappeared from /dev/, but md thought everything was fine until I ran some dd on it. At that moment two arrays failed cleanly, but the third got stuck with a single drive.
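
A minimal sketch of how such a silent disappearance might be noticed from userspace before any IO is issued: compare the members md still lists against the device nodes that actually exist. It assumes the usual /proc/mdstat layout ("md103 : active raid10 sdaa[0] ..."). The function names, the sample text, and the member name sdab are hypothetical and only for illustration.

```python
import os
import re

# Hypothetical /proc/mdstat-style sample; real output will differ.
SAMPLE_MDSTAT = """Personalities : [raid0] [raid10]
md103 : active raid10 sdaa[0] sdab[1]
      1953262592 blocks 512K chunks 2 near-copies [2/2] [UU]
"""


def members_from_mdstat(mdstat_text):
    """Map each md array name to the member device names md lists for it."""
    arrays = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not match:
            continue  # header and continuation lines carry no member list
        name, rest = match.groups()
        # Member entries look like "sdaa[0]" or "sdab[1](F)".
        arrays[name] = re.findall(r"\b([a-z]+\d*)\[\d+\]", rest)
    return arrays


def missing_members(mdstat_text, exists=os.path.exists, dev_dir="/dev"):
    """Return {array: [members]} for members md lists but whose node is gone."""
    missing = {}
    for name, devs in members_from_mdstat(mdstat_text).items():
        gone = [d for d in devs if not exists(os.path.join(dev_dir, d))]
        if gone:
            missing[name] = gone
    return missing
```

On a live system one would pass the contents of /proc/mdstat instead of the sample. The `exists` parameter is there only so the check can be exercised without real hardware.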

Thank you again.


George Shuklin
Cloud computing lead
http://selectel.ru/


On 10.05.2012 03:27, Jonathan Nieder wrote:
(resending to a different address)
Hi George,

George Shuklin wrote:

Got a new raid10 deadlock during laboratory tests.

Setup: three Adaptec controllers with 24 (3x8) directly attached
SATA drives. Each group of 8 disks is joined as a raid10, and those
3 raid10 arrays are used to create a raid0. The system resides on
disks attached directly to the motherboard SATA controller.

Disks were removed one by one via the Adaptec utility until no disks
were left at all. After that, some IO was generated on the raid0. Two
of the three raid10 arrays failed normally, but one got stuck:
[...]
Operations on md100 or md103 just hang and return no error or
result. dmesg fills at incredible speed with the message

[4474.074462] md/raid10:md103: sdaa: Failing raid device

The rate is so high that syslog cannot keep up with the ring buffer,
and the rest of the log looks like this:

May 5 21:20:04 server kernel: [ 4507.578492] md/raid10:md103: sdaa: Faaid devi
[...]
The main problem is not the total mess in the log but the stalled IO
on the raid device, which makes it impossible to detect the error and
switch nodes in a cluster environment.

Thanks for reporting.  If you can reproduce this with a 3.3.y kernel
from experimental, please do contact upstream at
linux-raid@vger.kernel.org, cc-ing Neil Brown <neilb@suse.de> and
either me or this bug log so we can track it.

Hope that helps,
Jonathan


--- End Message ---
