
Re: Why didn't software RAID detect a faulty drive?



martin f krafft wrote:
> Answering all messages in this thread in one:
> 
> also sprach Seth Mattinen <sethm@rollernet.us> [2009.07.19.0206 +0200]:
>> [3948800.929508] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>> [3948800.949314] ata2.00: BMDMA stat 0x4
>> [3948800.960273] ata2.00: cmd ca/00:10:6f:02:22/00:00:00:00:00/e0 tag 0 dma 8192 out
>> [3948800.960276]          res 51/84:0a:75:02:22/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
>> [3948801.007509] ata2.00: status: { DRDY ERR }
>> [3948801.020017] ata2.00: error: { ICRC ABRT }
>> [3948801.032537] ata2: soft resetting link
>> [3948801.212298] ata2.00: configured for UDMA/33
>> [3948801.225345] ata2: EH complete
> 
> I occasionally see those on my servers and have not yet been able to
> figure out what they mean. I think they are related to SMART
> self-tests initiated by smartd. Are you running any of those?

I have smartd running. The only uncommented setting is "DEVICESCAN -m
root -M exec /usr/share/smartmontools/smartd-runner", i.e. I have not
changed the configuration.


>> It's running software raid, so why is it locking up? I managed to
>> log in as root and cat /proc/mdstat:
>>
>> Personalities : [raid1]
>> md0 : active raid1 sda1[0] sdb1[1]
>>       78148096 blocks [2/2] [UU]
>> unused devices: <none>
> 
> Yeah, the same happens here: the RAID does not degrade. This gives
> me moderate levels of confidence that the kernel messages relate to
> something that is not actually an error and does not relate to
> a read error, just a hiccough, which isn't a bad deal and everyone
> just moves on.
> 

The system was horribly unresponsive. I never did try adding the drive
back in because it was a live server and I didn't want to risk it. I
would have expected any RAID implementation to fault an unresponsive
drive, even if the underlying problem was only a quirk. In the end I
just replaced the drive.
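
For anyone who wants to script the /proc/mdstat check rather than cat it
by hand, here is a rough sketch in Python. The function name and the
SAMPLE text are made up for illustration (the sample is just the output
quoted above), and it assumes the status line ends in a [UU]-style
string where "_" marks a missing member. Note that it only reports what
md itself believes: in this case md still showed [UU], so a check like
this would not have flagged the drive.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status string shows a missing member.

    Hypothetical helper: relies on the "[UU]" / "[U_]" status string that
    appears at the end of the line following each "mdN :" line.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status:
            # "_" marks a slot whose member is missing or faulted.
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded

# Sample taken from the output quoted earlier in this message.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      78148096 blocks [2/2] [UU]
unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system you would pass in open("/proc/mdstat").read() instead
of SAMPLE.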


> 
> also sprach Seth Mattinen <sethm@rollernet.us> [2009.07.19.0237 +0200]:
>> I have a 600k capture file off the serial console if you're
>> interested. Unfortunately it's a production system so I can't play
>> with it.
> 
> Sure. ftp://ftp.madduck.net/incoming
> 

I'll try to remember to get it uploaded the next time I'm at the office
as it's on my workstation there.

~Seth

