
Re: Mirroring a failing HDD



On Fri, Nov 10, 2006 at 12:38:43PM +0000, Shri Shrikumar wrote:
> There is a server(sarge) that I maintain that used to be mirrored and
> all was well. However, the mirror was recently broken and when trying to
> rebuild, I run into an interesting problem.
> 
> The array rebuilds to about 80% and then restarts. dmesg has the following:
> 
> RAID1 conf printout:
> --- wd:1 rd:2
> disk 0, wo:1, o:1, dev:sda3
> disk 1, wo:0, o:1, dev:sdb3
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: error=0x40 { UncorrectableError }
> SCSI error : <1 0 0 0> return code = 0x8000002
> sdb: Current: sense key: Medium Error
>    Additional sense: Unrecovered read error - auto reallocate failed
> end_request: I/O error, dev sdb, sector 268475518
 
> //..... removed few more of these lines
> 
 
> Mirror information as follows:
> 
> $ sudo mdadm -D /dev/md1
> /dev/md1:
>        Version : 00.90.01
>  Creation Time : Tue Nov  8 16:54:09 2005
>     Raid Level : raid1
>     Array Size : 155276160 (148.08 GiB 159.00 GB)
>    Device Size : 155276160 (148.08 GiB 159.00 GB)
>   Raid Devices : 2
>  Total Devices : 2
> Preferred Minor : 1
>    Persistence : Superblock is persistent
> 
>          State : active, degraded, recovering
> Active Devices : 1
> Working Devices : 2
> Failed Devices : 0
>  Spare Devices : 1
> 
> Rebuild Status : 59% complete
> 
 
>    Number   Major   Minor   RaidDevice State
>       0       0        0        -      removed
>       1       8       19        1      active sync   /dev/sdb3
> 
>       2       8        3        0      spare rebuilding   /dev/sda3
> 
> of the 159GB, only 60gb is allocated(using LVM) and I am guessing that
> the failing part is within the unallocated section.
> 

Hopefully, someone who has had a raid disk failure and has watched a
recovery can offer some better advice.  However, my interpretation is
this:

You had a RAID1 consisting of two drives, one of which holds /dev/sdb3.
One died, leaving sdb3 as the only good copy.  However, during the
rebuild onto sda3, /dev/sdb is returning read errors.

I see three possible sources of read errors on sdb3:
	
	The third partition (sdb3) is corrupted.

	The drive (sdb) is failing.

	The controller that the drive is on is failing.

Was the original failed drive on the same controller?  Could that
drive failure also have killed the controller?  Could it have been a
controller failure and not a drive failure?  Do you have another
controller in the box to which you could connect the drive that is now
sdb?  Are there other drives on this controller that are working OK?

Can you read the other partitions on sdb (and thereby show that both the
controller and the drive are OK)?
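
One way to run that check, as a minimal sketch: read each partition from
end to end with dd and discard the data, so that any unreadable sector
shows up as a failure.  The name read_check is made up for illustration,
and the partition names sdb1 and sdb2 in the usage lines are assumptions
about how the disk is laid out.

```shell
# read_check is a hypothetical helper, not part of mdadm or any package.
# It reads the given device (or file) from start to end and throws the
# data away; dd exits non-zero if the kernel reports a read error.
read_check() {
    if dd if="$1" of=/dev/null bs=65536 2>/dev/null; then
        echo "$1: read OK"
    else
        echo "$1: read FAILED"
    fi
}

# Usage (as root; sdb1 and sdb2 are assumed partition names):
#   read_check /dev/sdb1
#   read_check /dev/sdb2
```

This only reads from the disk, so it does not change anything on it.  It
stops at the first error rather than listing every bad sector.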

I hope you have good backups.  

How does the system behave if you disconnect the new drive (sda) so that
the RAID runs in degraded mode?  Do you still get read errors?  If you
don't have backups, this would be a good time to copy the data,
preferably off this box, via scp or something similar to another
computer on your network.
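
For the read-error question, a quick check while running degraded is to
count the kernel's I/O error lines for the disk.  This is only a sketch,
assuming the messages look like the end_request line quoted above;
count_io_errors is a made-up helper name.

```shell
# count_io_errors is a hypothetical helper: it reads kernel log text on
# stdin and prints how many lines report an I/O error for the given disk.
# The trailing comma in the pattern keeps "sdb" from matching other names.
count_io_errors() {
    grep -c "I/O error, dev $1," || true
}

# Usage: dmesg | count_io_errors sdb
```

If the count keeps growing while only sdb is in the array, the errors
are not coming from the rebuild onto sda.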

Doug.


