Bug#446323: mdadm: recovery in infinite loop
On Thu, Feb 5, 2009 at 11:22 PM, Lukasz Szybalski <szybalski@gmail.com> wrote:
> On Thu, Feb 5, 2009 at 12:45 PM, martin f krafft <madduck@debian.org> wrote:
>> also sprach Lukasz Szybalski <szybalski@gmail.com> [2009.02.05.0731 +0100]:
>>> Both drives are fairly new... Could the drive have been defective since day one?
>>>
>>> Would the errors I am getting now also have been logged by the previous version
>>> of the kernel?
>>
>> Yes.
>>
>> Either the drives or the controller are faulty.
>
> This is a software RAID; which controller do you mean?
>
> Is there a way to run fsck so that it moves the data from bad sectors into good ones?
>
> I've also run smartctl on the hard drive that is being added, which
> gives errors, and it reports no errors:
>
> aptitude install smartmontools
>
>
> "=== START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> No Errors Logged
> "
>
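A clean error log only means the drive has not yet reported a failed command. A broader check with smartctl from the smartmontools package might look like the following sketch; drive_report is a hypothetical helper name, and the device argument (/dev/hda or /dev/hdc here) is only an example.

```shell
#!/bin/sh
# Hypothetical helper: print the SMART state of one drive via smartctl.
# Usage (as root): drive_report /dev/hda
drive_report() {
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found; install smartmontools" >&2
        return 1
    fi
    # Overall health verdict as assessed by the drive itself.
    smartctl -H "$1"
    # Vendor attributes: watch Reallocated_Sector_Ct,
    # Current_Pending_Sector and Offline_Uncorrectable.
    smartctl -A "$1"
    # The ATA error log (as quoted in this report) and the self-test log.
    smartctl -l error "$1"
    smartctl -l selftest "$1"
}
```

For example, run `drive_report /dev/hdc` as root. A full surface scan can also be started with `smartctl -t long /dev/hdc`; its result appears in the self-test log once the test has finished.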
>
> And the drive that is active and that I want to sync from:
> "=== START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> ATA Error Count: 1665 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 1665 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 07 5b 00 48 e0 Error: UNC 7 sectors at LBA = 0x0048005b = 4718683
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 08 5a 00 48 05 28 13:59:36.510 READ DMA EXT
> 25 00 08 52 00 48 05 28 13:59:36.510 READ DMA EXT
> 25 00 08 4a 00 48 05 28 13:59:36.510 READ DMA EXT
> 25 00 08 42 00 48 05 28 13:59:36.510 READ DMA EXT
> 25 00 08 3a 00 48 05 28 13:59:36.510 READ DMA EXT
>
> Error 1664 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 37 5b 00 48 e0 Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 38 5a 00 48 05 28 13:59:34.340 READ DMA EXT
> 25 00 40 52 00 48 05 28 13:59:32.250 READ DMA EXT
> 25 00 48 4a 00 48 05 28 13:59:30.185 READ DMA EXT
> 25 00 50 42 00 48 05 28 13:59:28.060 READ DMA EXT
> 25 00 58 3a 00 48 05 28 13:59:25.970 READ DMA EXT
>
> Error 1663 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 37 5b 00 48 e0 Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 40 52 00 48 05 28 13:59:32.250 READ DMA EXT
> 25 00 48 4a 00 48 05 28 13:59:30.185 READ DMA EXT
> 25 00 50 42 00 48 05 28 13:59:28.060 READ DMA EXT
> 25 00 58 3a 00 48 05 28 13:59:25.970 READ DMA EXT
> 25 00 60 32 00 48 05 28 13:59:23.735 READ DMA EXT
>
> Error 1662 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 37 5b 00 48 e0 Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 48 4a 00 48 05 28 13:59:30.185 READ DMA EXT
> 25 00 50 42 00 48 05 28 13:59:28.060 READ DMA EXT
> 25 00 58 3a 00 48 05 28 13:59:25.970 READ DMA EXT
> 25 00 60 32 00 48 05 28 13:59:23.735 READ DMA EXT
> 25 00 68 2a 00 48 05 28 13:59:21.665 READ DMA EXT
>
> Error 1661 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 36 5c 00 48 e0 Error: UNC 54 sectors at LBA = 0x0048005c = 4718684
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 50 42 00 48 05 28 13:59:28.060 READ DMA EXT
> 25 00 58 3a 00 48 05 28 13:59:25.970 READ DMA EXT
> 25 00 60 32 00 48 05 28 13:59:23.735 READ DMA EXT
> 25 00 68 2a 00 48 05 28 13:59:21.665 READ DMA EXT
> 25 00 70 22 00 48 05 28 13:59:19.260 READ DMA EXT
>
> lucas@hplinux:~$ cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
> md4 : active raid1 hda5[0] hdc5[1]
> 1951744 blocks [2/2] [UU]
>
> md2 : active raid1 hdc2[1]
> 276438400 blocks [2/1] [_U]
>
> md0 : active raid1 hda1[0] hdc1[1]
> 34178176 blocks [2/2] [UU]
> "
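All five entries point at the same area of the disk. For these 28-bit register values the LBA is built from SN (bits 0-7), CL (bits 8-15), CH (bits 16-23) and the low four bits of DH (bits 24-27). The first entry, 40 51 07 5b 00 48 e0, therefore gives 0x48 * 65536 + 0x00 * 256 + 0x5b = 4718592 + 91 = 4718683, which is the value smartctl prints. A minimal shell sketch of that decoding follows; ata_lba is a hypothetical name, not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: rebuild a 28-bit LBA from the SN, CL, CH and DH
# register bytes of a smartctl error entry, given as hex without "0x"
# in the same order as the log columns.
ata_lba() {
    sn=$((0x$1)); cl=$((0x$2)); ch=$((0x$3)); dh=$((0x$4))
    # DH bits 0-3 are LBA bits 24-27; CH, CL and SN supply bits 23-0.
    lba=$(( (($dh & 15) << 24) | ($ch << 16) | ($cl << 8) | $sn ))
    printf '0x%08x = %d\n' "$lba" "$lba"
}
```

Running `ata_lba 5b 00 48 e0` prints `0x0048005b = 4718683`. Comparing the result with the partition boundaries listed by `fdisk -lu /dev/hdc` (in sectors) would show which md member the unreadable sectors belong to.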
>
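In the /proc/mdstat output, `[2/1] [_U]` on md2 means the mirror expects two members but only one is active: the underscore marks the empty slot 0 (where hda2 would go) and U marks hdc2 in slot 1, while `[UU]` on md0 and md4 means both halves are up. The sketch below, using a hypothetical function name and plain awk, lists arrays in that degraded state; it only handles the status format shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member
# status string (e.g. "[_U]") contains "_", i.e. has a missing mirror.
# Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        # Array header, e.g. "md2 : active raid1 hdc2[1]".
        /^md[0-9]+ :/ { name = $1; next }
        # The following status line ends with the [U_] string.
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Against the mdstat output quoted above it prints `md2`.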
Since it seems that hdc has some issues, I know I can copy the files
manually from hdc2 to hda2:
cp -dpRx /files /mnt/hda2
How do I remove hdc2 from md2 (leaving md2 empty), then add hda2 as
the primary (first) drive of md2, and then sync hdc2 to it?
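One possible route, sketched here under stated assumptions, avoids resyncing from hdc2 at all: build a new, deliberately degraded mirror on hda2, copy the files into it with the cp command above, then retire md2 and add hdc2 to the new array, so the rebuild only writes to hdc (writing to a pending bad sector usually lets the drive remap it). Assumptions not in the report: /dev/md3 is unused, md2 is mounted on /files, ext3 is the desired filesystem, and the function names are hypothetical. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Hypothetical two-phase migration of md2's data from hdc2 onto hda2.
# Assumptions: /dev/md3 is free, md2 is mounted on /files, ext3 is wanted.
# Run as root. With DRY_RUN=1 the commands are only printed.

run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# Phase 1: new degraded RAID1 on hda2 only, then copy the files onto it.
copy_to_hda2() {
    # hda2 may still be listed in md2 as failed/spare; ignore errors if not.
    run mdadm /dev/md2 --remove /dev/hda2 || true
    # "missing" leaves the second slot empty on purpose.
    run mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/hda2 missing || return 1
    run mkfs.ext3 /dev/md3 || return 1
    run mkdir -p /mnt/hda2 || return 1
    run mount /dev/md3 /mnt/hda2 || return 1
    # cp keeps going past unreadable files but exits non-zero; stop here
    # so the failures can be reviewed before hdc2 is overwritten.
    run cp -dpRx /files /mnt/hda2 || {
        echo "copy reported errors; check them before readd_hdc2" >&2
        return 1
    }
}

# Phase 2: retire md2 and add hdc2 to md3; md then resyncs hda2 -> hdc2.
readd_hdc2() {
    run umount /files || return 1
    run mdadm --stop /dev/md2 || return 1
    # Clear the old md2 superblock so hdc2 is not assembled as md2 later.
    run mdadm --zero-superblock /dev/hdc2 || return 1
    run mdadm /dev/md3 --add /dev/hdc2
}
```

Running `DRY_RUN=1 copy_to_hda2` first shows the commands for review. readd_hdc2 should only be called once the copy has been checked, because adding hdc2 to md3 overwrites its contents. Afterwards /etc/fstab and /etc/mdadm/mdadm.conf need to refer to md3; `mdadm --detail --scan` prints suitable ARRAY lines.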
At this point I think we can close this bug, as it seems to be a
hard drive issue, unless somebody has a similar issue or knows what
is causing it. I guess I should have checked smartctl when I
first got the hard drive to see whether there were any errors.
Thanks,
Lucas