Bug#446323: mdadm: recovery in infinite loop

To: martin f krafft <madduck@debian.org>
Cc: 446323@bugs.debian.org, Moritz Muehlenhoff <jmm@inutil.org>, neilb@suse.de
Subject: Bug#446323: mdadm: recovery in infinite loop
From: Lukasz Szybalski <szybalski@gmail.com>
Date: Sun, 8 Feb 2009 19:13:49 -0600
Message-id: <804e5c70902081713n587b5203o69e0aa3a229c15eb@mail.gmail.com>
Reply-to: Lukasz Szybalski <szybalski@gmail.com>, 446323@bugs.debian.org
In-reply-to: <804e5c70902052122k213b8a6bkbf7a46bd502ba9@mail.gmail.com>
References: <804e5c70710151639s68e9b25bw68c806c43ab8708d@mail.gmail.com> <18196.4841.13486.975781@notabene.brown> <804e5c70710171416r58516ed1g7555e001976751b4@mail.gmail.com> <20090104145534.GA6839@galadriel.inutil.org> <804e5c70901041926p5d23b97k9167df867cbc528e@mail.gmail.com> <804e5c70902032155g16052198nd368b2037c1af6e7@mail.gmail.com> <20090204091248.GA12773@piper.oerlikon.madduck.net> <804e5c70902042231q46b97b96m6ed160ce0546ea86@mail.gmail.com> <20090205184554.GB11487@piper.oerlikon.madduck.net> <804e5c70902052122k213b8a6bkbf7a46bd502ba9@mail.gmail.com>

On Thu, Feb 5, 2009 at 11:22 PM, Lukasz Szybalski <szybalski@gmail.com> wrote:
> On Thu, Feb 5, 2009 at 12:45 PM, martin f krafft <madduck@debian.org> wrote:
>> also sprach Lukasz Szybalski <szybalski@gmail.com> [2009.02.05.0731 +0100]:
>>> Both drives are fairly new... Could a drive have been defective since day one?
>>>
>>> Was the error log that I am getting now available in previous version
>>> of the kernel?
>>
>> Yes.
>>
>> Either the drives are faulty or the controller.
>
> This is software RAID. Which controller do you mean?
>
> Is there a way to run fsck so that it remaps the bad sectors to good ones?
>
> I've also run smartctl on the hard drive that is being added, which
> gives errors, and it reports no errors:
>
> aptitude install smartmontools
>
>
> "=== START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> No Errors Logged
> "
>
>
> And the drive that is active, which I want to sync from:
> "=== START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> ATA Error Count: 1665 (device log contains only the most recent five errors)
>        CR = Command Register [HEX]
>        FR = Features Register [HEX]
>        SC = Sector Count Register [HEX]
>        SN = Sector Number Register [HEX]
>        CL = Cylinder Low Register [HEX]
>        CH = Cylinder High Register [HEX]
>        DH = Device/Head Register [HEX]
>        DC = Device Command Register [HEX]
>        ER = Error register [HEX]
>        ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 1665 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 07 5b 00 48 e0  Error: UNC 7 sectors at LBA = 0x0048005b = 4718683
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 08 5a 00 48 05 28      13:59:36.510  READ DMA EXT
>  25 00 08 52 00 48 05 28      13:59:36.510  READ DMA EXT
>  25 00 08 4a 00 48 05 28      13:59:36.510  READ DMA EXT
>  25 00 08 42 00 48 05 28      13:59:36.510  READ DMA EXT
>  25 00 08 3a 00 48 05 28      13:59:36.510  READ DMA EXT
>
> Error 1664 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 37 5b 00 48 e0  Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 38 5a 00 48 05 28      13:59:34.340  READ DMA EXT
>  25 00 40 52 00 48 05 28      13:59:32.250  READ DMA EXT
>  25 00 48 4a 00 48 05 28      13:59:30.185  READ DMA EXT
>  25 00 50 42 00 48 05 28      13:59:28.060  READ DMA EXT
>  25 00 58 3a 00 48 05 28      13:59:25.970  READ DMA EXT
>
> Error 1663 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 37 5b 00 48 e0  Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 40 52 00 48 05 28      13:59:32.250  READ DMA EXT
>  25 00 48 4a 00 48 05 28      13:59:30.185  READ DMA EXT
>  25 00 50 42 00 48 05 28      13:59:28.060  READ DMA EXT
>  25 00 58 3a 00 48 05 28      13:59:25.970  READ DMA EXT
>  25 00 60 32 00 48 05 28      13:59:23.735  READ DMA EXT
>
> Error 1662 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 37 5b 00 48 e0  Error: UNC 55 sectors at LBA = 0x0048005b = 4718683
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 48 4a 00 48 05 28      13:59:30.185  READ DMA EXT
>  25 00 50 42 00 48 05 28      13:59:28.060  READ DMA EXT
>  25 00 58 3a 00 48 05 28      13:59:25.970  READ DMA EXT
>  25 00 60 32 00 48 05 28      13:59:23.735  READ DMA EXT
>  25 00 68 2a 00 48 05 28      13:59:21.665  READ DMA EXT
>
> Error 1661 occurred at disk power-on lifetime: 11609 hours (483 days + 17 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 36 5c 00 48 e0  Error: UNC 54 sectors at LBA = 0x0048005c = 4718684
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 50 42 00 48 05 28      13:59:28.060  READ DMA EXT
>  25 00 58 3a 00 48 05 28      13:59:25.970  READ DMA EXT
>  25 00 60 32 00 48 05 28      13:59:23.735  READ DMA EXT
>  25 00 68 2a 00 48 05 28      13:59:21.665  READ DMA EXT
>  25 00 70 22 00 48 05 28      13:59:19.260  READ DMA EXT
>
> lucas@hplinux:~$ cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
> md4 : active raid1 hda5[0] hdc5[1]
>      1951744 blocks [2/2] [UU]
>
> md2 : active raid1 hdc2[1]
>      276438400 blocks [2/1] [_U]
>
> md0 : active raid1 hda1[0] hdc1[1]
>      34178176 blocks [2/2] [UU]
> "
>
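The degraded state of md2 in the /proc/mdstat listing above ([2/1] [_U]) can also be picked out by a script. This is only a rough sketch: the function name degraded_arrays is made up for illustration, and it assumes the mdstat layout shown above, where the member map follows the array's "mdN :" line.

```shell
# Print the name of every md array whose member map contains "_",
# i.e. an array with at least one missing member.
# Reads the file given as an argument, or standard input if none is given.
# (degraded_arrays is a hypothetical helper name, not part of mdadm.)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }    # remember the current array name
        /\[[U_]*_[U_]*\]/ { print name }   # member map with a gap, e.g. [_U]
    ' "$@"
}
```

Called as "degraded_arrays /proc/mdstat", it should print md2 for the listing above and nothing once every array shows [UU].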



Since it seems that hdc has some issues, I know I can copy the files
manually from hdc2 to hda2:

cp -dpRx /files /mnt/hda2

How do I remove hdc2 from md2 (md2 will then be empty), add hda2 as
the primary/first drive for md2, and then sync hdc2 to it?

At this point I think we can close this bug, as it seems to be a
hard drive issue, unless somebody has a similar issue or knows what
is causing it. I guess I should have run smartctl when I first got
the hard drive to see whether there were any errors.
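For anyone reading the smartctl error entries quoted above, the LBA on each "Error: UNC" line can be rebuilt from the SN, CL, CH and DH register bytes. A small sketch, assuming the 28-bit register layout (the low nibble of DH holds the top four LBA bits); the function name lba_from_regs is made up for illustration:

```shell
# Rebuild the LBA from the SN, CL, CH and DH register values.
# Arguments are hex bytes without the 0x prefix, in that order.
# Assumption: 28-bit layout, low nibble of DH = LBA bits 27..24.
# (lba_from_regs is a hypothetical helper name, not part of smartmontools.)
lba_from_regs() {
    sn=$1 cl=$2 ch=$3 dh=$4
    echo $(( ((0x$dh & 0x0f) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}

lba_from_regs 5b 00 48 e0   # prints 4718683, matching error 1665 above
```

All five logged errors fall on LBA 4718683 or 4718684, so they point at the same small region of hdc.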

Thanks,
Lucas
