Bug#446323: mdadm: recovery in infinite loop
- To: Moritz Muehlenhoff <jmm@inutil.org>
- Cc: 446323@bugs.debian.org, jmm@debian.org, neilb@suse.de
- Subject: Bug#446323: mdadm: recovery in infinite loop
- From: Lukasz Szybalski <szybalski@gmail.com>
- Date: Tue, 3 Feb 2009 23:55:52 -0600
- Message-id: <804e5c70902032155g16052198nd368b2037c1af6e7@mail.gmail.com>
- Reply-to: Lukasz Szybalski <szybalski@gmail.com>, 446323@bugs.debian.org
- In-reply-to: <804e5c70901041926p5d23b97k9167df867cbc528e@mail.gmail.com>
- References: <20071012015700.4858.19288.reportbug@hplinux.lucasmanual.com> <804e5c70710120541r47dd74b6mf76933859304fbcf@mail.gmail.com> <18191.32511.583454.228279@notabene.brown> <804e5c70710151639s68e9b25bw68c806c43ab8708d@mail.gmail.com> <18195.64314.832488.656332@notabene.brown> <804e5c70710151648s46f1791ap57ce24fa7e930dcb@mail.gmail.com> <18196.4841.13486.975781@notabene.brown> <804e5c70710171416r58516ed1g7555e001976751b4@mail.gmail.com> <20090104145534.GA6839@galadriel.inutil.org> <804e5c70901041926p5d23b97k9167df867cbc528e@mail.gmail.com>
On Sun, Jan 4, 2009 at 9:26 PM, Lukasz Szybalski <szybalski@gmail.com> wrote:
> On Sun, Jan 4, 2009 at 8:55 AM, Moritz Muehlenhoff <jmm@inutil.org> wrote:
>> On Wed, Oct 17, 2007 at 04:16:39PM -0500, Lukasz Szybalski wrote:
>>> On 10/15/07, Neil Brown <neilb@suse.de> wrote:
>>> >
>>> > As you say, the devices are exactly the same size, thanks.
>>> >
>>> > On Monday October 15, szybalski@gmail.com wrote:
>>> > >
>>> > > How do I undo it? Is it mdadm /dev/md2 -f /dev/hda2?
>>> > > Then I could try the sync in init 1.
>>> > > Lucas
>>> >
>>> > Well, you could:
>>> > mdadm /dev/md2 -f /dev/hda2
>>> > mdadm /dev/md2 -r /dev/hda2
>>> >
>>> > then when you are ready to try again
>>> >
>>> > mdadm /dev/md2 -a /dev/hda2
>>> >
>>> Ok,
>>> So I went into
>>> init 1
>>> to stop any program that might want to access hda2.
>>> I unmounted my '/files' directory, which is where hda2 is mounted.
>>>
>>> I watched it sync the drives for about 30 minutes, and then suddenly
>>> got an unrecognized error (error 5, I believe): unable to read sector
>>> LBA 88604764 on hdb1.
>>>
>>> I guess that went to stderr or something, because I couldn't find any
>>> reference to it anywhere other than the terminal screen.
>>>
>>> What would be the proper e2fsck command for testing every single
>>> block on that hard drive?
>>>
>>> I used e2fsck -acf /dev/hdb2, but it ran all night and still hadn't
>>> finished, so I canceled it. What would be the proper options to get
>>> this drive clean?
>>>
>>> The weird part is that when I checked the drives with Knoppix last
>>> Thursday, nothing showed any problems or bad sectors.
>>>
>>>
>>>
>>> > I think there must be something odd happening with the drive or
>>> > controller. I notice that the two devices are on the same IDE
>>> > channel, which is sometimes a source of problems, though it shouldn't
>>> > behave like this.
>>> >
>>> > If you feel up to patching the kernel, recompiling, and experimenting,
>>> > I can send you a patch which should provide more detailed information
>>> > on what is happening. Let me know what kernel version you will be
>>> > working with.
>>> I've never recompiled a kernel before, but if everything else fails
>>> we can try it. For now, let me clean this drive and we'll go from
>>> there.
>>
>> What's the status of this bug?
>> Does this error still occur with more recent kernel versions?
>>
>> If you're running Etch, you could try to reproduce this bug
>> with the 2.6.24 based kernel added in 4.0r4:
>> http://packages.qa.debian.org/l/linux-2.6.24.html
>>
>
>
> It still exists. I will upgrade this week and let you know.
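Neil's fail/remove/add sequence quoted above can be wrapped in a couple of shell functions. This is a minimal sketch: the function names and the DRY_RUN switch are hypothetical conveniences, not part of mdadm; -f, -r and -a are mdadm's short forms of --fail, --remove and --add.

```shell
# Hypothetical wrappers around the mdadm commands quoted above.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark the member faulty, then remove it from the array.
detach_member() {
    local array=$1 dev=$2
    run mdadm "$array" -f "$dev" && run mdadm "$array" -r "$dev"
}

# Add the member back; on a degraded mirror md then rebuilds onto it.
reattach_member() {
    local array=$1 dev=$2
    run mdadm "$array" -a "$dev"
}
```

For example, DRY_RUN=1 detach_member /dev/md2 /dev/hda2 only prints the two commands. After reattach_member /dev/md2 /dev/hda2, cat /proc/mdstat shows the rebuild progress.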
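On the e2fsck question earlier in the thread: e2fsck's -c option runs badblocks(8) in read-only mode and records any bad blocks it finds in the filesystem's bad-block list; giving -c twice (-cc) asks for a slower, non-destructive read-write test instead. A minimal sketch, assuming the filesystem is unmounted first; the function name and DRY_RUN switch are hypothetical:

```shell
# Hypothetical helper: run (or, with DRY_RUN=1, print) an e2fsck bad-block scan.
#   -f   force a check even if the filesystem looks clean
#   -c   scan with badblocks(8), read-only; -cc = non-destructive read-write
#   -k   keep bad blocks already recorded in the bad-block list
# The filesystem must be unmounted before running this.
scan_fs() {
    local dev=$1 mode=${2:-ro} cflag
    case $mode in
        ro) cflag=-c ;;
        rw) cflag=-cc ;;
        *)  echo "usage: scan_fs DEVICE [ro|rw]" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "e2fsck -f -k $cflag $dev"
    else
        e2fsck -f -k "$cflag" "$dev"
    fi
}
```

For example, scan_fs /dev/hdb2 rw runs the read-write variant; without the second argument the read-only scan is used.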
Hello,
1. I've upgraded.
aptitude update
aptitude upgrade
then
aptitude install linux-image-2.6.24-etchnhalf.1-486
2. For some reason my GRUB menu had changed, so I had to fix the root device.
vi /boot/grub/menu.lst
and changed the kernel line to root=/dev/md0:
title Debian GNU/Linux, kernel 2.6.24-etchnhalf.1-486
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-etchnhalf.1-486 root=/dev/md0 ro
initrd /boot/initrd.img-2.6.24-etchnhalf.1-486
savedefault
3. I then re-added /dev/hda2 to /dev/md2, and a lot of errors
started to appear in syslog.
4. I ran e2fsck, forcing it to check and automatically fix things,
but I still get these errors.
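Step 2 amounts to making every kernel line in /boot/grub/menu.lst pass root=/dev/md0. A quick check, sketched under the assumption of GRUB legacy syntax as in the stanza above (the function name is hypothetical), prints any kernel line that still points elsewhere:

```shell
# Hypothetical check for a GRUB legacy menu.lst: print every "kernel" line
# that does not pass root=<wanted device> (default /dev/md0).
# Prints nothing (and grep returns 1) when all entries already match.
grub_root_mismatches() {
    local menu=${1:-/boot/grub/menu.lst} want=${2:-/dev/md0}
    grep -E '^[[:space:]]*kernel[[:space:]]' "$menu" |
        grep -vE -- "root=${want}([[:space:]]|\$)"
}
```

On Debian, update-grub regenerates these kernel lines from the "# kopt=" comment near the top of menu.lst when a new kernel is installed, which may be why the menu changed; setting root=/dev/md0 on that kopt line should make the change survive future kernel upgrades.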
Any ideas what I can do now? Another user with the same issue has
also contacted me.
Below are the errors.
Thanks,
Lucas
Feb 3 23:49:09 hplinux kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 3 23:49:09 hplinux kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=88604764, high=5, low=4718684, sector=88604458
Feb 3 23:49:09 hplinux kernel: ide: failed opcode was: unknown
Feb 3 23:49:09 hplinux kernel: end_request: I/O error, dev hdc, sector 88604458
Feb 3 23:49:11 hplinux kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 3 23:49:11 hplinux kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=88604763, high=5, low=4718683, sector=88604466
Feb 3 23:49:11 hplinux kernel: ide: failed opcode was: unknown
Feb 3 23:49:11 hplinux kernel: end_request: I/O error, dev hdc, sector 88604466
Feb 3 23:49:13 hplinux kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 3 23:49:13 hplinux kernel: hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=88604764, high=5, low=4718684, sector=88604474
Feb 3 23:49:13 hplinux kernel: ide: failed opcode was: unknown
Feb 3 23:49:15 hplinux kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 3 23:49:15 hplinux kernel: hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=88604764, high=5, low=4718684, sector=88604474
Feb 3 23:49:15 hplinux kernel: ide: failed opcode was: unknown
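The LBAsect values in these messages are consistent with the failing address being reported as a high part plus a low 24-bit part: 5 × 2^24 + 4718684 = 83886080 + 4718684 = 88604764. A minimal sketch (the function name is hypothetical, and it assumes one message per line, as in syslog itself rather than the wrapped copy in this mail) lists the LBAsect of each dma_intr error and checks that relation, which makes it easy to see whether the same few sectors keep failing.

```shell
# Hypothetical helper: print the LBAsect of every "dma_intr: error=" line in a
# log file, followed by "ok" if LBAsect == high * 2^24 + low, else "mismatch".
ide_error_lbas() {
    awk '
        /dma_intr: error=/ && match($0, /LBAsect=[0-9]+, high=[0-9]+, low=[0-9]+/) {
            # Splitting "LBAsect=N, high=H, low=L" gives f[2]=N, f[4]=H, f[6]=L
            split(substr($0, RSTART, RLENGTH), f, "[=, ]+")
            calc = f[4] * 16777216 + f[6]
            print f[2], (f[2] + 0 == calc ? "ok" : "mismatch")
        }
    ' "${1:-/var/log/syslog}"
}
```

For example, ide_error_lbas /var/log/syslog | sort | uniq -c counts how often each sector was reported.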