
problem with write intent bitmap in raid1, using debian on nslu2



Hi,

The subject line is a bit complicated, but this is the problem I have.
I suppose it is a bug in the md driver (related to the write intent
bitmap).

This is my configuration: I have two 500 GB disks attached to my
nslu2. On both disks I have a root, a swap and a home partition. All
these partitions have been configured in raid1 with mdadm after a
regular installation on the first disk. This worked perfectly well.
The problems began after I tried to add a write intent bitmap to the
superblock of all my raid drives on a running debian nslu2:

mdadm /dev/mdX -Gb internal
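
For readers who do not know the short options: -G is --grow and -b is
--bitmap. Below is a minimal sketch of the long form, together with
the matching command to remove the bitmap again. /dev/md1 is only an
example device, and "run" is a hypothetical dry-run wrapper (not part
of mdadm) that just prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper (not part of mdadm): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Example array only; substitute the real md device.
MD_DEV="${MD_DEV:-/dev/md1}"

# Long form of "mdadm /dev/mdX -Gb internal": add an internal bitmap.
run mdadm --grow "$MD_DEV" --bitmap=internal

# Remove the bitmap again.
run mdadm --grow "$MD_DEV" --bitmap=none
```

Printing by default is a deliberate choice here, since the commands
change metadata on an array that may hold the root filesystem.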

Now I realize that I did this on a mounted raid1 root partition. Is
this a problem? Anyway, not much later the system became
unresponsive. I tried to reboot, which didn't work. To find out in
more detail what was going on, I booted with the stable etch
installer and went into console mode. I also attached the serial
port to be able to trace the kernel messages. At the console, I first
installed mdadm:

/root # wget http://ftp.nl.debian.org/debian/pool/main/m/mdadm/mdadm-udeb_2.5.6-9_arm.udeb
mdadm-udeb_2.5.6-9_a 100% |*****************************| 75988       00:00 ETA
/root # udpkg -i mdadm-udeb_2.5.6-9_arm.udeb
(Reading database...)
(Updating database...)

Then I made a second login via ssh and tried to assemble the raid1 root partition:

~ # mdadm --assemble --auto=yes /dev/md1 /dev/sda1 /dev/sdb1
mdadm: /dev/md1 has been started with 2 drives.

and then I tried to mount this partition:

~ # mkdir /mnt/target
~ # modprobe ext3
~ # mount /dev/md1 /mnt/target/

At this point the system hung. On the serial console, I got these
error messages:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: raid1 md_mod vfat fat sd_mod usb_storage scsi_mod
evdev ixp4xx_mac ixp4xx_qmgr ixp4xx_beeper ixp4xx_npe firmware_class
ehci_hcd ohci_hcd e
CPU: 0
PC is at __bug+0x44/0x58
LR is at 0x1
pc : [<c00238a0>]    lr : [<00000001>]    Not tainted
sp : c0f85eb0  ip : 60000093  fp : c0f85ebc
r10: c128c200  r9 : 00000000  r8 : c0f85f6c
r7 : ffffffff  r6 : c136e120  r5 : c1aec5e0  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 0000274f  r0 : 00000001
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
Control: 397F  Table: 01A0C000  DAC: 00000017
Process md2_resync (pid: 7170, stack limit = 0xc0f84250)
Stack: (0xc0f85eb0 to 0xc0f86000)
5ea0:                                     c0f85f04 c0f85ec0 bf0afc20 c0023868
5ec0: 0003eb80 00000000 00000000 00000002 00000000 00001000 00000000 00000000
5ee0: c128c200 00000000 00000000 00000000 0003eb80 00000000 c0f85f98 c0f85f08
5f00: bf09ef24 bf0af86c 00000000 00000000 00000000 00000000 00000000 00000000
5f20: 00000000 00000000 00000000 00000000 0000a944 0000a944 0000a944 0000a944
5f40: 0000a944 0000a944 0000a944 0000a944 0000a944 0000a944 c0f85f80 c0f85f68
5f60: c0042de0 c002b634 c0f85f84 00000000 c0f69ce0 c0f69ce0 7fffffff c0f84000
5f80: 00000000 00000000 00000000 c0f85fcc c0f85f9c bf09f694 bf09eb54 c1346510
5fa0: c0f84000 c0f69ce0 c0f84000 c0f5de30 c0f69ce0 c0f84000 c0f5de30 bf09f588
5fc0: c0f85ff4 c0f85fd0 c0049890 bf09f594 ffffffff ffffffff 00000000 00000000
5fe0: 00000000 00000000 00000000 c0f85ff8 c0037fd8 c00497b4 00000000 00000000
Backtrace:
[<c002385c>] (__bug+0x0/0x58) from [<bf0afc20>]
(sync_request+0x3c0/0x5c0 [raid1])
[<bf0af860>] (sync_request+0x0/0x5c0 [raid1]) from [<bf09ef24>]
(md_do_sync+0x3dc/0x80c [md_mod])
[<bf09eb48>] (md_do_sync+0x0/0x80c [md_mod]) from [<bf09f694>]
(md_thread+0x10c/0x128 [md_mod])
[<bf09f588>] (md_thread+0x0/0x128 [md_mod]) from [<c0049890>]
(kthread+0xe8/0x128)
r7 = BF09F588  r6 = C0F5DE30  r5 = C0F84000  r4 = C0F69CE0
[<c00497a8>] (kthread+0x0/0x128) from [<c0037fd8>] (do_exit+0x0/0x840)
r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
Code: eb004925 e59f0014 eb004923 e3a03000 (e5833000)
<6>md: md1 stopped.
md: bind<sdb1>
md: bind<sda1>
md: md1: raid array is not clean -- starting background reconstruction
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 14/14 pages, set 0 bits, status: 0
created bitmap (209 pages) for device md1
md: delaying resync of md1 until md2 has finished resync (they share
one or more physical units)
kernel BUG at drivers/md/bitmap.c:1166!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#2]
Modules linked in: ext3 jbd mbcache raid1 md_mod vfat fat sd_mod
usb_storage scsi_mod evdev ixp4xx_mac ixp4xx_qmgr ixp4xx_beeper
ixp4xx_npe firmware_class ee
CPU: 0
PC is at __bug+0x44/0x58
LR is at 0x1
pc : [<c00238a0>]    lr : [<00000001>]    Not tainted
sp : c1395c5c  ip : 60000093  fp : c1395c68
r10: 00000002  r9 : c0a6f3e0  r8 : 00000000
r7 : 00000008  r6 : c029a3c0  r5 : c09f0000  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 0000327f  r0 : 00000001
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  Segment kernel
Control: 397F  Table: 00A48000  DAC: 00000017
Process pdflush (pid: 61, stack limit = 0xc1394250)
Stack: (0xc1395c5c to 0xc1396000)
5c40:                                                                c1395c90
5c60: c1395c6c bf0a2fa4 c0023868 00000000 c0a27aa8 00000000 c0a27aa0 c0a6f7a0
5c80: c0a6f420 c1395cd4 c1395c94 bf0af6f8 bf0a2e0c c136e4e0 c12ae400 00000002
5ca0: c1ccb2c0 00000000 00000000 00000000 c0a6f7a0 c0f630ec 00000008 00000008
5cc0: 00001000 c0a6f7a0 c1395d18 c1395cd8 c00f0488 bf0af184 00000000 00000000
5ce0: 00000000 c1395d00 c1395cf4 c005d1a4 c0075250 c1395d38 c0eb2694 c0a6f7a0
5d00: 00000001 00000000 c0f620e4 c1395d6c c1395d1c c00f29bc c00f030c c0a6f7a0
5d20: c02f6ce0 00000010 c0f620e4 c1395d5c c1395d3c c007e834 c005d274 c0eb2694
5d40: c029a080 00000001 c0eb2694 c0a6f7a0 00000001 00000000 c0f620e4 c1395f50
5d60: c1395d88 c1395d70 c007a628 c00f28ec c0eb2694 c029a080 c0eb2694 c1395dc0
5d80: c1395d8c c007c408 c007a4dc c0081dec 00000000 001a134f 001a1350 00000000
5da0: c029a080 c0f620e4 00000000 c1395f50 c0081dec c1395dec c1395dc4 c007c628
5dc0: c007c16c c1395e24 c029a080 00000000 c1395f50 c0f62180 00000000 c0f631ac
5de0: c1395dfc c1395df0 c0080e04 c007c5d4 c1395e98 c1395e00 c009e4b4 c0080df8
5e00: 0000000e 0000000e 00000000 00000000 0000000e 00000000 c0080dec ffffffff
5e20: 00000000 0000000e 00000000 c029a080 c029a8a0 c029a280 c0291980 c0291940
5e40: c0291960 c02904a0 c0290520 c0290720 c0290500 c0290580 c02905c0 c0290c00
5e60: c0291fc0 000281a8 00000000 00000000 c1395f50 c0f620e4 c02d2800 c0f62180
5e80: 00000000 c1395f50 c1394000 c1395ea8 c1395e9c c0080dc8 c009e2c4 c1395ebc
5ea0: c1395eac c005fa68 c0080dc0 00000004 c1395f04 c1395ec0 c009caec c005fa30
5ec0: c0092b64 00000000 00000000 00000000 00000000 00000000 00000000 c02d2800
5ee0: c0f620e4 c0f631ac c1395f50 00000000 0000df85 c1394000 c1395f30 c1395f08
5f00: c009d008 c009c980 c02d2800 c02d283c c1395f50 c0210634 00000000 00000000
5f20: 00000000 c1395f4c c1395f34 c009d274 c009ce30 c1395fa4 00000040 c1394000
5f40: c1395f94 c1395f50 c005feb4 c009d21c 00000000 00000000 00000000 00000400
5f60: 00000000 00000000 00000000 00000000 00000000 00000031 00000c7a 00000333
5f80: c1395fa4 c0211558 c1395fcc c1395f98 c0060784 c005fe20 c1382100 c005fe14
5fa0: 00000040 c1395fa4 c1395fa4 0000df27 00000000 c1394000 c02e9f38 c0060670
5fc0: c1395ff4 c1395fd0 c0049890 c006067c ffffffff ffffffff 00000000 00000000
5fe0: 00000000 00000000 00000000 c1395ff8 c0037fd8 c00497b4 00000000 00000000
Backtrace:
[<c002385c>] (__bug+0x0/0x58) from [<bf0a2fa4>]
(bitmap_startwrite+0x1a4/0x1ec [md_mod])
[<bf0a2e00>] (bitmap_startwrite+0x0/0x1ec [md_mod]) from [<bf0af6f8>]
(make_request+0x580/0x5e8 [raid1])
r8 = C0A6F420  r7 = C0A6F7A0  r6 = C0A27AA0  r5 = 00000000
r4 = C0A27AA8
[<bf0af178>] (make_request+0x0/0x5e8 [raid1]) from [<c00f0488>]
(generic_make_request+0x188/0x1a0)
[<c00f0300>] (generic_make_request+0x0/0x1a0) from [<c00f29bc>]
(submit_bio+0xdc/0x104)
r8 = C0F620E4  r7 = 00000000  r6 = 00000001  r5 = C0A6F7A0
r4 = C0EB2694
[<c00f28e0>] (submit_bio+0x0/0x104) from [<c007a628>] (submit_bh+0x158/0x188)
[<c007a4d0>] (submit_bh+0x0/0x188) from [<c007c408>]
(__block_write_full_page+0x2a8/0x468)
r6 = C0EB2694  r5 = C029A080  r4 = C0EB2694
[<c007c160>] (__block_write_full_page+0x0/0x468) from [<c007c628>]
(block_write_full_page+0x60/0xd4)
[<c007c5c8>] (block_write_full_page+0x0/0xd4) from [<c0080e04>]
(blkdev_writepage+0x18/0x20)
[<c0080dec>] (blkdev_writepage+0x0/0x20) from [<c009e4b4>]
(mpage_writepages+0x1fc/0x3dc)
[<c009e2b8>] (mpage_writepages+0x0/0x3dc) from [<c0080dc8>]
(generic_writepages+0x14/0x18)
[<c0080db4>] (generic_writepages+0x0/0x18) from [<c005fa68>]
(do_writepages+0x44/0x64)
[<c005fa24>] (do_writepages+0x0/0x64) from [<c009caec>]
(__writeback_single_inode+0x178/0x330)
r4 = 00000004
[<c009c974>] (__writeback_single_inode+0x0/0x330) from [<c009d008>]
(sync_sb_inodes+0x1e4/0x2b4)
[<c009ce24>] (sync_sb_inodes+0x0/0x2b4) from [<c009d274>]
(writeback_inodes+0x64/0xb0)
[<c009d210>] (writeback_inodes+0x0/0xb0) from [<c005feb4>]
(background_writeout+0xa0/0xdc)
r6 = C1394000  r5 = 00000040  r4 = C1395FA4
[<c005fe14>] (background_writeout+0x0/0xdc) from [<c0060784>]
(pdflush+0x114/0x1dc)
r5 = C0211558  r4 = C1395FA4
[<c0060670>] (pdflush+0x0/0x1dc) from [<c0049890>] (kthread+0xe8/0x128)
r7 = C0060670  r6 = C02E9F38  r5 = C1394000  r4 = 00000000
[<c00497a8>] (kthread+0x0/0x128) from [<c0037fd8>] (do_exit+0x0/0x840)
r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
Code: eb004925 e59f0014 eb004923 e3a03000 (e5833000)
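
The trace above is long. The lines that identify each oops are the
"kernel BUG at" location, the "Internal error" counter and the
"Process" line. Below is a minimal sketch for pulling just those lines
out of a saved serial capture. The function name oops_summary and the
file name serial.log are assumptions for illustration only.

```shell
#!/bin/sh
# Print the headline lines of each oops in a kernel log: the BUG
# location, the oops counter, and the process that triggered it.
# Reads the files given as arguments, or stdin when there are none.
oops_summary() {
    grep -E '^(kernel BUG at|Internal error:|Process )' "$@"
}

# Example usage with a hypothetical capture file:
#   oops_summary serial.log
```

On the log above this would keep the bitmap.c:1166 line and the two
Process lines (md2_resync and pdflush), which are probably the first
things a driver maintainer would look for.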


What is the reason for these error messages? Is it really a driver bug?
Or does this mean that I completely screwed up the superblock of
/dev/md1? Maybe both? Anyway, even if I screwed up my raid, the kernel
should respond in a friendlier way, shouldn't it? Are there any people
on this list who have successfully set up a root raid1 with a write
intent bitmap on an nslu2 with debian?

thanks for reading until the end of this mail,

Toon


--
Toon Verstraelen
Gustaaf Eylenboschplein 16
9000 Gent
Belgium


