Fileserver Issues
I have a home fileserver running Debian Etch with the distro-supplied
kernels and software. It contains two 4-drive RAID5 arrays managed with
mdadm. What I initially thought was a Samba issue led to a few kernel
panics and some "kernel bug" log messages. I first saw the bug messages
when running 2.6.18-4, so I updated to 2.6.18-5, where I got different
bug messages and the machine degraded to the point where it would kernel
panic during the boot process. I reverted to an old 2.6.17 kernel, where
everything appeared to boot and work fine, and both arrays began to
resync. After a couple of hours the first (4 x 320 GB) array had
completed its resync, and I noticed the other array (4 x 500 GB) was
"stuck" midway through the resync after one of the "kernel bug" log
messages had been printed to the console. Any subsequent process
trying to query the mounted filesystem went into an uninterruptible
sleep, and mdadm would not respond to commands, citing an I/O error.
I booted into Knoppix (live CD) and constructed the array, at which
point dmesg named a specific drive it was having issues with. I booted
into the manufacturer's (Seagate) hardware testing utility and ran the
short and extended tests, both of which passed. I zeroed the drive,
booted back into Debian, created a partition on the device and (after
some mdadm hiccups) managed to re-add the old drive to the array.
During the resync I got a kernel panic, then another two after
rebooting. Finally, I removed the suspect device from the array before
it spent too long resyncing, and all is well: the degraded array is
mounted and working fine. I erased the suspect drive and created an
ext3 partition on it, and data is being copied to it as I type. I don't
think this is a hardware problem; issues only happen if I add this
drive to the array and leave it to resync for a couple of minutes. In
the course of all this I tried a backported kernel and copy of mdadm
too. I have run CPU/memory stability testing programs, and I should
also note that the machine has been running with no issues for about 8
months.
What should I try from here, short of purchasing new hardware? I've
included some of the different messages from my kernel log in case they
are of any use.
Thanks,
Nathan
Aug 30 20:22:13 localhost kernel: ------------[ cut here ]------------
Aug 30 20:22:13 localhost kernel: kernel BUG at mm/slab.c:3434!
Aug 30 20:22:13 localhost kernel: invalid opcode: 0000 [#1]
Aug 30 20:22:13 localhost kernel: Modules linked in: ipv6 button ac
battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop
evdev snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd rtc
parport_pc parport serio_raw floppy analog gameport pcspkr soundcore
psmouse i2c_nforce2 i2c_core eth1394 ext3 jbd ide_cd cdrom ide_disk
sd_mod amd74xx generic ide_core ohci1394 skge ieee1394 sata_sil sata_nv
ehci_hcd ohci_hcd forcedeth libata scsi_mod usbcore thermal processor
fan
Aug 30 20:22:13 localhost kernel: CPU: 0
Aug 30 20:22:13 localhost kernel: EIP: 0060:[<c0146014>] Not tainted VLI
Aug 30 20:22:13 localhost kernel: EFLAGS: 00010206 (2.6.18-4-486 #1)
Aug 30 20:22:13 localhost kernel: EIP is at kmem_cache_free+0x36/0x62
Aug 30 20:22:13 localhost kernel: eax: 80000080 ebx: d3b50dc0 ecx:
dff5a0c0 edx: c1474fe0
Aug 30 20:22:13 localhost kernel: esi: d09d6f74 edi: e3a7f4e4 ebp:
f6c6b8c0 esp: f7c85f2c
Aug 30 20:22:13 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 30 20:22:13 localhost kernel: Process kjournald (pid: 2269,
ti=f7c84000 task=dfbab030 task.ti=f7c84000)
Aug 30 20:22:13 localhost kernel: Stack: d3b50dc0 d09d6f74 e3a7f4e4
f8966a9b 00000000 f6ab0800 00000000 00000000
Aug 30 20:22:13 localhost kernel: d048815c f7da83c0 dfbab030
c0360454 f6ab0800 00000000 00000000 00000046
Aug 30 20:22:13 localhost kernel: 00000000 0000000a f7aca030
25df3858 0002b97b 00007e83 dfbab140 f6c6b910
Aug 30 20:22:13 localhost kernel: Call Trace:
Aug 30 20:22:13 localhost kernel: [<f8966a9b>]
journal_commit_transaction+0x30b/0xc08 [jbd]
Aug 30 20:22:13 localhost kernel: [<f8969ca1>] kjournald+0x92/0x184 [jbd]
Aug 30 20:22:13 localhost kernel: [<c0122cc3>]
autoremove_wake_function+0x0/0x2d
Aug 30 20:22:13 localhost kernel: [<f8969c0f>] kjournald+0x0/0x184 [jbd]
Aug 30 20:22:13 localhost kernel: [<c0122b64>] kthread+0xaf/0xdb
Aug 30 20:22:13 localhost kernel: [<c0122ab5>] kthread+0x0/0xdb
Aug 30 20:22:13 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 30 20:22:13 localhost kernel: Code: 00 40 c1 ea 0c c1 e2 05 03 15
5c b4 36 c0 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 0f 0b 53
02 f9 06 29 c0 39
4a 18 74 08 <0f> 0b 6a 0d f9 06 29 c0 9c 5e fa 8b 19 8b 03 3b 43 04 72 0b 89
Aug 30 20:22:13 localhost kernel: EIP: [<c0146014>]
kmem_cache_free+0x36/0x62 SS:ESP 0068:f7c85f2c
Aug 30 22:27:51 localhost kernel: BUG: unable to handle kernel NULL
pointer dereference at virtual address 00000014
Aug 30 22:27:51 localhost kernel: printing eip:
Aug 30 22:27:51 localhost kernel: f8ba79e0
Aug 30 22:27:51 localhost kernel: *pde = 00000000
Aug 30 22:27:51 localhost kernel: Oops: 0000 [#1]
Aug 30 22:27:51 localhost kernel: SMP
Aug 30 22:27:51 localhost kernel: Modules linked in: ipv6 button ac
battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop
analog snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd floppy
parport_pc parport rtc gameport soundcore serio_raw psmouse pcspkr
i2c_nforce2 i2c_core eth1394 evdev ext3 jbd mbcache ide_cd cdrom
ide_disk sd_mod generic amd74xx ide_core ohci1394 skge ieee1394
sata_sil ohci_hcd forcedeth sata_nv ehci_hcd libata scsi_mod usbcore
thermal processor fan
Aug 30 22:27:51 localhost kernel: CPU: 0
Aug 30 22:27:51 localhost kernel: EIP: 0060:[<f8ba79e0>] Not tainted VLI
Aug 30 22:27:51 localhost kernel: EFLAGS: 00010202 (2.6.18-5-686 #1)
Aug 30 22:27:51 localhost kernel: EIP is at
handle_stripe+0x114d/0x2075 [raid456]
Aug 30 22:27:51 localhost kernel: eax: 25e61e40 ebx: f71b4e80 ecx:
0000000c edx: 00000000
Aug 30 22:27:51 localhost kernel: esi: f71b4e84 edi: 00000010 ebp:
f71b4d78 esp: f756be90
Aug 30 22:27:51 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 30 22:27:51 localhost kernel: Process md0_raid5 (pid: 2273,
ti=f756a000 task=dff17aa0 task.ti=f756a000)
Aug 30 22:27:51 localhost kernel: Stack: f756beb0 00000040 ffffa138
f71b4e80 c18079a0 c1807980 ffffa138 00000000
Aug 30 22:27:51 localhost kernel: c1909980 00000001 c030cf58
0000000a 00000000 c0121838 00000046 f756bef0
Aug 30 22:27:51 localhost kernel: dffdb550 00000046 00000046
00000032 c01050ea 00803040 f7407900 c01036b6
Aug 30 22:27:51 localhost kernel: Call Trace:
Aug 30 22:27:51 localhost kernel: [<c0121838>] __do_softirq+0x5a/0xbb
Aug 30 22:27:51 localhost kernel: [<c01050ea>] do_IRQ+0x48/0x52
Aug 30 22:27:51 localhost kernel: [<c01036b6>] common_interrupt+0x1a/0x20
Aug 30 22:27:51 localhost kernel: [<c01af0ae>] generic_unplug_device+0x15/0x22
Aug 30 22:27:51 localhost kernel: [<f8ba8a15>] raid5d+0x10d/0x132 [raid456]
Aug 30 22:27:51 localhost kernel: [<f8b77769>] md_thread+0xd7/0xed [md_mod]
Aug 30 22:27:51 localhost kernel: [<c012d92d>]
autoremove_wake_function+0x0/0x2d
Aug 30 22:27:51 localhost kernel: [<f8b77692>] md_thread+0x0/0xed [md_mod]
Aug 30 22:27:51 localhost kernel: [<c012d85f>] kthread+0xc2/0xef
Aug 30 22:27:51 localhost kernel: [<c012d79d>] kthread+0x0/0xef
Aug 30 22:27:51 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 30 22:27:51 localhost kernel: Code: 11 8b 8c 24 94 00 00 00 89 4f
08 89 bc 24 94 00 00 00 b0 01 8b 7c 24 20 86 87 d0 00 00 00 fb 89 df
85 ff 74 23 8b 46
60 8b 56 64 <8b> 5f 04 8b 0f 83 c0 08 83 d2 00 39 d3 0f 82 68 ff ff ff 77 08
Aug 30 22:27:51 localhost kernel: EIP: [<f8ba79e0>]
handle_stripe+0x114d/0x2075 [raid456] SS:ESP 0068:f756be90
Aug 31 00:51:57 localhost kernel: BUG: unable to handle kernel paging
request at virtual address 7a95c000
Aug 31 00:51:57 localhost kernel: printing eip:
Aug 31 00:51:57 localhost kernel: f8b347ce
Aug 31 00:51:57 localhost kernel: *pde = 00000000
Aug 31 00:51:57 localhost kernel: Oops: 0000 [#1]
Aug 31 00:51:57 localhost kernel: Modules linked in: ipv6 button ac
battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop
snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd rtc analog
gameport soundcore parport_pc parport serio_raw psmouse floppy pcspkr
i2c_nforce2 i2c_core eth1394 evdev ext3 jbd ide_cd cdrom ide_disk
sd_mod generic amd74xx ide_core forcedeth ohci1394 skge ieee1394
sata_sil sata_nv ehci_hcd ohci_hcd libata scsi_mod usbcore thermal
processor fan
Aug 31 00:51:57 localhost kernel: CPU: 0
Aug 31 00:51:57 localhost kernel: EIP: 0060:[<f8b347ce>] Not tainted VLI
Aug 31 00:51:57 localhost kernel: EFLAGS: 00010212 (2.6.18-4-486 #1)
Aug 31 00:51:57 localhost kernel: EIP is at xor_sse_5+0x5b/0x3b5 [xor]
Aug 31 00:51:57 localhost kernel: eax: 00000010 ebx: f6a36000 ecx:
f6a39000 edx: 7a95c000
Aug 31 00:51:57 localhost kernel: esi: f6a37000 edi: f6a38000 ebp:
f7c01dd8 esp: f7c01dd4
Aug 31 00:51:57 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 31 00:51:57 localhost kernel: Process md1_raid5 (pid: 2223,
ti=f7c00000 task=f7ad4ab0 task.ti=f7c00000)
Aug 31 00:51:57 localhost kernel: Stack: 8005003b 00000000 00000000
00000000 00000000 00000000 00000000 00000000
Aug 31 00:51:57 localhost kernel: 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
Aug 31 00:51:57 localhost kernel: 00000000 f8b3745c f6a37000
7a95c000 00001000 f8b3521a f6a38000 f6a37000
Aug 31 00:51:57 localhost kernel: Call Trace:
Aug 31 00:51:57 localhost kernel: [<f8b3521a>] xor_block+0x74/0x7d [xor]
Aug 31 00:51:57 localhost kernel: [<f8b922d0>]
compute_parity5+0x311/0x3d6 [raid456]
Aug 31 00:51:57 localhost kernel: [<f8b94fc1>]
handle_stripe+0x18ee/0x1ebe [raid456]
Aug 31 00:51:57 localhost kernel: [<f8834fb2>]
scsi_io_completion+0x13f/0x2e9 [scsi_mod]
Aug 31 00:51:57 localhost kernel: [<f888bf25>]
ata_hsm_move+0x63d/0x653 [libata]
Aug 31 00:51:57 localhost kernel: [<f89303c0>] sd_rw_intr+0x1f7/0x221 [sd_mod]
Aug 31 00:51:57 localhost kernel: [<c0275964>] schedule+0x46e/0x4d2
Aug 31 00:51:57 localhost kernel: [<f8b9566f>] raid5d+0xde/0xf8 [raid456]
Aug 31 00:51:57 localhost kernel: [<f8b6659d>] md_thread+0xd6/0xec [md_mod]
Aug 31 00:51:57 localhost kernel: [<c0122cc3>]
autoremove_wake_function+0x0/0x2d
Aug 31 00:51:57 localhost kernel: [<f8b664c7>] md_thread+0x0/0xec [md_mod]
Aug 31 00:51:57 localhost kernel: [<c0122b64>] kthread+0xaf/0xdb
Aug 31 00:51:57 localhost kernel: [<c0122ab5>] kthread+0x0/0xdb
Aug 31 00:51:57 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 31 00:51:57 localhost kernel: Code: 5d 30 0f 18 82 00 01 00 00 0f
18 82 20 01 00 00 8d b6 00 00 00 00 8d bc 27 00 00 00 00 0f 18 81 00
01 00 00 0f 18 81
20 01 00 00 <0f> 28 02 0f 28 4a 10 0f 28 52 20 0f 28 5a 30 0f 18 87 00 01 00
Aug 31 00:51:57 localhost kernel: EIP: [<f8b347ce>]
xor_sse_5+0x5b/0x3b5 [xor] SS:ESP 0068:f7c01dd4
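For anyone skimming traces like the ones above, here is a rough Python sketch that pulls out when each oops happened, which kernel was running, and which function (and module, if any) the EIP was in. It assumes the log is in the syslog format pasted here, with mail-wrapped continuation lines that do not start with a timestamp. The names unwrap and summarize are made up for this illustration and are not part of any existing tool.

```python
import re

# Syslog lines start with a timestamp such as "Aug 30 20:22:13"; any other
# line is assumed to be a wrapped continuation of the previous one.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
# "EIP is at symbol+0xOFF/0xLEN [module]" (the module part is optional).
EIP_AT = re.compile(r"EIP is at (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
# The kernel version appears in parentheses on the EFLAGS line.
KERNEL = re.compile(r"EFLAGS: [0-9a-f]+ +\((\S+)")


def unwrap(lines):
    """Join wrapped continuation lines back onto their syslog line."""
    joined = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line.strip()
    return joined


def summarize(lines):
    """Return a (timestamp, kernel, symbol, module) tuple per oops."""
    results = []
    kernel = None
    for line in unwrap(lines):
        match = KERNEL.search(line)
        if match:
            kernel = match.group(1)
        match = EIP_AT.search(line)
        if match:
            module = match.group(2) or "core kernel"
            results.append((line[:15], kernel, match.group(1), module))
            kernel = None  # do not carry the version over to the next oops
    return results


if __name__ == "__main__":
    import sys

    for entry in summarize(sys.stdin):
        print("%s  %s  %s  [%s]" % entry)
```

It reads the log on standard input, so the pasted traces can be piped straight into it. The point is only to make it easier to see at a glance whether the crashes cluster in one module or are scattered across unrelated code.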