Amazingly, our Debian server keeps crashing
Debian on an Intel 845 chipset with a Pentium 4 (400 MHz FSB).
3ware SATA RAID 5 across four hard disks.
We're having trouble finding evidence anywhere
of what is going wrong.
From "kern.log":
Apr 12 19:53:11 server kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000008c
Apr 12 19:53:11 server kernel: printing eip:
Apr 12 19:53:11 server kernel: f8899f39
Apr 12 19:53:11 server kernel: *pde = 00000000
Apr 12 19:53:11 server kernel: Oops: 0000 [#1]
Apr 12 19:53:11 server kernel: PREEMPT
Apr 12 19:53:11 server kernel: Modules linked in: ipv6 genrtc dm_mod capability commoncap 3c59x usbkbd usbcore ext3 jbd mbcache sd_mod 3w_xxxx scsi_mod unix font vesafb cfbcopyarea cfbimgblt cfbfillrect
Apr 12 19:53:11 server kernel: CPU: 0
Apr 12 19:53:11 server kernel: EIP: 0060:[__crc_xfrm_state_alloc+4074054/4557196] Not tainted
Apr 12 19:53:11 server kernel: EFLAGS: 00010292 (2.6.8-2-686)
Apr 12 19:53:11 server kernel: EIP is at journal_blocks_per_page+0x9/0x20 [jbd]
Apr 12 19:53:11 server kernel: eax: 00000000 ebx: 00000000 ecx: 0000000c edx: f88e66a0
Apr 12 19:53:11 server kernel: esi: 00000000 edi: 00000000 ebp: c14da860 esp: e6e91d88
Apr 12 19:53:11 server kernel: ds: 007b es: 007b ss: 0068
Apr 12 19:53:11 server kernel: Process exim3 (pid: 18709, threadinfo=e6e90000 task=ec984130)
Apr 12 19:53:11 server kernel: Stack: f88cf533 00000000 00000231 f88cca7a 00000000 f6085314 400180c3 00000018
Apr 12 19:53:11 server kernel: f60853b0 c0135f9c f60853b0 00000018 00000231 00000018 c14da860 0000016e
Apr 12 19:53:11 server kernel: c0137d19 ee7f5680 c14da860 0000016e 00000231 c18e0230 00000000 00000065
Apr 12 19:53:11 server kernel: Call Trace:
Apr 12 19:53:11 server kernel: [__crc_xfrm_state_alloc+4292672/4557196] ext3_writepage_trans_blocks+0x13/0x80 [ext3]
Apr 12 19:53:11 server kernel: [__crc_xfrm_state_alloc+4281735/4557196] ext3_prepare_write+0x1a/0x140 [ext3]
Apr 12 19:53:11 server kernel: [find_lock_page+44/224] find_lock_page+0x2c/0xe0
Apr 12 19:53:11 server kernel: [generic_file_aio_write_nolock+969/2912] generic_file_aio_write_nolock+0x3c9/0xb60
Apr 12 19:53:11 server kernel: [do_anonymous_page+312/432] do_anonymous_page+0x138/0x1b0
Apr 12 19:53:11 server kernel: [do_no_page+96/848] do_no_page+0x60/0x350
Apr 12 19:53:11 server kernel: [generic_file_aio_write+120/176] generic_file_aio_write+0x78/0xb0
Apr 12 19:53:11 server kernel: [__crc_xfrm_state_alloc+4270833/4557196] ext3_file_write+0x44/0xd0 [ext3]
Apr 12 19:53:11 server kernel: [do_sync_write+128/176] do_sync_write+0x80/0xb0
Apr 12 19:53:11 server kernel: [sys_wait4+459/640] sys_wait4+0x1cb/0x280
Apr 12 19:53:11 server kernel: [vfs_write+237/352] vfs_write+0xed/0x160
Apr 12 19:53:11 server kernel: [sys_write+81/128] sys_write+0x51/0x80
Apr 12 19:53:11 server kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Apr 12 19:53:11 server kernel: Code: 8b 80 8c 00 00 00 0f b6 40 14 29 c1 b8 01 00 00 00 d3 e0 c3
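One way to confirm that the faulting address comes from the first instruction in the "Code:" line is to decode that instruction by hand. Below is a minimal Python sketch that handles only the single form seen here (mov r32, [reg + disp32]). It assumes the "Code:" bytes start at EIP, as they appear to in 2.6.8 Oopses; the helper name decode_mov_disp32 is hypothetical and this is not a general x86 decoder.

```python
# Minimal sketch: decode only the first instruction of the Oops "Code:" line,
# and only the one form seen here: opcode 0x8b (mov r32, r/m32) with a ModRM
# byte using mod=2 (base register + 32-bit displacement). Not a general decoder.

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp32(code_hex):
    """Return (dest_reg, base_reg, disp) for 'mov r32, [base + disp32]'."""
    code = bytes.fromhex(code_hex)
    if code[0] != 0x8B:
        raise ValueError("not mov r32, r/m32 (opcode 0x8b)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 2 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("only [reg + disp32] without SIB is handled")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return REGS[reg], REGS[rm], disp


# "Code:" bytes and register values copied from the Oops above.
code_line = "8b 80 8c 00 00 00 0f b6 40 14 29 c1 b8 01 00 00 00 d3 e0 c3"
regs = {
    "eax": 0x00000000, "ebx": 0x00000000, "ecx": 0x0000000C, "edx": 0xF88E66A0,
    "esi": 0x00000000, "edi": 0x00000000, "ebp": 0xC14DA860, "esp": 0xE6E91D88,
}

dest, base, disp = decode_mov_disp32(code_line)
fault_addr = (regs[base] + disp) & 0xFFFFFFFF  # 32-bit address arithmetic
print(f"mov {dest}, [{base}+{disp:#x}] -> reads {fault_addr:08x}")
```

With the values from the Oops this prints `mov eax, [eax+0x8c] -> reads 0000008c`: the instruction at journal_blocks_per_page+0x9 read through a zero pointer held in eax, which matches the "virtual address 0000008c" on the first line of the Oops.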
The "kern.log" entries immediately before the Oops are:
Apr 11 14:22:22 server kernel: Generic RTC Driver v1.07
Apr 11 14:22:22 server kernel: NET: Registered protocol family 10
Apr 11 14:22:22 server kernel: Disabled Privacy Extensions on device c02ff020(lo)
Apr 11 14:22:22 server kernel: IPv6 over IPv4 tunneling driver
Apr 11 14:22:33 server kernel: eth0: no IPv6 routers present
Apr 11 14:31:56 server kernel: 3w-xxxx: scsi0: AEN: INFO: Initialization started: Unit #0.
Apr 11 17:16:02 server kernel: 3w-xxxx: scsi0: AEN: INFO: Initialization complete: Unit #0.
There were NO entries in "kern.log" between 5:16pm on Apr 11 and 7:53pm on Apr 12.
But "messages" shows:
Apr 13 05:42:24 server -- MARK --
Apr 13 06:02:24 server -- MARK --
Apr 13 06:22:24 server -- MARK --
Apr 13 06:25:03 server syslogd 1.4.1#16: restart.
Apr 13 06:42:24 server -- MARK --
Apr 13 07:02:24 server -- MARK --
Apr 18 14:27:05 server syslogd 1.4.1#16: restart.
So it apparently died between 07:02 and 07:22 on the 13th: MARK lines were arriving every 20 minutes, and the one due at 07:22 never appeared.
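The same inference can be automated for future crashes. The Python sketch below scans syslog lines and reports the first point where a MARK is overdue. It assumes a fixed 20-minute MARK interval (as the timestamps above suggest) and, because syslog timestamps carry no year, assumes 2005; the helper names parse_ts and crash_window are hypothetical. Note that syslogd may skip a MARK when other messages were logged recently, so treat the result as an estimate.

```python
from datetime import datetime, timedelta

# Assumption: syslog lines have no year, so one is supplied here.
YEAR = 2005

# Lines copied from "messages" above.
MESSAGES = """\
Apr 13 05:42:24 server -- MARK --
Apr 13 06:02:24 server -- MARK --
Apr 13 06:22:24 server -- MARK --
Apr 13 06:25:03 server syslogd 1.4.1#16: restart.
Apr 13 06:42:24 server -- MARK --
Apr 13 07:02:24 server -- MARK --
Apr 18 14:27:05 server syslogd 1.4.1#16: restart.
"""


def parse_ts(line):
    # The first 15 characters are the timestamp, e.g. "Apr 13 05:42:24".
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def crash_window(text, mark_interval):
    """Return (last_mark, next_expected_mark) at the first overdue MARK, else None."""
    last_mark = None
    for ln in text.splitlines():
        if not ln.strip():
            continue
        ts = parse_ts(ln)
        # Any line arriving later than one interval after the last MARK
        # means at least one expected MARK never showed up.
        if last_mark is not None and ts - last_mark > mark_interval:
            return last_mark, last_mark + mark_interval
        if "-- MARK --" in ln:
            last_mark = ts
    return None


window = crash_window(MESSAGES, timedelta(minutes=20))
start, end = window
print(f"last MARK {start:%b %d %H:%M:%S}, next expected by {end:%b %d %H:%M:%S}")
```

On the lines above this prints `last MARK Apr 13 07:02:24, next expected by Apr 13 07:22:24`, the same window as worked out by hand.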
$ uname -a
Linux server 2.6.8-2-686 #1 Mon Jan 24 03:58:38 EST 2005 i686 GNU/Linux
# cat /proc/version
Linux version 2.6.8-2-686 (dilinger@toaster.hq.voxel.net) (gcc version 3.3.5 (Debian 1:3.3.5-6)) #1 Mon Jan 24 03:58:38 EST 2005
Should we be using a 2.4 kernel instead? Any other pointers are welcome.
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2857.073
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5652.48