
How to investigate kernel failure?



Not sure if this is OT, I just hope someone can help.
I experienced a kernel crash last night at cron.daily time.
As I'm totally new to this kind of thing, I'd like to know where to start. This machine ran fine for over a year on a 2.4.18 kernel (built from vanilla source taken from kernel.org) and was upgraded to 2.4.22 (again from kernel.org) about a month ago.


The first block is this:

Oct 17 04:48:38 fserv kernel:  printing eip:
Oct 17 04:48:38 fserv kernel: c0135158
Oct 17 04:48:38 fserv kernel: Oops: 0000
Oct 17 04:48:38 fserv kernel: CPU:    0
Oct 17 04:48:38 fserv kernel: EIP: 0010:[get_hash_table+104/144] Not tainted
Oct 17 04:48:38 fserv kernel: EFLAGS: 00010202
Oct 17 04:48:38 fserv kernel: eax: dffc0000 ebx: 00000003 ecx: 403de5b0 edx: 403de5b0
Oct 17 04:48:38 fserv kernel: esi: 00000009 edi: 00000901 ebp: 00010e9e esp: ca897df0
Oct 17 04:48:38 fserv kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 04:48:38 fserv kernel: Process find (pid: 20377, stackpage=ca897000)
Oct 17 04:48:38 fserv kernel: Stack: 00000000 00000901 00001000 00010e9e 00000ab8 c01356b9 00000901 00010e9e
Oct 17 04:48:38 fserv kernel: 00001000 00000000 00000000 00000000 00000000 c0158dc9 00000901 00010e9e
Oct 17 04:48:38 fserv kernel: 00001000 00000000 ca897f18 d75b1b80 00000000 00000000 ce204a00 00000000
Oct 17 04:48:38 fserv kernel: Call Trace: [getblk+25/80] [ext3_getblk+185/624] [vc_resize+289/1168] [ext3_find_entry+501/768] [ext3_bread+35/128]
Oct 17 04:48:38 fserv kernel: [ext3_readdir+150/912] [permission+42/48] [vfs_readdir+97/144] [filldir64+0/368] [sys_getdents64+79/259] [filldir64+0/368]
Oct 17 04:48:38 fserv kernel:   [sys_fcntl64+128/144] [system_call+51/56]
Oct 17 04:48:38 fserv kernel:
Oct 17 04:48:38 fserv kernel: Code: 39 6a 04 75 f3 0f b7 42 08 3b 44 24 20 75 e9 66 39 7a 0c 75
Oct 17 04:48:38 fserv syslogd 1.4.1#10: restart.
Oct 17 04:48:50 fserv kernel: <1>Unable to handle kernel paging request at virtual address 403de5b4
Oct 17 04:48:50 fserv kernel:  printing eip:
Oct 17 04:48:50 fserv kernel: c0135158
Oct 17 04:48:50 fserv kernel: Oops: 0000
Oct 17 04:48:50 fserv kernel: CPU:    0
Oct 17 04:48:50 fserv kernel: EIP: 0010:[get_hash_table+104/144] Not tainted
Oct 17 04:48:50 fserv kernel: EFLAGS: 00010202
Oct 17 04:48:50 fserv kernel: eax: dffc0000 ebx: 00000003 ecx: 403de5b0 edx: 403de5b0
Oct 17 04:48:50 fserv kernel: esi: 00000009 edi: 00000901 ebp: 001e1941 esp: d061fdf0
Oct 17 04:48:50 fserv kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 04:48:50 fserv kernel: Process tar (pid: 20504, stackpage=d061f000)
Oct 17 04:48:50 fserv kernel: Stack: 00000000 00000901 00001000 001e1941 00000ab8 c01356b9 00000901 001e1941
Oct 17 04:48:50 fserv kernel: 00001000 00000000 00000000 00000000 00000000 c0158dc9 00000901 001e1941
Oct 17 04:48:50 fserv kernel: 00001000 00000000 d061ff18 d3ba4480 00000000 00000000 caf4c3c0 00000000
Oct 17 04:48:50 fserv kernel: Call Trace: [getblk+25/80] [ext3_getblk+185/624] [vc_resize+289/1168] [ext3_find_entry+501/768] [ext3_bread+35/128]
Oct 17 04:48:50 fserv kernel: [ext3_readdir+150/912] [vfs_permission+121/256] [permission+42/48] [vfs_readdir+97/144] [filldir64+0/368] [sys_getdents64+79/259]
Oct 17 04:48:50 fserv kernel: [filldir64+0/368] [sys_fcntl64+128/144] [system_call+51/56]
Oct 17 04:48:50 fserv kernel:
Oct 17 04:48:50 fserv kernel: Code: 39 6a 04 75 f3 0f b7 42 08 3b 44 24 20 75 e9 66 39 7a 0c 75

At this time most (possibly all) services were still alive, according to the logs.

Then a second crash around 3 hours later:

Oct 17 08:13:58 fserv kernel: <1>Unable to handle kernel paging request at virtual address 403de5b4
Oct 17 08:13:58 fserv kernel:  printing eip:
Oct 17 08:13:58 fserv kernel: c0135158
Oct 17 08:13:58 fserv kernel: Oops: 0000
Oct 17 08:13:58 fserv kernel: CPU:    0
Oct 17 08:13:58 fserv kernel: EIP: 0010:[get_hash_table+104/144] Not tainted
Oct 17 08:13:58 fserv kernel: EFLAGS: 00010202
Oct 17 08:13:58 fserv kernel: eax: dffc0000 ebx: 00000003 ecx: 403de5b0 edx: 403de5b0
Oct 17 08:13:58 fserv kernel: esi: 00000009 edi: 00000801 ebp: 000016df esp: df445e30
Oct 17 08:13:58 fserv kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 08:13:58 fserv kernel: Process kjournald (pid: 16, stackpage=df445000)
Oct 17 08:13:58 fserv kernel: Stack: dfd9a800 00000801 00001000 000016df 00000ab8 c01356b9 00000801 000016df
Oct 17 08:13:58 fserv kernel: 00001000 dfd9a800 cae65390 00000000 d9eceec0 c0164cb9 00000801 000016df
Oct 17 08:13:58 fserv kernel: 00001000 dfd9a800 cae65390 000016df c01621ad dfd9a800 dfd9a850 dfd9a800
Oct 17 08:13:58 fserv kernel: Call Trace: [getblk+25/80] [journal_get_descriptor_buffer+57/112] [journal_commit_transaction+1373/3799] [schedule+758/800] [kjournald+278/448]
Oct 17 08:13:58 fserv kernel: [commit_timeout+0/16] [arch_kernel_thread+40/64]
Oct 17 08:13:58 fserv kernel:
Oct 17 08:13:58 fserv kernel: Code: 39 6a 04 75 f3 0f b7 42 08 3b 44 24 20 75 e9 66 39 7a 0c 75


At this point the server was definitely dead, only replying to ping.
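
To compare the oops reports side by side, something like the following Python sketch could pull the key fields out of the syslog text. It is only a rough helper, not an established tool: the function name summarize_oops and the regular expressions are my own assumptions, and they are tailored to the line layout seen in the logs above. The sample text is a shortened copy of two of the reports.

```python
import re

# One alternation so fields are picked up in the order they appear,
# even if several syslog lines got joined together.
TOKEN = re.compile(
    r"virtual address (?P<fault>[0-9a-f]{8})"
    r"|EIP:\s+[0-9a-f]{4}:\[(?P<eip>[^\]]+)\]"
    r"|edx: (?P<edx>[0-9a-f]{8})"
    r"|Process (?P<proc>\S+) \(pid: (?P<pid>\d+)"
)

def summarize_oops(log_text):
    """Return one dict per oops report (hypothetical helper)."""
    reports = []
    pending_fault = None  # the fault address line comes before the EIP line
    for m in TOKEN.finditer(log_text):
        if m.group("fault"):
            pending_fault = int(m.group("fault"), 16)
        elif m.group("eip"):
            # An EIP line starts a new report.
            reports.append({"fault": pending_fault, "eip": m.group("eip")})
            pending_fault = None
        elif m.group("edx") and reports:
            reports[-1]["edx"] = int(m.group("edx"), 16)
        elif m.group("proc") and reports:
            reports[-1]["process"] = m.group("proc")
            reports[-1]["pid"] = int(m.group("pid"))
    return reports

# Shortened excerpt of the logs above (only the lines the parser needs).
SAMPLE = """\
Oct 17 04:48:50 fserv kernel: <1>Unable to handle kernel paging request at virtual address 403de5b4
Oct 17 04:48:50 fserv kernel: EIP: 0010:[get_hash_table+104/144] Not tainted
Oct 17 04:48:50 fserv kernel: eax: dffc0000 ebx: 00000003 ecx: 403de5b0 edx: 403de5b0
Oct 17 04:48:50 fserv kernel: Process tar (pid: 20504, stackpage=d061f000)
Oct 17 08:13:58 fserv kernel: <1>Unable to handle kernel paging request at virtual address 403de5b4
Oct 17 08:13:58 fserv kernel: EIP: 0010:[get_hash_table+104/144] Not tainted
Oct 17 08:13:58 fserv kernel: eax: dffc0000 ebx: 00000003 ecx: 403de5b0 edx: 403de5b0
Oct 17 08:13:58 fserv kernel: Process kjournald (pid: 16, stackpage=df445000)
"""

for r in summarize_oops(SAMPLE):
    # The first report in the real log has no fault address line, so guard for None.
    offset = None if r["fault"] is None else r["fault"] - r["edx"]
    print(r["process"], r["pid"], r["eip"], "fault-edx =", offset)
```

Run on the excerpt, this shows the same EIP symbol and the same faulting address in different processes, with the faulting address 4 bytes above the value in edx each time.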

So where do I start?
Thanks


