multiple crashes: postmortem analysis
Hi,
I'm running potato
Linux cerber 2.2.19 #1 Sat Jun 9 13:04:06 EST 2001 i686 unknown
and I'm experiencing multiple very bad crashes. Everything started when
I was uploading a big file to the server from a laptop connected to the same
10 Mbit hub.
The crash looked as follows:
Call Trace: [<c012a795>] [<c011f36f>] [<c01241b7>] [<c026b2e4>]
[<c0124287>] [<c010600°>] [<c0108c2b>]
Code: 89 u0 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d
Unable to handle kernel NULL pointer dereference at virtual address 00000020
current->tss.cr3 = 0a2f8000, %cr3 = 0a2f8000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012a770>]
EFLAGS: 00010203
eax: 00000005 ebx: c0515460 ecx: 00000005 edx: 0001ffd8
esi: 00000000 edi: cc01b130 ebp: c0515460 esp: ca457d98
ds: 0018 es: 0018 ss: 0018
Process wu-ftpd (pid: 307, process nr: 41, stackpage=ca457000)
Stack: ca456000 00000005 c011f36f c0515460 00000005 0000001b 00000005 00000005
       ca456000 c01241b7 00000005 00000005 ca456000 00000005 00000003 00070302
       c0124c2c 00000005 00000000 00001000 00000003 00070302 c071c900 00362400
Call Trace: [<c011f36f>] [<c01241b7>] [<c0124c2c>] [<c012a648>]
[<c012988d>] [<c0129a46>] [<c01452ae>]
[<c01457f0>] [<c014599f>] [<c0143da0>] [<c0143b48>] [<c0158809>]
[<c015891a>] [<c012812c>] [<c010a1e0>]
Code: 83 7e 20 00 75 62 f6 46 18 46 75 5c 8b 76 14 39 fe 75 ed 90
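For anyone who wants to map the addresses above to function names, here is a minimal sketch of the lookup, assuming a System.map that matches this exact kernel build is available. ksymoops is the usual tool for this on 2.2 kernels; the script only illustrates the idea. The symbol names in the usage comment are hypothetical placeholders, not real entries from this kernel.

```python
import bisect
import re

# Matches well-formed trace entries such as [<c012a770>].
ADDR = re.compile(r"\[<([0-9a-fA-F]{8})>\]")


def extract_addresses(oops_text):
    """Return every well-formed [<xxxxxxxx>] address in the oops, in order."""
    return [int(hexval, 16) for hexval in ADDR.findall(oops_text)]


def load_symbols(system_map_text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(addr, symbols):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)


# Usage with made-up symbol names (NOT taken from the real System.map):
#   symbols = load_symbols("c012a700 T example_a\nc012a800 T example_b")
#   for addr in extract_addresses("EIP: 0010:[<c012a770>]"):
#       print("%08x" % addr, resolve(addr, symbols))
```

Entries that are not valid hex are skipped by the pattern, so damaged addresses in a hand-copied trace are simply ignored rather than guessed at.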
The server was used as a router/firewall connected to a cable modem. The
installation was not very stable and I rebooted a few times over the
last month, so I cannot say for sure that the problem wasn't there before.
When I rebooted it, I got some file system corruption (I have 7
partitions for <root> /var /usr /tmp /home /opt <swap>). The problem
couldn't be corrected automatically, so I went into single-user mode to run
fsck manually.
This led to another crash, which is perfectly reproducible. I first lost
my /opt partition and, after rebooting a few times, /usr started to
give the same problem. This crash looks as follows:
Parallelizing fsck version 1.18 (11-Nov-1999)
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
/dev/hda7 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Unable to handle kernel paging request at virtual address 245c8b53
current->tss.cr3 = 0d269000, %cr3 = 0d269000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0129592>]
EFLAGS: 00010206
eax: 245c8b53 ebx: 0100ce40 ecx: 00012060 edx: 245c8b53
esi: 00000307 edi: 00040339 ebp: 00000400 esp: cd1dbce4
ds: 0018 es: 0018 ss: 0018
Process fsck.ext2 (pid: 395, process nr: 49, stackpage=cd1db000)
Stack: cd1dbebc cd1d0307 cd1dbea8 c010aff8 ccdb2c00 00367e00 0806acf0 c01295c4
       00000307 00040339 00000400 00000000 c012993e 00000307 00040339 00000400
       00000000 00000001 cd1dbebc cd1dbebc cca01950 c012cb46 00000307 00040339
Call Trace: [<c010aff8>] [<c01295c4>] [<c012993e>] [<c012cb46>]
[<c011dbee>] [<c019b892>] [<c019ce50>]
[<c011dc32>] [<c011371f>] [<c01137f2>] [<c0127dc4>] [<c0128029>]
[<c010a1e0>]
Code: 8b 12 39 78 04 75 f3 39 68 08 75 ee 66 39 70 0c 75 e8 89 c2
Warning... fsck.ext2 for device /dev/hda7 exited with signal 11.
The versions used are shown here:
cerber:/# fsck --version
Parallelizing fsck version 1.18 (11-Nov-1999)
cerber:/# wu-ftpd -V
Version wu-2.6.0(1) Thu Feb 8 17:45:47 CET 2001
I can forward all the logs, but I already tried to send them yesterday
(log.tgz) and they didn't come through. The server no longer
exists: I removed the HD yesterday, put it in another PC and
reinstalled. The problem reappeared as soon as I put it back in place.
I suppose this is a H/W problem. Do you have any idea of the root cause?
Thanks in advance for your help,
Vincent Guffens