
multiple crashes: postmortem analysis



Hi,

I'm running potato (Debian 2.2):
Linux cerber 2.2.19 #1 Sat Jun 9 13:04:06 EST 2001 i686 unknown

and I'm experiencing multiple very bad crashes. Everything started when
I was uploading a big file to the server from a laptop connected to the
same 10 Mbit hub.

The crash looked as follows:

Call Trace: [<c012a795>] [<c011f36f>] [<c01241b7>] [<c026b2e4>] [<c0124287>] [<c010600°>] [<c0108c2b>]
Code: 89 u0 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d
Unable to handle kernel NULL pointer dereference at virtual address 00000020
current->tss.cr3 = 0a2f8000, %cr3 = 0a2f8000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c012a770>]
EFLAGS: 00010203
eax: 00000005   ebx: c0515460   ecx: 00000005   edx: 0001ffd8
esi: 00000000   edi: cc01b130   ebp: c0515460   esp: ca457d98
ds: 0018   es: 0018   ss: 0018
Process wu-ftpd (pid: 307, process nr: 41, stackpage=ca457000)
Stack: ca456000 00000005 c011f36f c0515460 00000005 0000001b 00000005 00000005
       ca456000 c01241b7 00000005 00000005 ca456000 00000005 00000003 00070302
       c0124c2c 00000005 00000000 00001000 00000003 00070302 c071c900 00362400
Call Trace: [<c011f36f>] [<c01241b7>] [<c0124c2c>] [<c012a648>] [<c012988d>] [<c0129a46>] [<c01452ae>]
       [<c01457f0>] [<c014599f>] [<c0143da0>] [<c0143b48>] [<c0158809>] [<c015891a>] [<c012812c>] [<c010a1e0>]
Code: 83 7e 20 00 75 62 f6 46 18 46 75 5c 8b 76 14 39 fe 75 ed 90




The server was used as a router/firewall connected to a cable modem. The
installation was not very stable and I rebooted a few times over the
last month, so I cannot say for sure that the problem wasn't there
before.

When I rebooted it, I got some file system corruption (I have 7
partitions: root, /var, /usr, /tmp, /home, /opt and swap). The problem
couldn't be corrected automatically, so I went into single-user mode to
run fsck manually.

This led to another crash, which is perfectly reproducible. I first lost
my /opt partition, and after rebooting a few times, /usr started to show
the same problem. This crash looks as follows:

Parallelizing fsck version 1.18 (11-Nov-1999)
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
/dev/hda7 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Unable to handle kernel paging request at virtual address 245c8b53
current->tss.cr3 = 0d269000, %cr3 = 0d269000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0129592>]
EFLAGS: 00010206
eax: 245c8b53   ebx: 0100ce40   ecx: 00012060   edx: 245c8b53
esi: 00000307   edi: 00040339   ebp: 00000400   esp: cd1dbce4
ds: 0018   es: 0018   ss: 0018
Process fsck.ext2 (pid: 395, process nr: 49, stackpage=cd1db000)
Stack: cd1dbebc cd1d0307 cd1dbea8 c010aff8 ccdb2c00 00367e00 0806acf0 c01295c4
       00000307 00040339 00000400 00000000 c012993e 00000307 00040339 00000400
       00000000 00000001 cd1dbebc cd1dbebc cca01950 c012cb46 00000307 00040339
Call Trace: [<c010aff8>] [<c01295c4>] [<c012993e>] [<c012cb46>] [<c011dbee>] [<c019b892>] [<c019ce50>]
       [<c011dc32>] [<c011371f>] [<c01137f2>] [<c0127dc4>] [<c0128029>] [<c010a1e0>]
Code: 8b 12 39 78 04 75 f3 39 68 08 75 ee 66 39 70 0c 75 e8 89 c2
Warning... fsck.ext2 for device /dev/hda7 exited with signal 11.


The versions used are shown here:

cerber:/# fsck --version
Parallelizing fsck version 1.18 (11-Nov-1999)

cerber:/# wu-ftpd -V

  Version wu-2.6.0(1) Thu Feb 8 17:45:47 CET 2001



I can forward all the logs, but I already tried to send them yesterday
(log.tgz) and they didn't come through. The server no longer exists: I
removed the hard disk yesterday, put it in another PC and reinstalled.
The problem reappeared as soon as I put it back in place.

I suppose this is a hardware problem. Do you have any idea what the root
cause might be?

Thanks in advance for your help,
           Vincent Guffens


