
System freeze after random program crashes associated with kernel oops



Hello,

I reinstalled Linux a couple of months ago after running Windows for a
while. Ever since then I have had a problem with random system crashes.
The timing is very irregular: the computer may have been running for
anything between 5 minutes and 10 days, although the crashes most often
happen after 1-3 days of uptime. They occur almost exclusively while I
am using the computer, almost never when it is idle. Often, before the
whole system locks up, various programs crash with segmentation faults,
and klogd reports kernel oopses alongside these program crashes; a
sample is included at the bottom. The crashes seem to happen more often
during heavy disk reads, but I am not really sure about that. Sometimes
the Scroll Lock and Caps Lock LEDs flash simultaneously once the system
has crashed, but not always. Until the system freezes, everything works
perfectly.

My system froze under Windows too, but it is hard to tell if it was the
same problem since I was using an application that was quite unstable
and many friends reported crashes when running it too.

Before switching to Linux I added one memory module and an additional
hard drive. However, I have tried running the system with only one of
the memory modules at a time (in different slots too), without any
difference. I have not tried running the system without the new hard
drive since I really need it.

I have tried running memtest86 on all combinations of memory (both
modules and either one of them), but memtest86 seems to freeze. I get as
far as a blue screen with a red bar at the top showing the version
number. A cursor blinks, but nothing more happens. I left it in that
state overnight, but nothing changed.

I have searched the web and Usenet for hours without finding anything
useful.

I am extremely grateful for any help.

Kind regards,
Kristoffer Erlandsson

-----------------------------------------------------------------------

My system specs:
Debian unstable
Kernel 2.4.20
gcc 2.95.4 (other versions are installed too, but the kernel was compiled with 2.95.4)
libc 2.3.1


Hardware:
AMD Athlon XP 1800+ at 1533 MHz
2x 512MB DDR-DIMM PC2100 CL2.5 (only one inserted now due to testing)
Abit KR7A Motherboard using VIA VT8366A and VT8233 chipsets
Creative PCI SB 128 Vibra
Cnet Pro 10/100Mbit NIC using Davicom DM9102A chipset
Western Digital IDE ATA-100 60GB 7200RPM 2MB cache (WD600BB) (one NTFS
partition and one ext2 partition holding the operating system in
question)
Western Digital IDE ATA-100 120GB 7200RPM 8MB cache (WD1200JB) (two
FAT32 partitions)
LG CD-ROM CRD-8522B 52X
Creative 3DBlaster GeForce4 MX440 64MB AGP (Running Nvidia's driver)


A sample of two oopses:
(It may be worth noting that all the oopses I have looked at end with
an instruction accessing %e*x, where * is a letter a-d. I do not know
what this means, but I thought it was worth mentioning for those more
skilled than me. I can produce more oopses on request; I did not want
to make this letter any longer.)
------------------------------------------------------------------------------
ksymoops 2.4.8 on i686 2.4.20.  Options used
     -v /usr/src/linux-2.4.20/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /usr/src/linux-2.4.20/System.map (specified)

Jun  8 15:22:15 n14 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Jun  8 15:22:15 n14 kernel: c01412bb
Jun  8 15:22:15 n14 kernel: *pde = 00000000
Jun  8 15:22:15 n14 kernel: Oops: 0000
Jun  8 15:22:15 n14 kernel: CPU:    0
Jun  8 15:22:15 n14 kernel: EIP:    0010:[prune_icache+43/212]    Tainted: P 
Jun  8 15:22:15 n14 kernel: EFLAGS: 00210217
Jun  8 15:22:15 n14 kernel: eax: c2cd22c0   ebx: 00000000   ecx: 00000000   edx: c000b840
Jun  8 15:22:15 n14 kernel: esi: c2cd21c0   edi: 00000000   ebp: c163bf60   esp: c163bf48
Jun  8 15:22:15 n14 kernel: ds: 0018   es: 0018   ss: 0018
Jun  8 15:22:15 n14 kernel: Process kswapd (pid: 4, stackpage=c163b000)
Jun  8 15:22:15 n14 kernel: Stack: 0000000c 000001d0 00000020 000005ea dc6430e8 d94185e8 00000006 c014137f 
Jun  8 15:22:15 n14 kernel:        00000f36 c0129717 00000006 000001d0 00000006 000001d0 00000006 00000020 
Jun  8 15:22:15 n14 kernel:        000001d0 c02720d4 c02720d4 c012976c 00000020 c02720d4 00000001 c163a000 
Jun  8 15:22:15 n14 kernel: Call Trace:    [shrink_icache_memory+27/48] [shrink_caches+111/136] [try_to_free_pages_zone+60/92] [kswapd_balance_pgdat+65/140] [kswapd_balance+26/48]
Jun  8 15:22:15 n14 kernel: Code: 8b 7f 04 8d 73 f8 8b 86 08 01 00 00 a8 38 75 e6 0b 86 c8 00 
Using defaults from ksymoops -t elf32-i386 -a i386


>>eax; c2cd22c0 <_end+29daf3c/205a1cdc>
>>esi; c2cd21c0 <_end+29dae3c/205a1cdc>
>>ebp; c163bf60 <_end+1344bdc/205a1cdc>
>>esp; c163bf48 <_end+1344bc4/205a1cdc>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   8b 7f 04                  mov    0x4(%edi),%edi
Code;  00000003 Before first symbol
   3:   8d 73 f8                  lea    0xfffffff8(%ebx),%esi
Code;  00000006 Before first symbol
   6:   8b 86 08 01 00 00         mov    0x108(%esi),%eax
Code;  0000000c Before first symbol
   c:   a8 38                     test   $0x38,%al
Code;  0000000e Before first symbol
   e:   75 e6                     jne    fffffff6 <_EIP+0xfffffff6>
Code;  00000010 Before first symbol
  10:   0b 86 c8 00 00 00         or     0xc8(%esi),%eax
------------------------------------------------------------------------------

ksymoops 2.4.8 on i686 2.4.20.  Options used
     -v /usr/src/linux-2.4.20/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /usr/src/linux-2.4.20/System.map (specified)

Jun  9 10:19:36 n14 kernel: Unable to handle kernel paging request at virtual address ef212e63
Jun  9 10:19:36 n14 kernel: c01289d8
Jun  9 10:19:36 n14 kernel: *pde = 00000000
Jun  9 10:19:36 n14 kernel: Oops: 0002
Jun  9 10:19:36 n14 kernel: CPU:    0
Jun  9 10:19:36 n14 kernel: EIP:    0010:[kmem_cache_reap+404/504]    Tainted: P 
Jun  9 10:19:36 n14 kernel: EFLAGS: 00010002
Jun  9 10:19:36 n14 kernel: eax: c160fe40   ebx: c160fe40   ecx: d35ed020   edx: ef212e63
Jun  9 10:19:36 n14 kernel: esi: 00000000   edi: 00001eab   ebp: 000008c4   esp: c163bf54
Jun  9 10:19:36 n14 kernel: ds: 0018   es: 0018   ss: 0018
Jun  9 10:19:36 n14 kernel: Process kswapd (pid: 4, stackpage=c163b000)
Jun  9 10:19:36 n14 kernel: Stack: 00000017 000001d0 00000017 00000003 c160fe30 c02d224c 00000000 00000f56 
Jun  9 10:19:36 n14 kernel:        00001889 c160fe30 c01296c4 00000003 00000017 000001d0 c02720d4 c02720d4 
Jun  9 10:19:36 n14 kernel:        c012976c 00000017 c02720d4 00000001 c163a000 00000000 c0129871 c0272020 
Jun  9 10:19:36 n14 kernel: Call Trace:    [shrink_caches+28/136] [try_to_free_pages_zone+60/92] [kswapd_balance_pgdat+65/140] [kswapd_balance+26/48] [kswapd+153/188]
Jun  9 10:19:36 n14 kernel: Code: 89 02 c7 01 00 00 00 00 c7 41 04 00 00 00 00 fb 51 8b 54 24 
Using defaults from ksymoops -t elf32-i386 -a i386


>>eax; c160fe40 <_end+1318abc/205a1cdc>
>>ebx; c160fe40 <_end+1318abc/205a1cdc>
>>ecx; d35ed020 <_end+132f5c9c/205a1cdc>
>>esp; c163bf54 <_end+1344bd0/205a1cdc>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   89 02                     mov    %eax,(%edx)
Code;  00000002 Before first symbol
   2:   c7 01 00 00 00 00         movl   $0x0,(%ecx)
Code;  00000008 Before first symbol
   8:   c7 41 04 00 00 00 00      movl   $0x0,0x4(%ecx)
Code;  0000000f Before first symbol
   f:   fb                        sti    
Code;  00000010 Before first symbol
  10:   51                        push   %ecx
Code;  00000011 Before first symbol
  11:   8b 54 24 00               mov    0x0(%esp,1),%edx
-----------------------------------------------------------------------------
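A note for anyone reading the two oopses above: assuming, as in 2.4-series i386 kernels, that the "Code:" line dumps 20 bytes starting at EIP, the faulting instruction is the first one ksymoops disassembles rather than the last (the last one is simply where the 20-byte dump runs out, which is why ksymoops pads it with zeros). The short Python sketch below cross-checks each oops under that assumption. It recomputes the faulting address from the base register and displacement of the first instruction, and decodes the error code printed after "Oops:" using the i386 page-fault bit layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The register values are copied from the reports; the helper names (`decode_error_code`, `effective_address`, `OOPSES`) are purely illustrative.

```python
"""Cross-check the two kswapd oopses quoted above (illustrative sketch)."""

# i386 page-fault error code bits, as printed after "Oops:".
PF_PROT = 1 << 0   # 0: page not present,  1: protection violation
PF_WRITE = 1 << 1  # 0: read access,       1: write access
PF_USER = 1 << 2   # 0: kernel mode,       1: user mode


def decode_error_code(code: int) -> str:
    """Turn a page-fault error code into a human-readable description."""
    access = "write" if code & PF_WRITE else "read"
    cause = "protection violation" if code & PF_PROT else "page not present"
    mode = "user mode" if code & PF_USER else "kernel mode"
    return f"{access}, {cause}, {mode}"


def effective_address(base: int, displacement: int) -> int:
    """base + displacement, wrapped to 32 bits as on i386."""
    return (base + displacement) & 0xFFFFFFFF


# Values copied from the ksymoops output above. The faulting instruction is
# assumed to be the first one disassembled (the Code: bytes start at EIP).
OOPSES = [
    {
        "where": "prune_icache+43",
        "insn": "mov 0x4(%edi),%edi",
        "base": 0x00000000,      # %edi from the register dump
        "disp": 0x4,
        "reported": 0x00000004,  # "NULL pointer dereference at ... 00000004"
        "error_code": 0x0000,    # "Oops: 0000"
    },
    {
        "where": "kmem_cache_reap+404",
        "insn": "mov %eax,(%edx)",
        "base": 0xEF212E63,      # %edx from the register dump
        "disp": 0x0,
        "reported": 0xEF212E63,  # "paging request at ... ef212e63"
        "error_code": 0x0002,    # "Oops: 0002"
    },
]


def main() -> None:
    for oops in OOPSES:
        addr = effective_address(oops["base"], oops["disp"])
        verdict = "matches" if addr == oops["reported"] else "does NOT match"
        print(f'{oops["where"]}: {oops["insn"]} -> {addr:08x} '
              f'({verdict} the reported address)')
        print(f'    Oops {oops["error_code"]:04x}: '
              f'{decode_error_code(oops["error_code"])}')


if __name__ == "__main__":
    main()
```

For both oopses the recomputed address matches the reported one (a read through a NULL %edi in prune_icache, and a write through %edx in kmem_cache_reap), and the error codes decode to a kernel-mode read and a kernel-mode write, respectively, both to a non-present page.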

dmesg:

Linux version 2.4.20 (root@n14) (gcc version 2.95.4 20011002 (Debian prerelease)) #4 Tue May 6 00:54:51 CEST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
 BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
On node 0 totalpages: 131056
zone(0): 4096 pages.
zone(1): 126960 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=Linux ro root=302
Found and enabled local APIC!
Initializing CPU#0
Detected 1534.019 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3060.53 BogoMIPS
Memory: 515624k/524224k available (1180k kernel code, 8212k reserved, 369k data, 248k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff c1c3fbff 00000000 00000000
CPU:             Common caps: 0383fbff c1c3fbff 00000000 00000000
CPU: AMD Athlon(tm) XP 1800+ stepping 02
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1533.9509 MHz.
..... host bus clock speed is 266.7740 MHz.
cpu: 0, clocks: 2667740, slice: 1333870
CPU0<T0:2667728,T1:1333856,D:2,S:1333870,C:2667740>
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb4d0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router default [1106/3099] at 00:00.0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Diskquotas version dquot_6.4.0 initialized
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver v1.1.22 [Flags: R/O]
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 89
VP_IDE: detected chipset, but driver not compiled in!
PCI: No IRQ known for interrupt pin A of device 00:11.1. Please try using pci=biosirq.
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD600BB-00CAA0, ATA DISK drive
hdb: LG CD-ROM CRD-8522B, ATAPI CD/DVD-ROM drive
hdc: WDC WD1200JB-00CRA1, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=232581/16/63
hdb: ATAPI 52X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
Partition check:
 hda: hda1 hda2 hda3
 hdc: hdc1 hdc2
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
eth0: Davicom DM9102 at pci00:0f.0, 00:08:a1:25:9c:c3, irq 10.
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: Detected Via Apollo Pro KT266 chipset
agpgart: AGP aperture is 64M @ 0xe0000000
es1371: version v0.30 time 00:55:34 May  6 2003
es1371: found chip, vendor id 0x1274 device id 0x5880 revision 0x02
es1371: found es1371 rev 2 at io 0xd000 irq 5
es1371: features: joystick 0x0
ac97_codec: AC97  codec, id: TRA35(TriTech TR A5)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
uhci.c: USB Universal Host Controller Interface driver v1.1
uhci.c: USB UHCI at I/O 0xdc00, IRQ 11
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
uhci.c: USB UHCI at I/O 0xe400, IRQ 11
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
usb.c: registered new driver serial
usbserial.c: USB Serial Driver core v1.4
usbserial.c: USB Serial support registered for Handspring Visor / Palm 4.0 / Clié 4.x
usbserial.c: USB Serial support registered for Sony Clié 3.5
visor.c: USB HandSpring Visor, Palm m50x, Sony Clié driver v1.6
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 248k freed
Adding Swap: 1060280k swap-space (priority -1)
0: nvidia: loading NVIDIA Linux x86 nvidia.o Kernel Module  1.0-4363  Sat Apr 19 17:46:46 PDT 2003
blk: queue c02f08c4, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c02f0c28, I/O limit 4095Mb (mask 0xffffffff)
spurious 8259A interrupt: IRQ7.


