Re: SGI O2 - Oops

Hi,

* On 2007-03-09 20:50 <ths@networkno.de> wrote:

> A function at 0xffffffff8006fb3c in the kernel passed a bad pointer
> (0x000000002abd8498) in the call to find_get_page. This looks like it is
> a kernel bug. Your System.map file can tell you what function that was,
> this helps probably a bit further.

Thanks for the clarification. Neither address appears in
/boot/System.map-2.6.18:

$ grep 8006fb3c /boot/System.map-2.6.18
$ grep 2abd8498 /boot/System.map-2.6.18
$

Or am I searching in the wrong place?
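A note on the lookup, as a minimal sketch: System.map lists only the start address of each symbol, so an address that falls inside a function will usually not match a plain grep. The usual approach is to take the code symbol with the highest address that is still at or below the target. The second value, 0x2abd8498, is the bad pointer argument rather than a code address, so it is not expected in System.map at all. The Python below assumes the standard "<address> <type> <name>" System.map layout; the script name symlookup.py and the helper names are hypothetical.

```python
import bisect
import sys
from typing import Iterable, List, Optional, Tuple

# Symbol types kept: 't'/'T' mark code (text section) symbols in System.map.
TEXT_TYPES = {"t", "T"}


def load_system_map(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse '<hex address> <type> <name>' lines into sorted (address, name) pairs."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[1] not in TEXT_TYPES:
            continue  # skip malformed lines and non-code symbols
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols: List[Tuple[int, str]], addr: int) -> Optional[str]:
    """Return 'name+0xoffset' for the closest code symbol at or below addr, else None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr lies below every code symbol (e.g. a user-space pointer)
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python3 symlookup.py /boot/System.map-2.6.18 ffffffff8006fb3c
    with open(sys.argv[1]) as f:
        syms = load_system_map(f)
    for arg in sys.argv[2:]:
        print(arg, "->", lookup(syms, int(arg, 16)))
```

The result is only meaningful if the System.map file matches the kernel that actually produced the Oops.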

On Friday the machine got stuck looping on another Oops (which I could only
see on the serial console) and did not respond to any network or console logins:

Mem-info:
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:20
cpu 0 cold: high 62, batch 15 used:56
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        2708kB (0kB HighMem)
Active:98243 inactive:118124 dirty:46 writeback:0 unstable:0 free:677 slab:31417 mapped:7733 pagetables:1319
DMA free:2708kB min:5792kB low:7240kB high:8688kB active:392972kB inactive:472496kB present:2097152kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 405*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2708kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 4212, delete 4212, find 2106/2640, race 0+0
Free swap  = 1020108kB
Total swap = 1020108kB
Free swap:       1020108kB
524288 pages of RAM
0 pages of HIGHMEM
271276 reserved pages
147134 pages shared
0 pages swap cached
printk: 6 messages suppressed.
apache_volume: page allocation failure. order:1, mode:0x21
Call Trace:
 [<ffffffff80076efc>] __alloc_pages+0x21c/0x368
 [<ffffffff80098d74>] cache_alloc_refill+0x3c4/0x7f8
 [<ffffffff800992c4>] __kmalloc+0x11c/0x128
 [<ffffffff802b3efc>] __alloc_skb+0x8c/0x180
 [<ffffffff80237fe8>] meth_interrupt+0x610/0x8c0
 [<ffffffff80237f08>] meth_interrupt+0x530/0x8c0
 [<ffffffff8006c010>] handle_IRQ_event+0x78/0xf0
 [<ffffffff8006c1a0>] __do_IRQ+0x118/0x1c0
 [<ffffffff8000d16c>] timer_interrupt+0x1f4/0x480
 [<ffffffff8000957c>] do_IRQ+0x1c/0x38
 [<ffffffff8000797c>] ret_from_irq+0x0/0x10
 [<ffffffff801f43d0>] fbcon_cursor+0x0/0x400
 [<ffffffff80038800>] panic+0x250/0x2c0
 [<ffffffff80038828>] panic+0x278/0x2c0
 [<ffffffff8003df50>] do_exit+0x928/0xb48
 [<ffffffff8000e8c4>] die+0xec/0xf0
 [<ffffffff8000e8bc>] die+0xe4/0xf0
 [<ffffffff8000f098>] do_tr+0x0/0x120
 [<ffffffff800086b8>] handle_bp_int+0x20/0x28
 [<ffffffff800bef80>] d_callback+0x28/0x58
 [<ffffffff8009808c>] kfree+0x12c/0x138
 [<ffffffff800bef80>] d_callback+0x28/0x58
 [<ffffffff800524a0>] __rcu_process_callbacks+0xa8/0x3b0
 [<ffffffff800527e8>] rcu_process_callbacks+0x40/0x80
 [<ffffffff80041590>] tasklet_action+0xe8/0x1a8
 [<ffffffff80041590>] tasklet_action+0xe8/0x1a8
 [<ffffffff80040e6c>] __do_softirq+0xb4/0x188
 [<ffffffff80040fe0>] do_softirq+0xa0/0xa8
 [<ffffffff8000797c>] ret_from_irq+0x0/0x10
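One way to read the allocation failure above, as a hedged sketch: decoded with the flag values from 2.6.18-era include/linux/gfp.h (__GFP_DMA = 0x01, __GFP_WAIT = 0x10, __GFP_HIGH = 0x20), mode:0x21 is __GFP_HIGH | __GFP_DMA without __GFP_WAIT, i.e. an atomic request, which fits an skb allocation from meth_interrupt(). All 2 GB of RAM sit in the DMA zone (present:2097152kB), and only 2708 kB of it was free against a min watermark of 5792 kB. The Python below models, as an assumption, the order-aware check that kernels of that era performed in zone_watermark_ok() and feeds it the figures from the dump; the helper names are made up, and the zone state at the moment of the failed allocation may have differed slightly from what was printed.

```python
from typing import List

# Flag bits as defined in include/linux/gfp.h around 2.6.18 (assumed here;
# later kernels renumbered them).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mode: int) -> List[str]:
    """List the named flag bits set in an allocation mode, lowest bit first."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]


def watermark_ok(free_pages: int, free_area: List[int], order: int,
                 min_pages: int, alloc_high: bool, alloc_harder: bool,
                 lowmem_reserve: int = 0) -> bool:
    """Model of the 2.6-era order-aware zone watermark check.

    free_area[o] is the number of free blocks of order o (2**o pages each).
    """
    min_ = min_pages
    free = free_pages - (1 << order) + 1
    if alloc_high:      # __GFP_HIGH callers may dip to half the watermark
        min_ -= min_ // 2
    if alloc_harder:    # callers without __GFP_WAIT may dip a further quarter
        min_ -= min_ // 4
    if free <= min_ + lowmem_reserve:
        return False
    for o in range(order):
        # Blocks of order o are too small for this request, so discount them,
        # while requiring proportionally fewer larger blocks.
        free -= free_area[o] << o
        min_ >>= 1
        if free <= min_:
            return False
    return True


# Figures from the Mem-info dump, converted to 4 kB pages.
# "DMA: 405*4kB 0*8kB 0*16kB 0*32kB 1*64kB ... 1*1024kB ..." -> blocks per order 0..10
DMA_FREE_AREA = [405, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
DMA_FREE_PAGES = 2708 // 4   # "DMA free:2708kB" = 677 pages
DMA_MIN_PAGES = 5792 // 4    # "min:5792kB" = 1448 pages

if __name__ == "__main__":
    flags = decode_gfp(0x21)
    print(flags)  # ['__GFP_DMA', '__GFP_HIGH']
    ok = watermark_ok(DMA_FREE_PAGES, DMA_FREE_AREA, order=1,
                      min_pages=DMA_MIN_PAGES,
                      alloc_high="__GFP_HIGH" in flags,
                      alloc_harder="__GFP_WAIT" not in flags)
    print(ok)  # False
```

With these numbers the order:1 check fails while the same call with order=0 passes, which suggests the request failed because most of the free memory (405 of 677 pages) was scattered in single 4 kB pages rather than because the zone was completely empty.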

I power-cycled the machine and cross-compiled a new kernel from the
linux-2.6-2.6.18.dfsg.1 sources. The kernel booted fine, but this morning I am
getting signal 11 (SIGSEGV) errors from userland:

$ uptime
Segmentation fault
$ strace uptime
[...]
open("/proc/uptime", O_RDONLY)          = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "186905.12 136515.89\n", 1023)  = 20
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 3453 detached
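For reference, the 20 bytes returned by read() have the usual /proc/uptime layout: seconds since boot, then accumulated idle seconds. A small sketch (not how uptime(1) itself is implemented; the helper names are made up) decodes them: the values work out to about 2 days, 3:55 of uptime, with idle time below uptime as expected on this single-CPU machine. So the data the kernel handed back looks sane, and according to the trace the SIGSEGV arrives after the read returns.

```python
from typing import Tuple


def parse_proc_uptime(text: str) -> Tuple[float, float]:
    """Split /proc/uptime contents into (uptime_seconds, idle_seconds)."""
    up, idle = (float(field) for field in text.split())
    return up, idle


def format_uptime(seconds: float) -> str:
    """Render whole seconds as 'D days, H:MM', roughly like uptime(1)."""
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return f"{days} days, {hours}:{minutes:02d}"


if __name__ == "__main__":
    data = "186905.12 136515.89\n"  # the 20 bytes from the strace above
    up, idle = parse_proc_uptime(data)
    print(format_uptime(up))  # 2 days, 3:55
    print(idle <= up)         # True
```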

Are you sure this is not a memory or hardware problem? If it is not, what
would you suggest I do? Should I provide more information? Is this a known
problem, and if so, is there a fix somewhere?

Thanks for your help,
Mic

-- 
http://daemon.nethack.at