Re: SGI O2 - Oops

To: Michael Dosser <mic@nethack.at>
Cc: debian-mips@lists.debian.org
Subject: Re: SGI O2 - Oops
From: "J. Scott Kasten" <jscottkasten@yahoo.com>
Date: Mon, 12 Mar 2007 09:29:55 -0400 (EDT)
Message-id: <Pine.LNX.4.64.0703120927320.3158@bluefang.tetracon-eng.net>
In-reply-to: <20070312093924.GA2298@nethack.at>
References: <20070309134803.GG14244@nethack.at> <20070309195003.GA23975@networkno.de> <20070312093924.GA2298@nethack.at>

When you examine System.map, the 8006fb3c value will likely lie between two entries that are in the table. You may have to sort the table to do this lookup.


The two entries that bracket the address are what is important.
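
For what it's worth, here is a minimal sketch of that lookup in Python. It assumes the usual three-column System.map layout (address, type, symbol name); the helper names load_system_map, bracket and lookup are made up for illustration and are not part of any kernel tooling.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into (address, name) pairs sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def bracket(symbols, addr):
    """Return (entry at or below addr, entry above addr); either may be None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr)
    below = symbols[i - 1] if i > 0 else None
    above = symbols[i] if i < len(symbols) else None
    return below, above


def lookup(path, addr):
    """Read a System.map file from path and bracket addr between two entries."""
    with open(path) as f:
        return bracket(load_system_map(f), addr)
```

Calling lookup("/boot/System.map-2.6.18", 0xffffffff8006fb3c) would return the two bracketing entries. The lower one is normally the symbol the address falls in, and the address minus that entry's address gives the offset into it.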

-S-

On Mon, 12 Mar 2007, Michael Dosser wrote:

Hi,

* On 2007-03-09 20:50 <ths@networkno.de> wrote:

A function at 0xffffffff8006fb3c in the kernel passed a bad pointer
(0x000000002abd8498) in the call to find_get_page. This looks like it is
a kernel bug. Your System.map file can tell you what function that was,
which will probably help a bit further.


Thanks for this clarification. I find neither address in
/boot/System.map-2.6.18:

$ grep 8006fb3c /boot/System.map-2.6.18
$ grep 2abd8498 /boot/System.map-2.6.18
$

Or am I searching in the wrong place?

On Friday the machine looped with another Oops (I could only see this on
the serial console) and did not respond to any network/console logins:

Mem-info:
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:20
cpu 0 cold: high 62, batch 15 used:56
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        2708kB (0kB HighMem)
Active:98243 inactive:118124 dirty:46 writeback:0 unstable:0 free:677 slab:31417 mapped:7733 pagetables:1319
DMA free:2708kB min:5792kB low:7240kB high:8688kB active:392972kB inactive:472496kB present:2097152kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 405*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2708kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 4212, delete 4212, find 2106/2640, race 0+0
Free swap  = 1020108kB
Total swap = 1020108kB
Free swap:       1020108kB
524288 pages of RAM
0 pages of HIGHMEM
271276 reserved pages
147134 pages shared
0 pages swap cached
printk: 6 messages suppressed.
apache_volume: page allocation failure. order:1, mode:0x21
Call Trace:
[<ffffffff80076efc>] __alloc_pages+0x21c/0x368
[<ffffffff80098d74>] cache_alloc_refill+0x3c4/0x7f8
[<ffffffff800992c4>] __kmalloc+0x11c/0x128
[<ffffffff802b3efc>] __alloc_skb+0x8c/0x180
[<ffffffff80237fe8>] meth_interrupt+0x610/0x8c0
[<ffffffff80237f08>] meth_interrupt+0x530/0x8c0
[<ffffffff8006c010>] handle_IRQ_event+0x78/0xf0
[<ffffffff8006c1a0>] __do_IRQ+0x118/0x1c0
[<ffffffff8000d16c>] timer_interrupt+0x1f4/0x480
[<ffffffff8000957c>] do_IRQ+0x1c/0x38
[<ffffffff8000797c>] ret_from_irq+0x0/0x10
[<ffffffff801f43d0>] fbcon_cursor+0x0/0x400
[<ffffffff80038800>] panic+0x250/0x2c0
[<ffffffff80038828>] panic+0x278/0x2c0
[<ffffffff8003df50>] do_exit+0x928/0xb48
[<ffffffff8000e8c4>] die+0xec/0xf0
[<ffffffff8000e8bc>] die+0xe4/0xf0
[<ffffffff8000f098>] do_tr+0x0/0x120
[<ffffffff800086b8>] handle_bp_int+0x20/0x28
[<ffffffff800bef80>] d_callback+0x28/0x58
[<ffffffff8009808c>] kfree+0x12c/0x138
[<ffffffff800bef80>] d_callback+0x28/0x58
[<ffffffff800524a0>] __rcu_process_callbacks+0xa8/0x3b0
[<ffffffff800527e8>] rcu_process_callbacks+0x40/0x80
[<ffffffff80041590>] tasklet_action+0xe8/0x1a8
[<ffffffff80041590>] tasklet_action+0xe8/0x1a8
[<ffffffff80040e6c>] __do_softirq+0xb4/0x188
[<ffffffff80040fe0>] do_softirq+0xa0/0xa8
[<ffffffff8000797c>] ret_from_irq+0x0/0x10

I power cycled the machine and cross compiled a new kernel based on the
linux-2.6-2.6.18.dfsg.1 sources. The kernel booted fine, but this morning I get
signal 11 errors from userland:

$ uptime
Segmentation fault
$ strace uptime
[...]
open("/proc/uptime", O_RDONLY)          = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "186905.12 136515.89\n", 1023)  = 20
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 3453 detached

Are you sure this is not a memory/hardware problem? If it is not a hardware
problem, what would you suggest I do? Should I provide more information?
Is this a known problem? If yes, is there a fix somewhere?

Thanks for your help,
Mic

--
http://daemon.nethack.at


