
Re: SGI O2 - Oops




Could we get a look at the head of your kernel log file, where the CPU is detected, devices probed, etc., just up to the point where the system is fully up?


On Mon, 12 Mar 2007, Michael Dosser wrote:

Hi,

* On 2007-03-12 13:12 <ths@networkno.de> wrote:

Now this looks more like broken RAM. You can try to re-seat the RAM
modules; if that doesn't help, try to find and remove the faulty
memory module.

OK, what I have done now is boot the machine with every possible pair
of RAM modules (AFAIK at least two modules have to be installed), i.e.
with exactly two modules each time (128 + 128 in my case). I booted into
single-user mode and ran memtest (from the memtester package). Every
combination gives something similar to this (stripped down for
readability):

[...]
Allocated 1420820480 bytes...trying mlock...oom-killer: gfp_mask=0x200d2, order=0
Call Trace:
[<ffffffff80075050>] out_of_memory+0x1f8/0x248
[<ffffffff80077014>] __alloc_pages+0x334/0x368
[<ffffffff800b99a8>] poll_freewait+0x38/0xb0
[<ffffffff8008eb00>] read_swap_cache_async+0x138/0x1d0
[<ffffffff800815e0>] swapin_readahead+0x90/0xc8
[<ffffffff80083a24>] __handle_mm_fault+0x254/0xe30
[<ffffffff800b23f8>] do_lookup+0x80/0x1c8
[<ffffffff8001fcd0>] do_page_fault+0x2a0/0x450
[<ffffffff800b59b4>] link_path_walk+0x154/0x2b0
[<ffffffff800b5988>] link_path_walk+0x128/0x2b0
[<ffffffff80020718>] tlb_do_page_fault_1+0x110/0x118
[<ffffffff80023cc0>] r5k_dma_cache_inv_sc+0x0/0xd0
[<ffffffff800db9d4>] compat_core_sys_select+0x1e4/0x210
[<ffffffff800daf3c>] compat_set_fd_set+0x5c/0x68
[<ffffffff800ddeb8>] compat_sys_select+0x100/0x208
[<ffffffff8001dd68>] handle_sys+0x128/0x144
[<ffffffff8001dd68>] handle_sys+0x128/0x144
[<ffffffff8001fcf3>] do_page_fault+0x2c2/0x450

Mem-info:
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:35
cpu 0 cold: high 62, batch 15 used:56
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        5408kB (0kB HighMem)
Active:171807 inactive:75219 dirty:0 writeback:0 unstable:0 free:1352 slab:1746 mapped:1 pagetables:505
DMA free:5408kB min:5792kB low:7240kB high:8688kB active:687228kB inactive:300876kB present:2097152kB pages_scanned:1164172 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 22*8kB 5*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5408kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 247208, delete 243, find 0/1, race 0+0
Free swap  = 31276kB
Total swap = 1020108kB
Free swap:        31276kB
524288 pages of RAM
0 pages of HIGHMEM
271276 reserved pages
61 pages shared
246965 pages swap cached
Out of Memory: Kill process 1549 (memtest) score 21726 and children.
Out of memory: Killed process 1549 (memtest).
oom-killer: gfp_mask=0x200d2, order=0
Call Trace:
[<ffffffff80075050>] out_of_memory+0x1f8/0x248
[<ffffffff80077014>] __alloc_pages+0x334/0x368
[<ffffffff80022210>] r4k_blast_dcache_page_dc32+0x0/0xa0
[<ffffffff80084030>] __handle_mm_fault+0x860/0xe30
[<ffffffff800846f0>] get_user_pages+0xf0/0x490
[<ffffffff80085c24>] make_pages_present+0x8c/0xe0
[<ffffffff8008644c>] mlock_fixup+0x11c/0x178
[<ffffffff800863e0>] mlock_fixup+0xb0/0x178
[<ffffffff8008665c>] do_mlock+0x10c/0x158
[<ffffffff80086900>] sys_mlock+0xa0/0xf0
[<ffffffff80086884>] sys_mlock+0x24/0xf0
[<ffffffff8001dd68>] handle_sys+0x128/0x144
[<ffffffff8009e168>] sys_write+0x0/0x90
[<ffffffff8001fcf3>] do_page_fault+0x2c2/0x450

[...]

Mem-info:
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:0
cpu 0 cold: high 62, batch 15 used:56
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:           0kB (0kB HighMem)
Active:204640 inactive:43774 dirty:0 writeback:0 unstable:0 free:0 slab:1742 mapped:1 pagetables:508
DMA free:0kB min:5792kB low:7240kB high:8688kB active:818560kB inactive:175096kB present:2097152kB pages_scanned:1260869 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 247208, delete 243, find 0/1, race 0+0
Free swap  = 31276kB
Total swap = 1020108kB
Free swap:        31276kB
524288 pages of RAM
0 pages of HIGHMEM
271276 reserved pages
61 pages shared
246965 pages swap cached
Killed
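
As a rough cross-check of the figures in the log above, here is a small Python sketch of the arithmetic. It only uses numbers copied from the log; the 4 kB page size is an assumption (it is consistent with "524288 pages of RAM" and "present:2097152kB"), and treating "total pages minus reserved pages" as the usable RAM is a simplification, not something the log states. If those assumptions hold, the mlock'ed request looks larger than the usable RAM, which might explain the OOM kill on its own, since locked pages cannot be swapped out.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (524288 pages * 4 kB == present:2097152kB)

# Figures copied verbatim from the log above.
requested_bytes = 1420820480   # "Allocated 1420820480 bytes...trying mlock"
total_pages = 524288           # "524288 pages of RAM"
reserved_pages = 271276        # "271276 reserved pages"
total_swap_kb = 1020108        # "Total swap = 1020108kB"
free_swap_kb = 31276           # "Free swap  = 31276kB"
buddy_line = ("DMA: 0*4kB 22*8kB 5*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
              "0*512kB 1*1024kB 0*2048kB 1*4096kB = 5408kB")


def buddy_free_kb(line):
    """Sum the count*size terms of a free-list line from Mem-info."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


present_kb = total_pages * PAGE_KB
# Simplification: everything that is not "reserved" is counted as usable.
usable_kb = (total_pages - reserved_pages) * PAGE_KB
requested_kb = requested_bytes // 1024
swap_used_kb = total_swap_kb - free_swap_kb

print(f"present RAM   : {present_kb / 1024:.1f} MiB")
print(f"usable RAM    : {usable_kb / 1024:.1f} MiB")
print(f"mlock request : {requested_kb / 1024:.1f} MiB")
print(f"swap in use   : {swap_used_kb / 1024:.1f} MiB")
print(f"DMA free list : {buddy_free_kb(buddy_line)} kB")
print("request exceeds usable RAM:", requested_kb > usable_kb)
```

If that reading is right, passing memtest an explicit size well below the usable RAM would be a fairer test of the modules than letting it grab everything.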

I can't imagine that all 8 memory modules are faulty. The machine runs on
an APC UPS ... I have two spare 64 MB modules I could test, but they are
at home, so I can only test them tomorrow.

If it's not memory, what else could it be? The combination on this O2 might
be somewhat exotic (R5k CPU, 1 GB RAM, software RAID 1, swap on both disks,
sda2 and sdb2) ...

Mic

--
Friends don't let friends dual-boot
                                --Brett @ FreeBSD-chat

