
Bug#451297: linux-image-2.6.18-5-xen-686: kernel page allocation failure causes networking freeze



Package: linux-image-2.6.18-5-xen-686
Version: 2.6.18.dfsg.1-13etch4
Severity: grave
Justification: renders package unusable

Hi,

In the past couple of months two of my Xen dom0 servers have, after about
a week of uptime, been reporting kernel errors like so:

Nov 14 18:57:58 corona kernel: swapper: page allocation failure. order:0, mode:0x20
Nov 14 18:57:58 corona kernel:  [<c0140735>] __alloc_pages+0x261/0x275
Nov 14 18:57:58 corona kernel:  [<c01561c2>] cache_alloc_refill+0x297/0x493
Nov 14 18:57:58 corona kernel:  [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel:  [<c020007b>] handle_diacr+0x58/0xad
Nov 14 18:57:58 corona kernel:  [<c0155f12>] kmem_cache_alloc+0x3b/0x54
Nov 14 18:57:58 corona kernel:  [<c022e995>] alloc_skb_from_cache+0x48/0x110
Nov 14 18:57:58 corona kernel:  [<c020d708>] __alloc_skb+0x6c/0x70
Nov 14 18:57:58 corona kernel:  [<c0215d5b>] netif_be_start_xmit+0x118/0x3d5
Nov 14 18:57:58 corona kernel:  [<c023269e>] dev_hard_start_xmit+0x19a/0x1f0
Nov 14 18:57:58 corona kernel:  [<c0234020>] dev_queue_xmit+0x247/0x2e3
Nov 14 18:57:58 corona kernel:  [<ee406dfe>] br_dev_queue_push_xmit+0x155/0x178 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e64>] br_forward_finish+0x43/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee40aae4>] br_nf_forward_finish+0xc6/0xcc [bridge]
Nov 14 18:57:58 corona kernel:  [<ee40b34a>] br_nf_forward_arp+0x116/0x128 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0246e28>] nf_iterate+0x30/0x61
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0246f4e>] nf_hook_slow+0x3a/0x90
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406eac>] __br_forward+0x46/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406c59>] br_flood+0x65/0x9d [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406c9b>] br_flood_forward+0xa/0xc [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee407868>] br_handle_frame_finish+0x80/0xcf [bridge]
Nov 14 18:57:58 corona kernel:  [<ee407a16>] br_handle_frame+0x15f/0x179 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0232231>] netif_receive_skb+0x25e/0x357
Nov 14 18:57:58 corona kernel:  [<ee084130>] e1000_clean_rx_irq_ps+0x4a6/0x569 [e1000]
Nov 14 18:57:58 corona kernel:  [<ee082c4c>] e1000_clean+0x69/0x136 [e1000]
Nov 14 18:57:58 corona kernel:  [<c0233ce0>] net_rx_action+0x96/0x18f
Nov 14 18:57:58 corona kernel:  [<c011f41e>] __do_softirq+0x5e/0xc3
Nov 14 18:57:58 corona kernel:  [<c011f4bd>] do_softirq+0x3a/0x4a
Nov 14 18:57:58 corona kernel:  [<c0106131>] do_IRQ+0x48/0x53
Nov 14 18:57:58 corona kernel:  [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
Nov 14 18:57:58 corona kernel:  [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel:  [<c0107342>] raw_safe_halt+0x8c/0xaf
Nov 14 18:57:58 corona kernel:  [<c0102c5f>] xen_idle+0x22/0x2e
Nov 14 18:57:58 corona kernel:  [<c0102d7e>] cpu_idle+0x91/0xab
Nov 14 18:57:58 corona kernel:  [<c03236fc>] start_kernel+0x378/0x37f
Nov 14 18:57:58 corona kernel: Mem-info:
Nov 14 18:57:58 corona kernel: DMA per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 186, batch 31 used:30
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 62, batch 15 used:55
Nov 14 18:57:58 corona kernel: DMA32 per-cpu: empty
Nov 14 18:57:58 corona kernel: Normal per-cpu: empty
Nov 14 18:57:58 corona kernel: HighMem per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 90, batch 15 used:75
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 30, batch 7 used:6
Nov 14 18:57:58 corona kernel: Free pages:       34404kB (33228kB HighMem)
Nov 14 18:57:58 corona kernel: Active:146620 inactive:39375 dirty:10 writeback:0 unstable:0 free:8601 slab:19722 mapped:2949 pagetables:254
Nov 14 18:57:58 corona kernel: DMA free:1176kB min:3452kB low:4312kB high:5176kB active:454452kB inactive:122052kB present:745464kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 1632
Nov 14 18:57:58 corona kernel: HighMem free:33228kB min:204kB low:444kB high:684kB active:132028kB inactive:35448kB present:208904kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 0
Nov 14 18:57:58 corona kernel: DMA: 0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1176kB
Nov 14 18:57:58 corona kernel: DMA32: empty
Nov 14 18:57:58 corona kernel: Normal: empty
Nov 14 18:57:58 corona kernel: HighMem: 1353*4kB 1691*8kB 439*16kB 129*32kB 17*64kB 8*128kB 4*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33228kB
Nov 14 18:57:58 corona kernel: Swap cache: add 23, delete 0, find 0/0, race 0+0
Nov 14 18:57:58 corona kernel: Free swap  = 1975580kB
Nov 14 18:57:58 corona kernel: Total swap = 1975672kB
Nov 14 18:57:58 corona kernel: Free swap:       1975580kB
Nov 14 18:57:58 corona kernel: 238592 pages of RAM
Nov 14 18:57:58 corona kernel: 52226 pages of HIGHMEM
Nov 14 18:57:58 corona kernel: 19812 reserved pages
Nov 14 18:57:58 corona kernel: 146572 pages shared
Nov 14 18:57:58 corona kernel: 23 pages swap cached
Nov 14 18:57:58 corona kernel: 10 pages dirty
Nov 14 18:57:58 corona kernel: 0 pages writeback
Nov 14 18:57:58 corona kernel: 2949 pages mapped
Nov 14 18:57:58 corona kernel: 19722 pages slab
Nov 14 18:57:58 corona kernel: 254 pages pagetables

This will scroll by for a few minutes during which time networking is
completely frozen.  The server is usable over serial console but no
networking takes place at all.  Finally after a few minutes the server
comes back to life, network-wise.

This recurs every couple of hours, eventually forcing a reboot.

I don't know where to start debugging this, but it has only started
happening since linux-image-2.6.18-5-xen-686.  I will try downgrading
to linux-image-2.6.18-4-xen-686 to see if the problem goes away.

The dom0 is using the stock Debian Xen packages, and the dom0_mem boot
option was used to give dom0 1G of RAM.  When the above is occurring,
top does not suggest that the server is running out of RAM or swap.  The
usual bridged networking setup is in place.
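
One possible reading of the Mem-info dump, offered as a guess rather than a
diagnosis: the DMA zone is the only populated low-memory zone here, and it
reports free:1176kB against min:3452kB, while HighMem still has about 33MB
free.  That might be why top looks healthy even though an order:0 allocation
in the network path fails.  The sketch below is only an illustration of how
to pull that comparison out of the log; the regular expression, the function
name zones_below_min and the SAMPLE text (trimmed copies of the zone lines
above) are assumptions made for this example, not part of any existing tool.

```python
import re

# Matches the per-zone summary lines of the Mem-info dump shown above, e.g.
# "kernel: DMA free:1176kB min:3452kB low:4312kB high:5176kB ..."
# (pattern is an assumption based only on the lines in this report)
ZONE_RE = re.compile(
    r"kernel: (?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB "
    r"low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def zones_below_min(log_text):
    """Return (zone, free_kB, min_kB) for populated zones with free < min."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        free_kb, min_kb = int(m["free"]), int(m["min"])
        # Zones with min == 0 are empty on this machine (DMA32, Normal); skip them.
        if min_kb > 0 and free_kb < min_kb:
            hits.append((m["zone"], free_kb, min_kb))
    return hits


# Trimmed copies of the zone lines from the log above.
SAMPLE = """\
Nov 14 18:57:58 corona kernel: DMA free:1176kB min:3452kB low:4312kB high:5176kB active:454452kB inactive:122052kB present:745464kB
Nov 14 18:57:58 corona kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
Nov 14 18:57:58 corona kernel: HighMem free:33228kB min:204kB low:444kB high:684kB active:132028kB inactive:35448kB present:208904kB
"""

if __name__ == "__main__":
    for zone, free_kb, min_kb in zones_below_min(SAMPLE):
        print(f"{zone}: free {free_kb}kB is below min {min_kb}kB")
```

Run against the sample lines, this flags only the DMA zone.  Feeding it a
whole syslog file would show whether every failure coincides with that zone
dropping under its min watermark.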

If you need any more information I will be happy to provide it.  This was
also reported in the Xen bugzilla when it last happened:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1097

but as I've had no response to that at all I figured I'd try Debian this
time :)

Cheers,
Andy

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-5-xen-686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.18-5-xen-686 depends on:
ii  initramfs-tools    0.85h                 tools for generating an initramfs
ii  linux-modules-2.6. 2.6.18.dfsg.1-13etch4 Linux 2.6.18 modules on i686

Versions of packages linux-image-2.6.18-5-xen-686 recommends:
ii  libc6-xen              2.3.6.ds1-13etch2 GNU C Library: Shared libraries [X

-- no debconf information



