
Bug#451297: marked as done (linux-image-2.6.18-5-xen-686: kernel page allocation failure causes networking freeze)



Your message dated Mon, 15 Feb 2010 20:18:14 +0100
with message-id <20100215191814.GN9624@baikonur.stro.at>
and subject line Re: Xen || vserver troubles
has caused the Debian Bug report #451297,
regarding linux-image-2.6.18-5-xen-686: kernel page allocation failure causes networking freeze
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
451297: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451297
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-2.6.18-5-xen-686
Version: 2.6.18.dfsg.1-13etch4
Severity: grave
Justification: renders package unusable

Hi,

In the past couple of months two of my Xen dom0 servers have, after about
a week of uptime, been reporting kernel errors like so:

Nov 14 18:57:58 corona kernel: swapper: page allocation failure. order:0, mode:0x20
Nov 14 18:57:58 corona kernel:  [<c0140735>] __alloc_pages+0x261/0x275
Nov 14 18:57:58 corona kernel:  [<c01561c2>] cache_alloc_refill+0x297/0x493
Nov 14 18:57:58 corona kernel:  [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel:  [<c020007b>] handle_diacr+0x58/0xad
Nov 14 18:57:58 corona kernel:  [<c0155f12>] kmem_cache_alloc+0x3b/0x54
Nov 14 18:57:58 corona kernel:  [<c022e995>] alloc_skb_from_cache+0x48/0x110
Nov 14 18:57:58 corona kernel:  [<c020d708>] __alloc_skb+0x6c/0x70
Nov 14 18:57:58 corona kernel:  [<c0215d5b>] netif_be_start_xmit+0x118/0x3d5
Nov 14 18:57:58 corona kernel:  [<c023269e>] dev_hard_start_xmit+0x19a/0x1f0
Nov 14 18:57:58 corona kernel:  [<c0234020>] dev_queue_xmit+0x247/0x2e3
Nov 14 18:57:58 corona kernel:  [<ee406dfe>] br_dev_queue_push_xmit+0x155/0x178 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e64>] br_forward_finish+0x43/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee40aae4>] br_nf_forward_finish+0xc6/0xcc [bridge]
Nov 14 18:57:58 corona kernel:  [<ee40b34a>] br_nf_forward_arp+0x116/0x128 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0246e28>] nf_iterate+0x30/0x61
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0246f4e>] nf_hook_slow+0x3a/0x90
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406eac>] __br_forward+0x46/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406c59>] br_flood+0x65/0x9d [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406c9b>] br_flood_forward+0xa/0xc [bridge]
Nov 14 18:57:58 corona kernel:  [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel:  [<ee407868>] br_handle_frame_finish+0x80/0xcf [bridge]
Nov 14 18:57:58 corona kernel:  [<ee407a16>] br_handle_frame+0x15f/0x179 [bridge]
Nov 14 18:57:58 corona kernel:  [<c0232231>] netif_receive_skb+0x25e/0x357
Nov 14 18:57:58 corona kernel:  [<ee084130>] e1000_clean_rx_irq_ps+0x4a6/0x569 [e1000]
Nov 14 18:57:58 corona kernel:  [<ee082c4c>] e1000_clean+0x69/0x136 [e1000]
Nov 14 18:57:58 corona kernel:  [<c0233ce0>] net_rx_action+0x96/0x18f
Nov 14 18:57:58 corona kernel:  [<c011f41e>] __do_softirq+0x5e/0xc3
Nov 14 18:57:58 corona kernel:  [<c011f4bd>] do_softirq+0x3a/0x4a
Nov 14 18:57:58 corona kernel:  [<c0106131>] do_IRQ+0x48/0x53
Nov 14 18:57:58 corona kernel:  [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
Nov 14 18:57:58 corona kernel:  [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel:  [<c0107342>] raw_safe_halt+0x8c/0xaf
Nov 14 18:57:58 corona kernel:  [<c0102c5f>] xen_idle+0x22/0x2e
Nov 14 18:57:58 corona kernel:  [<c0102d7e>] cpu_idle+0x91/0xab
Nov 14 18:57:58 corona kernel:  [<c03236fc>] start_kernel+0x378/0x37f
Nov 14 18:57:58 corona kernel: Mem-info:
Nov 14 18:57:58 corona kernel: DMA per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 186, batch 31 used:30
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 62, batch 15 used:55
Nov 14 18:57:58 corona kernel: DMA32 per-cpu: empty
Nov 14 18:57:58 corona kernel: Normal per-cpu: empty
Nov 14 18:57:58 corona kernel: HighMem per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 90, batch 15 used:75
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 30, batch 7 used:6
Nov 14 18:57:58 corona kernel: Free pages:       34404kB (33228kB HighMem)
Nov 14 18:57:58 corona kernel: Active:146620 inactive:39375 dirty:10 writeback:0 unstable:0 free:8601 slab:19722 mapped:2949 pagetables:254
Nov 14 18:57:58 corona kernel: DMA free:1176kB min:3452kB low:4312kB high:5176kB active:454452kB inactive:122052kB present:745464kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 1632
Nov 14 18:57:58 corona kernel: HighMem free:33228kB min:204kB low:444kB high:684kB active:132028kB inactive:35448kB present:208904kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 0
Nov 14 18:57:58 corona kernel: DMA: 0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1176kB
Nov 14 18:57:58 corona kernel: DMA32: empty
Nov 14 18:57:58 corona kernel: Normal: empty
Nov 14 18:57:58 corona kernel: HighMem: 1353*4kB 1691*8kB 439*16kB 129*32kB 17*64kB 8*128kB 4*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33228kB
Nov 14 18:57:58 corona kernel: Swap cache: add 23, delete 0, find 0/0, race 0+0
Nov 14 18:57:58 corona kernel: Free swap  = 1975580kB
Nov 14 18:57:58 corona kernel: Total swap = 1975672kB
Nov 14 18:57:58 corona kernel: Free swap:       1975580kB
Nov 14 18:57:58 corona kernel: 238592 pages of RAM
Nov 14 18:57:58 corona kernel: 52226 pages of HIGHMEM
Nov 14 18:57:58 corona kernel: 19812 reserved pages
Nov 14 18:57:58 corona kernel: 146572 pages shared
Nov 14 18:57:58 corona kernel: 23 pages swap cached
Nov 14 18:57:58 corona kernel: 10 pages dirty
Nov 14 18:57:58 corona kernel: 0 pages writeback
Nov 14 18:57:58 corona kernel: 2949 pages mapped
Nov 14 18:57:58 corona kernel: 19722 pages slab
Nov 14 18:57:58 corona kernel: 254 pages pagetables

This scrolls by for a few minutes, during which time networking is
completely frozen.  The server is usable over the serial console, but no
networking takes place at all.  Finally, after a few minutes, the server
comes back to life, network-wise.

This recurs every couple of hours, eventually forcing a reboot.

I don't know where to start debugging this, but it has only started
happening with linux-image-2.6.18-5-xen-686.  I will try downgrading
back to linux-image-2.6.18-4-xen-686 just to see if the problem goes
away.

The dom0 is using the stock Debian Xen packages, and the dom0_mem kernel
command line option was used to give dom0 1G RAM.  When the above is
occurring, top does not suggest that the server is running out of RAM or
swap.  The usual bridged networking setup is in place.
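
One possible reading of the Mem-info dump above, offered as a sketch rather
than a diagnosis: the per-zone lines show the DMA zone's free memory below
its own min watermark, even though the machine as a whole still has free
pages and swap. The failing allocation is made from softirq context (the
trace runs through __do_softirq), where the allocator cannot wait for
reclaim. The short Python script below re-reads the zone lines quoted in
this report and checks the arithmetic. The function names are hypothetical
helpers written for this illustration; they are not part of any kernel or
Debian tool.

```python
import re

# Zone summary lines copied from the log in this report (wrapped lines rejoined).
ZONE_LINES = [
    "DMA free:1176kB min:3452kB low:4312kB high:5176kB "
    "active:454452kB inactive:122052kB present:745464kB",
    "HighMem free:33228kB min:204kB low:444kB high:684kB "
    "active:132028kB inactive:35448kB present:208904kB",
]

# Buddy free-list lines copied from the same log.
DMA_BUDDY = ("0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB "
             "0*512kB 1*1024kB 0*2048kB 0*4096kB = 1176kB")

ZONE_RE = re.compile(r"^(?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB")


def zones_below_min(lines):
    """Return (zone, free_kb, min_kb) for zones whose free memory is under 'min'."""
    hits = []
    for line in lines:
        m = ZONE_RE.match(line)
        if m is None:
            continue  # not a zone summary line
        free_kb, min_kb = int(m.group("free")), int(m.group("min"))
        if free_kb < min_kb:
            hits.append((m.group("zone"), free_kb, min_kb))
    return hits


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of a buddy free-list line, in kB."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


if __name__ == "__main__":
    for zone, free_kb, min_kb in zones_below_min(ZONE_LINES):
        print(f"{zone}: free {free_kb}kB is below min {min_kb}kB")
    print(f"DMA buddy lists sum to {buddy_total_kb(DMA_BUDDY)}kB")
```

On the figures quoted here, only the DMA zone is flagged, and its buddy
lists add up to the same 1176kB reported as free. That would be consistent
with top showing plenty of memory overall while allocations from this one
zone fail, though the log alone does not say why the zone was drained.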

If you need any more information, I will be happy to provide it.  This was
also reported in the Xen bugzilla when it last happened:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1097

but as I've had no response to that at all, I figured I'd try Debian this
time :)

Cheers,
Andy

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-5-xen-686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.18-5-xen-686 depends on:
ii  initramfs-tools    0.85h                 tools for generating an initramfs
ii  linux-modules-2.6. 2.6.18.dfsg.1-13etch4 Linux 2.6.18 modules on i686

Versions of packages linux-image-2.6.18-5-xen-686 recommends:
ii  libc6-xen              2.3.6.ds1-13etch2 GNU C Library: Shared libraries [X

-- no debconf information



--- End Message ---
--- Begin Message ---
The 2.6.18 linux images from Etch are no longer supported, thus closing
this bug report.  As both Xen and vserver stayed out of tree, it is very
unlikely that they have improved a lot since.

With modern hardware, kvm or lxc (Linux containers) are recommended.
If you still haven't upgraded to Lenny, please note that Etch has
no security support any more as of today:
http://www.debian.org/News/2010/20100121


If you can reproduce said bugs with the 2.6.32 linux images from
unstable, please shout on said box and the bug can be reopened:
reportbug -N <bugnr>

Thank you for your report.



--- End Message ---
