
Bug#576838: KVM: networking stack tanks after page allocation failure

On Fri, 2010-04-09 at 23:38 -0400, micah anderson wrote:
> On Sat, 10 Apr 2010 01:48:24 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Thu, 2010-04-08 at 12:41 -0400, micah anderson wrote:
> > > On 2010-04-08, micah anderson wrote:
> > > > On Wed, 2010-04-07 at 11:52 -0400, Micah Anderson wrote:
> > > > > Package: linux-image-2.6.32-2-amd64
> > > > > Version: 2.6.32-8~bpo50+1
> > > > > Severity: important
> > > > > 
> > > > > I'm running a tor exit node on a kvm instance, it runs for a little
> > > > > while (between an hour and 3 days), doing 30-40mbit/sec and then
> > > > > suddenly 'swapper: page allocation failure' happens, and the entire
> > > > > networking stack of the kvm instance is dead. It stops responding on
> > > > > the net completely. No ping in or out, no traffic can be observed
> > > > > using tcpdump, the counters on the interface no longer change
> > > > > (although the interface stays up).
> > > > [...]
> > > > 
> > > > It sounds like there might be a memory leak.  Please send the contents
> > > > of /proc/meminfo and /proc/slabinfo from a 'normal' state and the broken
> > > > state.
> > > 
> > > This time when it crashed I noticed something different that I had
> > > not seen with the previous 2.6.30/2.6.26 kernels:
> > > 
> > > [ 7962.841287] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> > > [ 7962.841287]   cache: kmalloc-1024, object size: 1024, buffer size: 1024, default order: 1, min order: 0
> > > [ 7962.841287]   node 0: slabs: 606, objs: 4544, free: 0
> > > 
> > > and then the normal:
> > > [ 7963.102476] swapper: page allocation failure. order:0, mode:0x4020
> > > [ 7963.105743] Pid: 0, comm: swapper Not tainted 2.6.32-bpo.2-amd64 #1
> > > [ 7963.106418] Call Trace:
> > > [ 7963.106418]  <IRQ>  [<ffffffff810b947d>] ? __alloc_pages_nodemask+0x55b/0x5ce
> > > etc. 
> > > 
> > > As requested, here are /proc/meminfo and /proc/slabinfo from a normal
> > > state; see below for the broken state.
> > [...]
> > 
> > There's no sign of a memory leak and there's actually much more free
> > memory in the broken state, perhaps because any network servers have
> > lost all their clients and freed session state.  My guess is that the
> > driver just doesn't handle allocation failure gracefully.  Which network
> > driver are you using in the guest?
> 
> I started with virtio, but had a hunch that maybe switching to e100e
> might be more stable; sadly, both produce the same results.
[...]
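
To make my guess above about allocation handling a bit more concrete:
under memory pressure the receive path can fail to allocate replacement
buffers (quite possibly what the swapper page allocation failures in your
log are), and a driver that simply gives up at that point can end up with
an empty rx ring, which would look exactly like what you describe -
interface up, counters frozen, no traffic either way.  The fragment below
is only a userspace sketch of the pattern I mean, not code taken from
virtio_net or e1000; the structure and function names are invented purely
for illustration.

/*
 * Userspace sketch only - not kernel code.  It mimics the pattern a NIC
 * driver is expected to follow when rx buffer allocation fails under
 * memory pressure: keep whatever buffers are already posted, remember
 * that a refill is still needed, and retry later instead of letting the
 * ring drain permanently.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define RX_RING_SIZE 256

struct rx_ring {
    void *buf[RX_RING_SIZE];
    int filled;            /* receive buffers currently posted */
    bool refill_pending;   /* an allocation failed; retry later */
};

/* Top the ring back up, tolerating allocation failure. */
static void rx_refill(struct rx_ring *ring)
{
    while (ring->filled < RX_RING_SIZE) {
        void *buf = malloc(2048);  /* stands in for an atomic skb alloc */
        if (!buf) {
            /*
             * Graceful path: note the failure and bail out.  A real
             * driver would schedule delayed work or a timer to call
             * this function again; never retrying is what leaves an
             * interface "up" but silent.
             */
            ring->refill_pending = true;
            return;
        }
        ring->buf[ring->filled++] = buf;
    }
    ring->refill_pending = false;
}

int main(void)
{
    struct rx_ring ring = { .filled = 0, .refill_pending = false };

    rx_refill(&ring);
    printf("posted %d buffers, refill pending: %s\n",
           ring.filled, ring.refill_pending ? "yes" : "no");
    return 0;
}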

There's no such thing as e100e - Linux has e100, e1000 and e1000e
drivers; QEMU only emulates e1000.  Please run lsmod inside the guest to
check what's really being used.
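
If lsmod alone turns out to be ambiguous (several NIC modules loaded),
the driver actually bound to the interface is the target of the
/sys/class/net/<iface>/device/driver symlink.  Something like the
following resolves it programmatically; this is purely illustrative and
assumes the guest interface is called eth0:

/*
 * Tiny illustrative check, not part of any existing tool: print the name
 * of the driver bound to eth0 by resolving the sysfs symlink
 * /sys/class/net/eth0/device/driver.  Adjust the path if the guest's
 * interface has another name.
 */
#include <stdio.h>
#include <unistd.h>
#include <libgen.h>

int main(void)
{
    char link[256];
    ssize_t n = readlink("/sys/class/net/eth0/device/driver",
                         link, sizeof(link) - 1);

    if (n < 0) {
        perror("readlink /sys/class/net/eth0/device/driver");
        return 1;
    }
    link[n] = '\0';
    /* The symlink target ends in the driver name, e.g. ".../virtio_net"
     * or ".../e1000". */
    printf("driver: %s\n", basename(link));
    return 0;
}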

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


