Bug#576838: KVM: networking stack tanks after page allocation failure

To: Ben Hutchings <ben@decadent.org.uk>
Cc: 576838@bugs.debian.org
Subject: Bug#576838: KVM: networking stack tanks after page allocation failure
From: micah anderson <micah@debian.org>
Date: Fri, 09 Apr 2010 23:38:48 -0400
Message-id: <87k4sgrn7b.fsf@algae.riseup.net>
Reply-to: micah anderson <micah@debian.org>, 576838@bugs.debian.org
In-reply-to: <1270860504.2176.62.camel@localhost>
References: <20100407155242.15891.96139.reportbug@algae.riseup.net> <1270695056.2178.76.camel@localhost> <874ojlvqui.fsf@algae.riseup.net> <1270860504.2176.62.camel@localhost>

On Sat, 10 Apr 2010 01:48:24 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Thu, 2010-04-08 at 12:41 -0400, micah anderson wrote:
> > On 2010-04-08, Ben Hutchings wrote:
> > > On Wed, 2010-04-07 at 11:52 -0400, Micah Anderson wrote:
> > > > Package: linux-image-2.6.32-2-amd64
> > > > Version: 2.6.32-8~bpo50+1
> > > > Severity: important
> > > > 
> > > > I'm running a tor exit node on a kvm instance, it runs for a little
> > > > while (between an hour and 3 days), doing 30-40mbit/sec and then
> > > > suddenly 'swapper: page allocation failure' happens, and the entire
> > > > networking stack of the kvm instance is dead. It stops responding on
> > > > the net completely. No ping in or out, no traffic can be observed
> > > > using tcpdump, the counters on the interface no longer change
> > > > (although the interface stays up).
> > > [...]
> > > 
> > > It sounds like there might be a memory leak.  Please send the contents
> > > of /proc/meminfo and /proc/slabinfo from a 'normal' state and the broken
> > > state.
> > 
> > I noticed this time when it crashed something different that I had not
> > seen in previous 2.6.30/2.6.26 kernels:
> > 
> > [ 7962.841287] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> > [ 7962.841287]   cache: kmalloc-1024, object size: 1024, buffer size: 1024, default order: 1, min order: 0
> > [ 7962.841287]   node 0: slabs: 606, objs: 4544, free: 0
> > 
> > and then the normal:
> > [ 7963.102476] swapper: page allocation failure. order:0, mode:0x4020
> > [ 7963.105743] Pid: 0, comm: swapper Not tainted 2.6.32-bpo.2-amd64 #1
> > [ 7963.106418] Call Trace:
> > [ 7963.106418]  <IRQ>  [<ffffffff810b947d>] ? __alloc_pages_nodemask+0x55b/0x5ce
> > etc. 
> > 
> > As requested here is a normal state /proc/meminfo and /proc/slabinfo. See below for
> > the broken state
> [...]
> 
> There's no sign of a memory leak and there's actually much more free
> memory in the broken state, perhaps because any network servers have
> lost all their clients and freed session state.  My guess is that the
> driver just doesn't handle allocation failure gracefully.  Which network
> driver are you using in the guest?
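
For anyone repeating the comparison between the normal and broken
/proc/meminfo snapshots, here is a minimal Python sketch. It assumes both
snapshots were saved as plain text; the function names parse_meminfo and
diff_meminfo are made up for illustration and are not part of any existing
tool.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values are in kB for most fields and plain counts for a few
    (for example the HugePages_* counters), exactly as the kernel prints them.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def diff_meminfo(normal, broken):
    """Return {field: broken - normal} for fields whose value changed."""
    return {
        field: broken[field] - normal[field]
        for field in normal
        if field in broken and broken[field] != normal[field]
    }
```

Called as diff_meminfo(parse_meminfo(normal_text), parse_meminfo(broken_text))
on the contents of the two saved files, this gives the per-field change. A
large positive MemFree delta would match the observation quoted above that
the broken state has more free memory, not less.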

I started with virtio, then switched to e1000 on a hunch that it might
be more stable, but sadly both produce the same results.

Here is the domain.xml:

<domain type='kvm'>
  <name>wagtail.example.net</name>
  <uuid>cfdd8232-be2f-4ac5-9cbd-dbc6f6956d77</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='cdrom'>
      <source file='/root/grub-rescue/grub-rescue.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/disk/by-id/dm-name-khyber-micah_wagtail.example.net'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='ethernet'>
      <mac address='52:54:00:43:ae:3d'/>
      <target dev='wagtail'/>
      <script path='/bin/true'/>
      <model type='e1000'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <serial type='unix'>
      <source mode='bind' path='/home/micah/wagtail.example.net/ttyS1'/>
      <target port='1'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <graphics type='vnc' autoport='true'/>
  </devices>
</domain>
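
To double-check which NIC model a domain definition actually requests, the
XML can be read back programmatically. This is only a rough sketch using
Python's standard xml.etree.ElementTree; it assumes the definition is
available as a string, and the helper name interface_models is hypothetical.

```python
import xml.etree.ElementTree as ET


def interface_models(domain_xml):
    """Return a list of (mac_address, model_type) for each <interface>.

    Either element of a tuple is None when the corresponding
    <mac> or <model> child is absent from the interface definition.
    """
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        model = iface.find("model")
        result.append((
            mac.get("address") if mac is not None else None,
            model.get("type") if model is not None else None,
        ))
    return result
```

Feeding it the text of the domain.xml above should list the single
interface together with its configured model.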
