
Bug#576838: virtio network crashes again



On Saturday, 07.08.2010, at 12:18 +0100, Ben Hutchings wrote:
> On Sat, 2010-08-07 at 11:21 +0200, Lukas Kolbe wrote:
> > Hi,
> > 
> > I sent this earlier today but the bug was archived so it didn't appear
> > anywhere, hence the resend.
> > 
> > I believe this issue is not fixed at all in 2.6.32-18. We have seen this
> > behaviour in various kvm guests using virtio_net with the same kernel in
> > the guest only minutes after starting the nightly backup (rdiff-backup
> > to an nfs-volume on a remote server), eventually leading to a
> > non-functional network. Often, the machines do not even reboot and hang
> > instead. Using the rtl8139 instead of virtio helps, but that's really
> > only a clumsy workaround.
> [...]
> 
> I think you need to give your guests more memory.

They all have between 512M and 2G, and it happens reproducibly to all of
those using virtio_net and to none of those using rtl8139 as the network
driver. I would be delighted if it were as simple as giving them more
RAM, but sadly it isn't.

This is how we start the guests:

#!/bin/bash

KERNEL=2.6.32-5-amd64
NAME=tin

kvm -smp 2 \
 -drive if=virtio,file=/dev/system/tin_root,cache=off,boot=on \
 -drive if=virtio,file=/dev/system/tin_log,cache=off,boot=off \
 -drive if=virtio,file=/dev/system/tin_swap,cache=off,boot=off \
 -drive if=virtio,file=/dev/system/tin_data,cache=off,boot=off \
 -m 1024 \
 -nographic \
 -daemonize \
 -name ${NAME} \
 -kernel /boot/kvm/${NAME}/vmlinuz-${KERNEL} \
 -initrd /boot/kvm/${NAME}/initrd.img-${KERNEL} \
 -append "root=/dev/vda ro console=ttyS0,115200" \
 -serial mon:unix:/etc/kvm/consoles/${NAME}.sock,server,nowait \
 -net nic,macaddr=00:1A:4A:00:8E:3c,model=rtl8139 \
 -net tap,script=/etc/kvm/kvm-ifup-vlan142

Change model=rtl8139 to model=virtio, and the next time rdiff-backup runs,
the network stops working; after a while the guest hangs and can no
longer be halted.

qemu-kvm is version 0.12.4+dfsg-1, and the kernel is 2.6.32-18 on both
host and guest. The page allocation failures look suspiciously similar
to the ones the original bug reporter saw when using 2.6.32-12.
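
For clarity, the failing configuration differs from the start script above
only in the model value of the NIC line. With virtio, that line of the kvm
invocation would read as follows (everything else unchanged):

```shell
 -net nic,macaddr=00:1A:4A:00:8E:3c,model=virtio \
```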

If this were an OOM situation, shouldn't the OOM killer kick in?
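
One way to check this is to count both kinds of event in the kernel log. The
following is only a sketch: the function name count_mem_events is made up
for illustration, and it assumes the kernel logs the phrases "page
allocation failure" and "invoked oom-killer", which may differ between
kernel versions.

```shell
#!/bin/bash
# Hypothetical helper: count page allocation failures and OOM-killer
# invocations in kernel log text read from stdin.
count_mem_events() {
    awk '
        /page allocation failure/ { alloc++ }
        /invoked oom-killer/      { oom++ }
        END { printf "alloc_failures=%d oom_kills=%d\n", alloc, oom }
    '
}

# Usage on a running system:
#   dmesg | count_mem_events
```

A non-zero alloc_failures count together with oom_kills=0 would indicate
that allocations failed without the OOM killer ever being invoked.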

/proc/meminfo on the host:
sajama:~# cat /proc/meminfo
MemTotal:        8197652 kB
MemFree:         2444964 kB
Buffers:           13560 kB
Cached:           128812 kB
SwapCached:         6892 kB
Active:          5102584 kB
Inactive:         316616 kB
Active(anon):    5035456 kB
Inactive(anon):   242180 kB
Active(file):      67128 kB
Inactive(file):    74436 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388600 kB
SwapFree:        8355640 kB
Dirty:                 8 kB
Writeback:             0 kB
AnonPages:       5271936 kB
Mapped:             5892 kB
Shmem:               804 kB
Slab:              79844 kB
SReclaimable:      21184 kB
SUnreclaim:        58660 kB
KernelStack:        1880 kB
PageTables:        14256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12487424 kB
Committed_AS:    6440192 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      305788 kB
VmallocChunk:   34359332988 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7872 kB
DirectMap2M:     8380416 kB

/proc/meminfo on the guest (currently using rtl8139 as a network model):
linadm@tin:~$ cat /proc/meminfo 
MemTotal:        1027200 kB
MemFree:           84336 kB
Buffers:           99588 kB
Cached:           152592 kB
SwapCached:         3160 kB
Active:           370304 kB
Inactive:         401924 kB
Active(anon):     264088 kB
Inactive(anon):   256724 kB
Active(file):     106216 kB
Inactive(file):   145200 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194296 kB
SwapFree:        4175892 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:        517608 kB
Mapped:            31348 kB
Shmem:               764 kB
Slab:             147396 kB
SReclaimable:     140440 kB
SUnreclaim:         6956 kB
KernelStack:        1472 kB
PageTables:         9948 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4707896 kB
Committed_AS:     893160 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        9096 kB
VmallocChunk:   34359724404 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8180 kB
DirectMap2M:     1040384 kB
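
To put the two listings above into one number each, MemFree, Buffers and
Cached can be added up as a rough estimate of memory that is free or
cheaply reclaimable. This is a minimal sketch; the function name
free_plus_cache_kb is made up for illustration, and the sum is only an
approximation, since allocations made in atomic context cannot wait for
the page cache to be reclaimed.

```shell
#!/bin/bash
# Hypothetical helper: sum MemFree, Buffers and Cached (all in kB) from
# meminfo-formatted text on stdin. The exact match on the first field
# keeps SwapCached out of the sum.
free_plus_cache_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }'
}

# Usage on a running system:
#   free_plus_cache_kb < /proc/meminfo
```

Applied to the figures above, this gives 2587336 kB for the host and
336516 kB for the guest.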


-- 
Lukas




