Bug#592187: Bug#576838: virtio network crashes again

To: Ben Hutchings <ben@decadent.org.uk>
Cc: 592187@bugs.debian.org
Subject: Bug#592187: Bug#576838: virtio network crashes again
From: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
Date: Wed, 11 Aug 2010 11:24:32 +0200
Message-id: <1281518672.11319.146.camel@larosa.fritz.box>
Reply-to: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>, 592187@bugs.debian.org
In-reply-to: <1281496431.7543.476.camel@localhost>
References: <1281172902.7018.49.camel@larosa.fritz.box> <1281179915.7543.12.camel@localhost> <1281197867.11319.6.camel@larosa.fritz.box> <1281234965.7543.128.camel@localhost> <1281345882.11319.71.camel@larosa.fritz.box> <1281496431.7543.476.camel@localhost>

On Wednesday, 11.08.2010 at 04:13 +0100, Ben Hutchings wrote:
> On Mon, 2010-08-09 at 11:24 +0200, Lukas Kolbe wrote:
> > So, testing begins.
> > 
> > First conclusion: not all traffic patterns produce the page allocation
> > failure. rdiff-backup only writing to an NFS share does no harm;
> > rdiff-backup reading and writing (incremental backup) leads to a
> > (nearly immediate) error.
> > 
> > The NFS share is always mounted with proto=tcp and NFSv3; /proc/mounts says:
> > fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x 0 0
> [...]
> 
> I've seen some recent discussion of a bug in the Linux NFS client that
> can cause it to stop working entirely in case of some packet loss events
> <https://bugzilla.kernel.org/show_bug.cgi?id=16494>.  It is possible
> that you are running into that bug.  I haven't yet seen an agreement on
> the fix for it.

Thanks, I'll look into it. I ran some further tests with vanilla and
Debian kernels:

VERSION             WORKING
---------------------------
2.6.35              yes
2.6.33.6            yes
2.6.32.17           doesn't boot as KVM guest
2.6.32.17-2.6.32-19 no
2.6.32.17-2.6.32-18 no
2.6.32.16           no

I don't know whether this is related to kernel bug #16494, since I'm
unable to trigger the failure on 2.6.33.6 or 2.6.35. In the next few
days I'll test 2.6.32 with the patch from
http://lkml.org/lkml/2010/8/10/52 applied, and bisect between
2.6.32.17 and 2.6.33.6.

> I also wonder whether the extremely large request sizes (rsize and
> wsize) you have selected are more likely to trigger the allocation
> failure in virtio_net.  Please can you test whether reducing them helps?

The large rsize/wsize values were chosen automatically, but I'll retest
on a failing kernel with rsize and wsize both set to 32768.
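
Since the NFS client negotiates these sizes with the server, it is worth
reading the effective values back from /proc/mounts after remounting
rather than trusting the mount command line. A minimal sketch in Python
of how that check could look; the function name and the sample line are
illustrative assumptions, not taken from the affected system:

```python
def parse_mount_options(line):
    """Split one /proc/mounts-style line; return (fstype, options dict)."""
    fields = line.split()
    # Field order: device, mount point, filesystem type, options, dump, pass
    fstype, options = fields[2], fields[3]
    parsed = {}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        # Flags such as "rw" or "hard" carry no value; record them as True
        parsed[key] = value if sep else True
    return fstype, parsed


# Hypothetical sample line with the reduced sizes; on a real guest the
# lines would be read from /proc/mounts instead.
sample = ("fileserver:/export/backup /mnt nfs "
          "rw,relatime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp 0 0")
fstype, opts = parse_mount_options(sample)
print(fstype, opts["rsize"], opts["wsize"])
```

If the printed values differ from what was requested at mount time, the
test was run with other sizes than intended.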

Kind regards,
Lukas
