Bug#592187: Bug#576838: virtio network crashes again
On Wednesday, 11.08.2010, at 04:13 +0100, Ben Hutchings wrote:
> On Mon, 2010-08-09 at 11:24 +0200, Lukas Kolbe wrote:
> > So, testing begins.
> >
> > First conclusion: not all traffic patterns produce the page allocation
> > failure. rdiff-backup only writing to an nfs-share does no harm;
> > rdiff-backup reading and writing (incremental backup) leads to (nearly
> > immediate) error.
> >
> > The nfs-share is always mounted with proto=tcp and nfsv3; /proc/mounts says:
> > fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x 0 0
> [...]
>
> I've seen some recent discussion of a bug in the Linux NFS client that
> can cause it to stop working entirely in case of some packet loss events
> <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It is possible
> that you are running into that bug. I haven't yet seen an agreement on
> the fix for it.
Thanks, I'll look into it. I ran some further tests with vanilla and
Debian kernels:
VERSION                WORKING
---------------------------------------------------
2.6.35                 yes
2.6.33.6               yes
2.6.32.17              doesn't boot as kvm guest
2.6.32.17-2.6.32-19    no
2.6.32.17-2.6.32-18    no
2.6.32.16              no
I don't know if this is related to #16494 since I'm unable to trigger it
on 2.6.33.6 or 2.6.35. I'll test 2.6.32 with the patch from
http://lkml.org/lkml/2010/8/10/52 applied as well and bisect between
2.6.32.17 and 2.6.33.6 in the next few days.
> I also wonder whether the extremely large request sizes (rsize and
> wsize) you have selected are more likely to trigger the allocation
> failure in virtio_net. Please can you test whether reducing them helps?
The large rsize/wsize were automatically chosen, but I'll test with a
failing kernel and [rw]size of 32768.
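
To check which request sizes the client really ends up with after
remounting, something like the following might do. It is only a sketch:
the function names request_sizes and nfs_request_sizes are made up for
illustration, and it assumes the usual six-field /proc/mounts layout
(device, mount point, type, options, dump, pass).

```python
def request_sizes(mount_line):
    """Return (rsize, wsize) from one /proc/mounts line.

    A size is None if the option is absent from the line.
    """
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("not a /proc/mounts line: %r" % mount_line)
    # The fourth field holds the comma-separated mount options.
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value

    def as_int(name):
        return int(options[name]) if name in options else None

    return as_int("rsize"), as_int("wsize")


def nfs_request_sizes(path="/proc/mounts"):
    """Map each NFS mount point to its (rsize, wsize) pair."""
    sizes = {}
    with open(path) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
                sizes[fields[1]] = request_sizes(line)
    return sizes


if __name__ == "__main__":
    for mount_point, (rsize, wsize) in nfs_request_sizes().items():
        print(mount_point, "rsize=%s" % rsize, "wsize=%s" % wsize)
```

Fed the /proc/mounts line quoted above, request_sizes() gives 1048576 for
both values; after the remount both should read 32768 if the smaller
sizes were accepted.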
Kind regards,
Lukas