
SOLVED: Re: no buffer space available



On 3/20/07, Timur Irmatov <irmatov@gmail.com> wrote:
The machines have identical hardware: nvidia-based motherboard (some
desktop shit, nobody asked us when the hardware was bought), Realtek 8139
network card, AMD64 dual-core processor, 2 GB of RAM, SATA on board
(nvidia mcp51). Each has about 50 network interfaces (VLANs), with a /24
private network on each. Services include dhcp, bind, and pppoe in kernel
mode.

There are about 500 PPPoE interfaces on each machine, with 10 Mbit/s
average traffic. They run fine for some time (a week or two), then the
following lines appear in the system log:

Mar 20 11:45:57 pppoe1 named[4267]: client 10.1.67.154#1049: error sending response: not enough free resources
Mar 20 11:45:58 pppoe1 named[4267]: client 10.1.55.135#1164: error sending response: not enough free resources

Also, the kernel writes:

Mar 20 11:49:45 pppoe1 kernel: Neighbour table overflow.
Mar 20 11:49:50 pppoe1 kernel: printk: 26 messages suppressed.
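
For anyone seeing the same "Neighbour table overflow" message, one thing
worth ruling out is the ordinary case where the IPv4 neighbour (ARP) cache
is simply close to its hard limit. The sketch below is only an
illustration, not something taken from the affected machines: the helper
names are made up, and it assumes the usual Linux files /proc/net/arp and
/proc/sys/net/ipv4/neigh/default/gc_thresh3 are present. The count from
/proc/net/arp is approximate.

```python
from pathlib import Path

# Standard Linux locations; assumed to exist on the machine being checked.
ARP_TABLE = Path("/proc/net/arp")
GC_THRESH3 = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def count_arp_entries(arp_text):
    """Count entries in /proc/net/arp-style text (the first line is a header)."""
    lines = [ln for ln in arp_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def neighbour_usage(arp_text, gc_thresh3):
    """Return the fraction of the hard neighbour-cache limit currently in use."""
    return count_arp_entries(arp_text) / gc_thresh3


def read_live_usage():
    """Hypothetical helper: read both files from /proc and return the ratio."""
    limit = int(GC_THRESH3.read_text().strip())
    return neighbour_usage(ARP_TABLE.read_text(), limit)
```

Calling read_live_usage() on the router returns a value between 0 and 1;
a value far below 1 while the overflow message is still being logged
suggests the cause lies elsewhere.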

When I try 'ping localhost' or 'telnet localhost 22', these commands
fail most of the time with the error 'no buffer space available'.
Strace shows that the 'connect' system call fails with the ENOBUFS
error code. After several attempts a command may succeed, but then it
fails again. What's strange is that when I ping or telnet to
neighbouring routers, it works on every attempt.
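
The same check can be repeated without strace. The following is a minimal
sketch, assuming Python is available on the box; probe_connect and
describe_error are illustrative names, and the host and port defaults just
mirror the 'telnet localhost 22' test above.

```python
import errno
import socket


def describe_error(exc):
    """Map an OSError to its symbolic errno name, e.g. 'ENOBUFS'."""
    if exc.errno is None:
        # Timeouts and similar errors carry no errno; report the class name.
        return type(exc).__name__
    return errno.errorcode.get(exc.errno, str(exc.errno))


def probe_connect(host="127.0.0.1", port=22, timeout=2.0):
    """Try one TCP connect; return 'ok' or the symbolic name of the error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return describe_error(exc)
```

Running probe_connect() in a loop and counting how often it returns
'ENOBUFS' gives a rough failure rate that can be compared before and after
a reboot or a kernel upgrade.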

I have googled a lot, but have not found anything useful. Most posts
are about FreeBSD; where this happens on Linux machines, some suggest
that the loopback interface may not be configured (that is not the
case; btw, the loopback interface carries an additional real IP
address serving as the server side of the PPPoE connections), or that
the network cards are bad.

When the problem appears, only a reboot fixes it. I tried shutting down
all processes except sshd, hoping that some process had associated
kernel structures that would be freed once it exited. No luck.

At this time, I suspect that this is a kernel issue (maybe specific to
our unfortunate hardware).

Recently I noticed this entry in the ChangeLog of the 2.6.20.5 kernel:

Author: G. Liakhovetski <gl@dsa-ac.de>
Date:   Mon Mar 26 19:07:40 2007 -0700

   PPP: Fix PPP skb leak

   [PPP]: Don't leak an sk_buff on interface destruction.

It seems that I was bitten by exactly this bug. I have upgraded the
kernels on both machines to 2.6.20.6, and uptime on one of them is now
9 days, which used to be long enough for the "no buffer space
available" problem to show up. The machines work fine, and I hope that
this completely solves the issue.
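
A leak of this kind should be visible as steady growth of the sk_buff
slab caches. The sketch below shows one way to watch for it; it is an
assumption-laden illustration rather than what was actually done here.
It assumes the /proc/slabinfo 2.x column layout (name, active_objs,
num_objs, objsize, ...) and cache names starting with "skbuff", and
skb_slab_usage is a made-up helper name. Reading /proc/slabinfo normally
requires root.

```python
def skb_slab_usage(slabinfo_text):
    """Return {cache_name: (active_objs, objsize_bytes)} for skbuff caches.

    Expects text in /proc/slabinfo 2.x format, where each data line starts
    with: name active_objs num_objs objsize ...
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Header lines start with "slabinfo" or "#", so they are skipped here.
        if len(fields) >= 4 and fields[0].startswith("skbuff"):
            usage[fields[0]] = (int(fields[1]), int(fields[3]))
    return usage
```

Feeding it the contents of /proc/slabinfo once a day and logging the
active object counts would show whether they keep climbing as PPPoE
interfaces come and go.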

Just wanted you all to know.

--
Timur Irmatov, xmpp:irmatov@jabber.ru


