
Re: cgroup OOM killer loop causes system to lockup (possible fix included)



FYI everyone, I found a bug within openssh-server which caused this problem.

I've patched and submitted to the openssh list.

You can find details of this by googling for:
"port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - which can cause DoS conditions"

It's extremely strange.. :S

Cal

On 30/05/2011 21:31, Cal Leeming [Simplicity Media Ltd] wrote:
Thanks for the response. I have sent this across to the guys at openssh-server.

Although, I did check the openssh source code myself, and from what I could tell, everything was being done correctly.

I have a feeling there's going to be a lot of 'buck passing' on this one :(

Cal

On 30/05/2011 21:25, Ben Hutchings wrote:
On Mon, 2011-05-30 at 21:03 +0100, Cal Leeming [Simplicity Media Ltd] wrote:
More strangeness..

If I keep the kernel module loaded but disable the entry
in /etc/network/interfaces for eth0, the oom_adj problem disappears.
But then, of course, I'm left with no network interface. I then tried
configuring sshd (in /etc/ssh/sshd_config) to listen only on 127.0.0.1,
effectively bypassing the eth0 interface while still allowing it to be
loaded. But the problem still happens.
[...]

My guess is that sshd tries to protect itself against the OOM-killer so
that you can still log in to a system that has gone OOM.  If there is no
network available, it doesn't do this because you cannot log in remotely
anyway.

The bug seems to be that sshd does not reset the OOM adjustment before
running the login shell (or other program).  Therefore, please report a
bug against openssh-server.

Ben.
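
For readers following the thread, the save/protect/restore pattern Ben
describes can be sketched roughly as below. This is only an illustration in
Python, not the openssh code: the function names (protect, restore,
run_session) are made up for this example, and the adjustment file is passed
in as a parameter. On a real system it would be something like
/proc/self/oom_adj (the legacy interface discussed in this thread), where
lowering the value normally requires root.

```python
import subprocess
from pathlib import Path

# Legacy oom_adj value meaning "never OOM-kill this process".
OOM_DISABLE = "-17"


def protect(adj_file):
    """Remember the current adjustment, then exempt the process from the OOM killer."""
    path = Path(adj_file)
    saved = path.read_text().strip()
    path.write_text(OOM_DISABLE + "\n")
    return saved


def restore(adj_file, saved):
    """Write the remembered adjustment back."""
    Path(adj_file).write_text(saved + "\n")


def run_session(adj_file, saved, argv):
    """Start a user program, restoring the adjustment in the child just before exec.

    Without the restore step the child would inherit the protected value,
    which is the behaviour reported in this thread.
    """
    result = subprocess.run(argv, preexec_fn=lambda: restore(adj_file, saved))
    return result.returncode
```

The point of the sketch is the ordering: the daemon protects itself once at
startup, and every program it launches on behalf of a user gets the original
value back before it starts running. If the saved value is itself wrong (for
example, already -17 because the daemon was started from a protected
process), the restore step faithfully reapplies the wrong value.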



