Re: cgroup OOM killer loop causes system to lockup (possible fix included)

To: Ben Hutchings <ben@decadent.org.uk>
Cc: debian-kernel@lists.debian.org, debian-user@lists.debian.org
Subject: Re: cgroup OOM killer loop causes system to lockup (possible fix included)
From: "Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk>
Date: Mon, 30 May 2011 22:35:42 +0100
Message-id: <4DE40DAE.7090105@simplicitymedialtd.co.uk>
In-reply-to: <4DE3FEA9.3050105@simplicitymedialtd.co.uk>
References: <4DE2CB80.8040101@simplicitymedialtd.co.uk> <4DE2E3A5.4060208@simplicitymedialtd.co.uk> <4DE39710.4050707@simplicitymedialtd.co.uk> <4DE3A8DB.2000208@simplicitymedialtd.co.uk> <4DE3D5BF.3010707@simplicitymedialtd.co.uk> <4DE3EFB4.10305@simplicitymedialtd.co.uk> <4DE3F7F9.4010709@simplicitymedialtd.co.uk> <1306787120.4277.31.camel@localhost> <4DE3FEA9.3050105@simplicitymedialtd.co.uk>

FYI everyone, I found a bug within openssh-server which caused this problem.

I've patched it and submitted the patch to the openssh list.

You can find details of this by googling for:

"port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj -which can cause DoS conditions"


It's extremely strange.. :S

Cal

On 30/05/2011 21:31, Cal Leeming [Simplicity Media Ltd] wrote:

Thanks for the response. I have sent this across to the guys at openssh-server.

Although, I did check the openssh source code myself, and from what I could tell, everything was being done correctly.


I have a feeling there's gonna be a lot of 'buck passing' on this one :(

Cal

On 30/05/2011 21:25, Ben Hutchings wrote:

On Mon, 2011-05-30 at 21:03 +0100, Cal Leeming [Simplicity Media Ltd]
wrote:

More strangeness..

If I keep the kernel module loaded, but disable the entry
in /etc/network/interfaces for eth0, the oom_adj problem disappears.
But then, of course, I'm left with no network interface. I then tried
configuring sshd (in /etc/ssh/sshd_config) to listen only on 127.0.0.1,
effectively bypassing the eth0 interface, whilst still allowing it to be
loaded. But the problem still happens.

[...]

My guess is that sshd tries to protect itself against the OOM-killer so
that you can still log in to a system that has gone OOM.  If there is no
network available, it doesn't do this because you cannot log in remotely
anyway.

The bug seems to be that sshd does not reset the OOM adjustment before
running the login shell (or other program).  Therefore, please report a
bug against openssh-server.

Ben.
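
For anyone following the thread, the behaviour Ben describes can be sketched roughly as below. This is only an illustration of the save/protect/reset pattern, not the actual openssh code (which is C, in port-linux.c). The function names and the path parameter are made up for the example; the only facts assumed are that the legacy /proc/<pid>/oom_adj file takes -17 to exempt a process from the OOM killer and that the value is inherited by child processes.

```python
from pathlib import Path

# Legacy oom_adj value that exempts a process from the OOM killer.
OOM_ADJ_DISABLE = -17


def read_oom_adj(path):
    """Return the integer currently stored in an oom_adj-style file."""
    return int(Path(path).read_text().strip())


def write_oom_adj(path, value):
    """Write an integer adjustment to an oom_adj-style file."""
    Path(path).write_text(f"{value}\n")


def protect_daemon(path):
    """Exempt the daemon from the OOM killer.

    Returns the previous value so it can be put back later.
    (Hypothetical helper, named for this example only.)
    """
    saved = read_oom_adj(path)
    write_oom_adj(path, OOM_ADJ_DISABLE)
    return saved


def reset_for_child(path, saved):
    """Put the saved value back.

    Meant to run in the child after fork() and before exec() of the
    login shell, so the shell does not inherit the daemon's exemption.
    (Hypothetical helper, named for this example only.)
    """
    write_oom_adj(path, saved)
```

On a real system the path would be something like /proc/self/oom_adj (writing a negative value needs root). Note that if the daemon was itself started with -17 already in place, the "saved" value is also -17 and putting it back does not help the child, so the pattern is only as good as the value it saves.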
