
Bug#628690: openssh-server inherits oom_adj -17 upon start under specific conditions, causing DoS potential for oom_killer



Package: openssh-server
Version: 1:5.5p1-6-amd64

Full discussion about this problem (including full log dumps, analysis, testing etc), can be found at:

http://www.debianhelp.org/content/cgroup-oom-killer-loop-causes-system-lockup-possible-fix-included

This problem seems to occur when using the firmware-bnx2 package for NetXtreme II cards. Once this firmware has been loaded, openssh-server starts with an oom_adj of -17, and every child process spawned by sshd then inherits -17 as well.

The OpenSSH developers have said this is not a bug in OpenSSH: sshd simply inherits whatever oom_adj it is started with. The tests below (one with bnx2, one without, on otherwise identical installs) show that the inherited oom_adj differs:

> root@vicky:~# cat /var/log/auth.log | grep "Set"
> May 30 21:41:05 vicky sshd[1568]: Set /proc/self/oom_adj from -17 to -17
> May 30 21:41:07 vicky sshd[1574]: Set /proc/self/oom_adj to -17
>
> root@vicky:~# ps faux | grep 1574
> root 1574 0.0 0.0 70488 3404 ? Ss 21:41 0:00 \_ sshd: root@pts/1
>
> root@vicky:~# ps faux | grep "1568"
> root 1568 0.0 0.0 49168 1152 ? Ss 21:41 0:00 /usr/sbin/sshd
>
> In the sshd source (openbsd-compat/port-linux.c) there seems to be:
> static int oom_adj_save = INT_MIN;
>
> root@courtney:~/openssh-5.5p1# grep -R "Set %s to %d" .
> ./openbsd-compat/port-linux.c: verbose("Set %s to %d", OOM_ADJ_PATH, oom_adj_save);
>
> Then I tried on a server with different network card hardware (as shown below), and got this from the logs:
>
> root@courtney:~/openssh-5.5p1# cat /var/log/auth.log  | grep "Set"
> May 30 21:50:15 courtney sshd[4821]: Set /proc/self/oom_adj from 0 to -17
> May 30 21:50:26 courtney sshd[4848]: Set /proc/self/oom_adj to 0
>
> root@courtney:~/openssh-5.5p1# ps faux | grep "4848"
> root 4848 0.0 0.0 70488 3372 ? Ss 21:50 0:00 \_ sshd: root@pts/1
>
> root@courtney:~/openssh-5.5p1# ps faux | grep "4821"
> root 4821 0.0 0.0 49168 1160 ? Ss 21:50 0:00 /usr/sbin/sshd
>
> root@courtney:~/openssh-5.5p1# cat /var/log/auth.log | grep -e "Set" -e "oom_adjust_restore"
> May 30 21:50:15 courtney sshd[4821]: Set /proc/self/oom_adj from 0 to -17
> May 30 21:50:26 courtney sshd[4848]: debug3: oom_adjust_restore
> May 30 21:50:26 courtney sshd[4848]: Set /proc/self/oom_adj to 0

The problem is that you can't test for this condition unless you physically have a bnx2 card installed with the bnx2 driver loaded (for which you need the firmware ISO), so it will be very hard for anyone else to confirm this bug.

Below is a full transcript from openssh:

So I modified the code to try and repair this oom_adj problem...

port-linux.c:
line 235: //static int oom_adj_save = INT_MIN;
line 236: static int oom_adj_save = 0;
line 277: verbose("Set %s to %d - sleepycal", OOM_ADJ_PATH, oom_adj_save);


I then compiled the package and ran sshd, and yet we still have -17 in oom_adj_save. Wtf? Now, I'm not much of a C coder, but this is weird even in my book...

May 30 22:18:19 vicky sshd[12825]: Set /proc/self/oom_adj to -17 - sleepycal
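
One plausible reason the changed initialiser made no difference, consistent with the "from -17 to -17" log line above: the setup step reads the current value into oom_adj_save at startup, so whatever the variable was initialised to at compile time is overwritten before the restore runs. A minimal sketch of that effect follows. simulate_setup() is a hypothetical stand-in, not a function from the OpenSSH source, and it parses a string rather than reading /proc/self/oom_adj:

```c
#include <assert.h>
#include <stdio.h>

/* Patched initialiser, as in the first attempt (originally INT_MIN). */
static int oom_adj_save = 0;

/*
 * Hypothetical stand-in for the setup step: parse the inherited value
 * (passed in as a string here, instead of being read from
 * /proc/self/oom_adj) into oom_adj_save.
 * Returns the value a later restore would write back.
 */
int simulate_setup(const char *inherited)
{
    assert(inherited != NULL);
    if (sscanf(inherited, "%d", &oom_adj_save) != 1)
        fprintf(stderr, "error parsing inherited value\n");
    return oom_adj_save;
}
```

Under this assumption, calling simulate_setup("-17\n") leaves oom_adj_save at -17 regardless of the 0 it was initialised with, which would match the log line above.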

So, I went all out crazy, and did the following patch:

        static int sleepycal_oom_adj_save = 0;
        verbose("sleepycal_oom_adj_save=%d", sleepycal_oom_adj_save);

        if (fprintf(fp, "%d\n", sleepycal_oom_adj_save) <= 0)
                verbose("error writing %s: %s", OOM_ADJ_PATH, strerror(errno));
        else
                verbose("Set %s to %d - sleepycal", OOM_ADJ_PATH, sleepycal_oom_adj_save);

And it worked!!! :)

May 30 22:27:12 vicky sshd[2532]: sleepycal_oom_adj_save=0
May 30 22:27:12 vicky sshd[2532]: Set /proc/self/oom_adj to 0 - sleepycal

root@vicky:~/openssh-5.5p1# cat /proc/2532/oom_adj
0

So, it turns out that it is actually OpenSSH which is broken, after almost 3 days of frustrating digging through millions of lines of code lol. Anyway, I would appreciate it if someone could get this merged into master (obviously rename the variables if you want).

Attached is the patch file against openssh-5.5p1.

Cal

> On 30/05/2011 21:30, Cal Leeming [Simplicity Media Ltd] wrote:
>> Hi all,
>>
>> Please find below a complete transcript of the emails between debian/kernel-mm mailing lists.
>>
>> I've had a response back from someone on the deb mailing list stating:
>>
>> ====================================
>> The bug seems to be that sshd does not reset the OOM adjustment before
>> running the login shell (or other program).  Therefore, please report a
>> bug against openssh-server.
>> ====================================
>>
>> Therefore, I am submitting this bug to you also.. If someone would be kind enough to have a flick thru all the below debug/logs, it'd be very much appreciated.
>>
>> Cal



========================================================
Hi,

On Mon, May 30, 2011 at 10:32:24PM +0100, Cal Leeming [Simplicity Media Ltd] wrote:
> So, it turns out that it is actually OpenSSH which is broken, after

I would not second this.  To me, this very much looks like:

> On 30/05/2011 21:56, Cal Leeming [Simplicity Media Ltd] wrote:
> > Just did some testing..
> >
> >root@vicky:~# cat /var/log/auth.log | grep "Set"
> >May 30 21:41:05 vicky sshd[1568]: Set /proc/self/oom_adj from -17 to -17
> >May 30 21:41:07 vicky sshd[1574]: Set /proc/self/oom_adj to -17

... it's reading out the old value, saving it, setting it to "-17" (for
the sshd listener process, that one is not to be killed), and later on
*restoring* the old value (for all child processes).  See the comments
in platform.c

The log messages look weird because the value is -17 already when sshd
starts - so it's adjusting "-17 to -17" and later on "restoring -17" -
looks stupid, but that's computers for you.  But what it tells you is
that the value isn't set by sshd to "-17" but that sshd inherited that
from whoever started it.
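
The read, save, set and restore sequence described above can be sketched roughly as follows. This is an illustration only, not the OpenSSH code: adj_setup(), adj_restore() and adj_read() are hypothetical names, and the file path is a parameter so the sketch can be tried on an ordinary file instead of /proc/self/oom_adj.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define OOM_ADJ_NOKILL (-17)  /* value that exempts a process from the OOM killer */

/* INT_MIN means "nothing saved yet". */
static int saved_adj = INT_MIN;

/* Listener start: remember the inherited value, then protect the listener. */
int adj_setup(const char *path)
{
    FILE *fp;
    int ok = 0;

    assert(path != NULL);
    if ((fp = fopen(path, "r+")) == NULL)
        return -1;
    if (fscanf(fp, "%d", &saved_adj) == 1) {
        rewind(fp);  /* reposition between the read and the write */
        ok = fprintf(fp, "%d\n", OOM_ADJ_NOKILL) > 0;
    }
    fclose(fp);
    return ok ? 0 : -1;
}

/* Before running a session: write back whatever was inherited. */
int adj_restore(const char *path)
{
    FILE *fp;
    int ok;

    assert(path != NULL);
    if (saved_adj == INT_MIN || (fp = fopen(path, "w")) == NULL)
        return -1;
    ok = fprintf(fp, "%d\n", saved_adj) > 0;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Inspection helper: read the current value stored at path. */
int adj_read(const char *path, int *out)
{
    FILE *fp;
    int ok;

    assert(path != NULL && out != NULL);
    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    ok = fscanf(fp, "%d", out) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

If the file starts at 0, the sequence goes 0, -17, 0. If it starts at -17, the "restored" value is -17 again, which is the pattern seen in the vicky logs.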

The question here is why sshd is sometimes started with -17 and sometimes
with 0 - that's the bug, not that sshd keeps what it's given.
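
One way to look for where the -17 comes from would be to print oom_adj for a process and each of its ancestors. The sketch below assumes a Linux /proc that still exposes /proc/<pid>/oom_adj and the usual /proc/<pid>/stat layout; parse_ppid() and print_oom_chain() are hypothetical helper names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the parent PID from one line of /proc/<pid>/stat.
 * The command name (field 2) is parenthesised and may itself contain
 * spaces or ')', so parsing resumes after the last ')'.
 * Returns the PPID, or -1 if the line is malformed.
 */
int parse_ppid(const char *stat_line)
{
    const char *p;
    char state;
    int ppid;

    assert(stat_line != NULL);
    if ((p = strrchr(stat_line, ')')) == NULL)
        return -1;
    if (sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return ppid;
}

/* Print oom_adj for pid and every ancestor, stopping after PID 1. */
void print_oom_chain(int pid)
{
    char path[64], line[512];
    FILE *fp;
    int adj;

    while (pid > 0) {
        snprintf(path, sizeof(path), "/proc/%d/oom_adj", pid);
        if ((fp = fopen(path, "r")) == NULL)
            return;
        if (fscanf(fp, "%d", &adj) == 1)
            printf("pid %d oom_adj %d\n", pid, adj);
        fclose(fp);

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        if ((fp = fopen(path, "r")) == NULL)
            return;
        if (fgets(line, sizeof(line), fp) == NULL) {
            fclose(fp);
            return;
        }
        fclose(fp);
        pid = parse_ppid(line);  /* PID 1 reports PPID 0, which ends the loop */
    }
}
```

Calling print_oom_chain() with the PID of the sshd listener on both machines would show at which ancestor the values start to differ.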

(Ask yourself: if sshd had no idea about oom_adj at all, would this make
it buggy by not changing the value?)


Anyway, as a workaround for your system, you can certainly set

 oom_adj_save = 0;

in the beginning of port-linux.c / oom_adjust_restore(), to claim that
"hey, this was the saved value to start with" and "restore" oom_adj to 0
then - but that's just hiding the bug, not fixing it.

gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany    gert@greenie.muc.de
fax: +49-89-35655025    gert@net.informatik.tu-muenchen.de




