[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#851471: linux-image-4.8.0-2-686-pae: premature oom killer invocation running 4.8.15



On Sun 2017-01-15, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Sun, 2017-01-15 at 11:05 +0100, Florian Siegesmund wrote:

>> Package: src:linux
>> Version: 4.8.15-2
>> Severity: critical
>> Tags: upstream
>> Justification: breaks the whole system
>>
>> [...]
>>
>> It seems to me that commit 7838fbe25a95ce2cd6e8ae27a76d369365da89d4
>> (kernel ChangeLog-4.8.12) was meant to deal with this problem, but it
>> doesn't work for me.
>>
>> I will add a sample syslog for this issue.
> 
> That includes an RCU error before the OOM.  Was that the first message
> relating to RCU?

No.

LOGHOST:/var/log # grep -i 'LXCONTAINER.*RCU' syslog*
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] INFO: rcu_sched detected stalls on CPUs/tasks:                   
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] All QSes seen, last rcu_sched kthread activity 5260 (9621692-9616432), jiffies_till_next_fqs=1, root ->qsmask 0x0
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883]  [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790                                                                          
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] rcu_sched kthread starved for 5260 jiffies! g1427280 c1427279 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0                                             
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] rcu_sched       R running      0     7      2 0x00000000
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883]  [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0
syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883]  [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] INFO: rcu_sched detected stalls on CPUs/tasks:
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] All QSes seen, last rcu_sched kthread activity 5253 (9792768-9787515), jiffies_till_next_fqs=1, root ->qsmask 0x0
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708]  [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] rcu_sched kthread starved for 5253 jiffies! g1435292 c1435291 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] rcu_sched       R running      0     7      2 0x00000000
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708]  [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0
syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708]  [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] INFO: rcu_sched detected stalls on CPUs/tasks:
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] All QSes seen, last rcu_sched kthread activity 5254 (9890282-9885028), jiffies_till_next_fqs=1, root ->qsmask 0x0
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914]  [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] rcu_sched kthread starved for 5254 jiffies! g1443629 c1443628 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] rcu_sched       R running      0     7      2 0x00000000
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914]  [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0
syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914]  [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100
LOGHOST:/var/log # grep -i 'HOSTNAME.*RCU' syslog*
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] INFO: rcu_sched detected stalls on CPUs/tasks:
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] All QSes seen, last rcu_sched kthread activity 5258 (9945547-9940289), jiffies_till_next_fqs=1, root ->qsmask 0x0
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882]  [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] rcu_sched kthread starved for 5258 jiffies! g1445441 c1445440 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] rcu_sched       R running      0     7      2 0x00000000
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882]  [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0
syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882]  [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] INFO: rcu_sched detected stalls on CPUs/tasks:
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] All QSes seen, last rcu_sched kthread activity 5254 (14628655-14623401), jiffies_till_next_fqs=1, root ->qsmask 0x0
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747]  [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] rcu_sched kthread starved for 5254 jiffies! g1582915 c1582914 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] rcu_sched       R running      0     7      2 0x00000000
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747]  [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0
syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747]  [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100
LOGHOST:/var/log # 

Oldest syslog message in the archive is from January 8th, 00:00:00 local
time, so no RCU related messages when running 3.16.x.

HOSTNAME:~# last | grep boot
reboot   system boot  4.9.0-1-686-pae  Sun Jan 15 11:42   still running
reboot   system boot  4.8.0-2-686-pae  Sun Jan 15 10:16 - 11:40  (01:23)
reboot   system boot  3.16.0-4-686-pae Sun Jan 15 08:44 - 10:15  (01:31)
reboot   system boot  4.8.0-2-686-pae  Sat Jan 14 15:16 - 10:15  (18:59)
reboot   system boot  3.16.0-4-686-pae Fri Jan  6 23:58 - 15:15 (7+15:17)
HOSTNAME:~#

FWIW RCU related messages are logged by the LXC container running on
HOSTNAME first.

> Linux 4.9 is said to have better OOM behaviour and is now available in
> unstable.  Can you test whether it works better on this system?

We will see:

HOSTNAME:~# uname -a ; uptime
Linux kyrrdis 4.9.0-1-686-pae #1 SMP Debian 4.9.2-2 (2017-01-12) i686 GNU/Linux
 22:35:41 up 10:53,  2 users,  load average: 0.28, 0.11, 0.04
HOSTNAME:~#

Florian

-- 
:wq


Reply to: