
Out of memory killer misconfigured?



Hello,

I use Debian Testing on AMD64, on a workstation with a Ryzen 5800X
(8 cores, 16 threads) and 64GB of ECC DDR4 RAM.

Today, a Windows application I run under Wine for work decided to eat
all available memory, CPU and HDD I/O. I don't have a swap file, so the
Linux kernel must kill something to stay online when a rogue application
takes all the RAM.
That's where the problem I noticed comes in: the OOM killer killed
EVERYTHING else first, and the actual memory-hungry application only at
the very end. Why?! It destroyed a working KDE session and I had to hard
reset the PC.

Have a look at the journalctl output from the last boot (I cut
timestamps for easier reading):

kernel: RSP: 002b:00007ffcff9ead98 EFLAGS: 00010246
systemd-journald[411]: Missed 10 kernel messages
kernel: lowmem_reserve[]: 0 3128 64155 64155 64155
kernel: Node 0 DMA32 free:246472kB boost:0kB min:3292kB low:6492kB
high:9692kB reserved_highatomic:0KB active_anon:44kB
inactive_anon:3057032kB active_file:0kB inactive_file:220kB un>
kernel: lowmem_reserve[]: 0 0 61027 61027 61027
kernel: Node 0 Normal free:245324kB boost:283884kB min:348156kB
low:410648kB high:473140kB reserved_highatomic:2048KB
active_anon:270712kB inactive_anon:60254116kB active_file:29564k>
kernel: lowmem_reserve[]: 0 0 0 0 0

(...)

Mar 29 08:58:28 ryzen kernel: 539654 total pagecache pages
Mar 29 08:58:28 ryzen kernel: 0 pages in swap cache
Mar 29 08:58:28 ryzen kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 29 08:58:28 ryzen kernel: Free swap  = 0kB
Mar 29 08:58:28 ryzen kernel: Total swap = 0kB
Mar 29 08:58:28 ryzen kernel: 16753821 pages RAM
Mar 29 08:58:28 ryzen kernel: 0 pages HighMem/MovableOnly
Mar 29 08:58:28 ryzen kernel: 296896 pages reserved
Mar 29 08:58:28 ryzen kernel: 0 pages hwpoisoned

And here are all the running processes; let me highlight only a few:

kernel: Tasks state (memory values in pages):
kernel: [  pid  ]   uid   tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
kernel: [   4611]  1000   4611   137776     1767         253952        0           200 kactivitymanage
kernel: [ 676751]  1000 676751  2356154   115060        2740224        0             0 terminal64.exe
kernel: [ 702184]  1000 702184  1226983   824654        7540736        0             0 metatester64.ex
kernel: [ 731468]  1000 731468  1194211   814761        7442432        0             0 metatester64.ex
kernel: [ 731471]  1000 731471  1245415   835020        7593984        0             0 metatester64.ex
(and it goes on, at least 16 Wine exe processes like that eating all RAM)
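
For reference, the rss and total_vm columns above are counted in pages,
while pgtables_bytes is in bytes. A minimal sketch of the conversion,
assuming the usual 4 KiB page size on x86-64 (`getconf PAGESIZE` would
confirm it); the helper names are made up for illustration:

```python
# Convert the "Tasks state" columns to human-readable sizes.
# Assumption: 4 KiB pages, the x86-64 default.
PAGE_SIZE = 4096

# (pid, name, rss in pages, pgtables_bytes), copied from the table above.
TASKS = [
    (4611, "kactivitymanage", 1767, 253952),
    (676751, "terminal64.exe", 115060, 2740224),
    (702184, "metatester64.ex", 824654, 7540736),
    (731468, "metatester64.ex", 814761, 7442432),
    (731471, "metatester64.ex", 835020, 7593984),
]


def pages_to_gib(pages: int) -> float:
    """Resident set size in GiB for a page count."""
    return pages * PAGE_SIZE / 2**30


def bytes_to_mib(nbytes: int) -> float:
    """Page-table size in MiB for a byte count."""
    return nbytes / 2**20


for pid, name, rss_pages, pgt_bytes in TASKS:
    print(f"{pid:>7} {name:<16} rss={pages_to_gib(rss_pages):6.2f} GiB "
          f"pgtables={bytes_to_mib(pgt_bytes):5.2f} MiB")
```

Under that assumption each metatester64.ex instance holds a bit over
3 GiB resident and roughly 7 MiB of page tables, while kactivitymanage
holds about 7 MiB resident.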

As you can see, my Wine application has spawned a lot of exes. Each one
of them has an rss of over 800,000 pages (a bit over 3 GiB, assuming
4 KiB pages) plus about 7.5 MB of page tables (pgtables_bytes is in
bytes), and there are many such processes. But instead of killing one or
a few of these processes, the OOM killer decided to kill everything
*but* the offending exes. The killing of all processes began:

kernel:
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/background.slice/plasma-kactiv>
kernel: Out of memory: Killed process 4611 (kactivitymanage)
total-vm:551104kB, anon-rss:7068kB, file-rss:0kB, shmem-rss:0kB,
UID:1000 pgtables:248kB oom_score_adj:200

Instead of killing ONE process holding over 3 GiB of RAM, Linux is
killing everything else! The KDE activity manager is killed. Then it
goes on to kill EVERYTHING in the system:

kernel: Out of memory: Killed process 4555 (kglobalaccel5)
kernel: Out of memory: Killed process 444878 (kiod5)
kernel: oom_reaper: reaped process 444878 (kiod5)
kernel: Out of memory: Killed process 4405 (pipewire)
kernel: oom_reaper: reaped process 4405 (pipewire)
kernel: Out of memory: Killed process 505026 (gvfs-udisks2-vo)
(it goes on...)
Out of memory: Killed process 4414 (dbus-daemon)
(...)
Out of memory: Killed process 4390 (systemd)

And behold, at the very end it kills a Wine process:
Out of memory: Killed process 731550 (metatester64.ex)
total-vm:4891544kB, anon-rss:3483116kB, file-rss:0kB, shmem-rss:0kB,
UID:1000 pgtables:7704kB oom_score_adj:0

It even says total-vm:4891544kB, but just before that it killed systemd
with total-vm:18760kB.
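
One possible explanation for the kill order is oom_score_adj. To my
understanding of mm/oom_kill.c, the kernel scores each task as rss plus
swap entries plus page-table pages, adds oom_score_adj * totalpages /
1000, and kills the highest score first. A rough sketch of that
arithmetic with values from the log above; the function name, the 4 KiB
page size, and using "pages RAM" minus "pages reserved" as totalpages
are assumptions, so the results are approximate:

```python
# Approximate the kernel's OOM "badness" score for two tasks from the log.
# Assumptions: 4 KiB pages; totalpages ~ "pages RAM" minus "pages reserved"
# (the kernel uses its own total RAM + swap figure); no swap in use.
PAGE_SIZE = 4096
TOTAL_PAGES = 16753821 - 296896


def badness(rss_pages: int, pgtables_bytes: int, oom_score_adj: int,
            swapents: int = 0, total_pages: int = TOTAL_PAGES) -> int:
    """Sketch of oom_badness(): memory footprint plus the adj bonus."""
    points = rss_pages + swapents + pgtables_bytes // PAGE_SIZE
    points += oom_score_adj * (total_pages // 1000)
    return points


# PID 4611 and PID 731471 from the "Tasks state" table.
kactivity = badness(rss_pages=1767, pgtables_bytes=253952, oom_score_adj=200)
metatester = badness(rss_pages=835020, pgtables_bytes=7593984, oom_score_adj=0)

print(f"kactivitymanage: {kactivity} points")
print(f"metatester64.ex: {metatester} points")
```

Under these assumptions an oom_score_adj of 200 alone is worth about 3.3
million pages (roughly 12.5 GiB), so a tiny process with that adjustment
would outrank a 3 GiB process whose adjustment is 0.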

At this stage, the system has completely crashed and I have to hard
reset.

I'd appreciate any explanation of this situation and advice on how to
prevent it in the future.
Please find the journalctl output attached as a compressed archive
(16 KB).

I haven't modified Debian in any way that could affect RAM usage or
out-of-memory handling, apart from increasing the I/O buffers for better
performance (the comments on the changes are my own):

$ cat /etc/sysctl.conf
(...)
vm.dirty_background_ratio=20
# Writing starts after 20% of RAM is filled with data to write.

vm.dirty_ratio=40
# up to 40% of memory can be used as write buffers (more write requests
# will cause I/O lock until enough data is flushed).

vm.dirty_expire_centisecs=30000
# data is allowed to sit in the buffers for max 5 minutes (max lost work
# time)

vm.dirty_writeback_centisecs=6000
# how often to check for write data in buffers: 1 minute

I'm not sure whether that causes the OOM killer to kill the entire
system instead of one offending process, but I doubt it.
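
To put those settings in perspective, here is a small sketch that
converts them to seconds and sizes. It assumes the percentages apply to
the full 64 GiB, whereas the kernel applies them to available (free plus
reclaimable) memory, so the size figures are upper bounds; the
dictionary and function names are for illustration only:

```python
# Translate the writeback sysctls above into seconds and (rough) sizes.
# Assumption: ratios taken against all 64 GiB of RAM; the kernel uses
# available memory instead, so real thresholds are somewhat lower.
RAM_GIB = 64

SETTINGS = {
    "vm.dirty_background_ratio": 20,
    "vm.dirty_ratio": 40,
    "vm.dirty_expire_centisecs": 30000,
    "vm.dirty_writeback_centisecs": 6000,
}


def centisecs_to_seconds(value: int) -> float:
    """Centiseconds (1/100 s) to seconds."""
    return value / 100


def ratio_to_gib(percent: int, ram_gib: float = RAM_GIB) -> float:
    """Percentage of RAM expressed in GiB."""
    return ram_gib * percent / 100


for key, value in SETTINGS.items():
    if key.endswith("_centisecs"):
        print(f"{key} = {value} -> {centisecs_to_seconds(value):.0f} s")
    else:
        print(f"{key} = {value} -> up to {ratio_to_gib(value):.1f} GiB dirty")
```

This gives 300 s (5 minutes) and 60 s (1 minute) for the two timers, and
up to about 12.8 GiB and 25.6 GiB of dirty data for the two ratios.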

Thanks in advance for your comments, friends!

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀

Attachment: journalctl.tar.zst
Description: application/zstd

