Re: Need help analyzing (kernel?) memory usage and reclaiming RAM (Debian Stretch)
Hi.
On Mon, Apr 15, 2019 at 04:40:56PM +0200, Martin Schwarz wrote:
> The system from my previous example has already been rebooted, sorry!
Kind of expected. It's useful nevertheless.
> But here's from another system that currently starts showing the same
> problem and has an equally small workload:
>
> root@rad-wgv-srv01:~# free -thwl
Nothing out of the ordinary here.
> root@rad-wgv-srv01:~# cat /proc/meminfo
> MemTotal: 1010976 kB
> MemFree: 73980 kB
> MemAvailable: 38756 kB
> Buffers: 9964 kB
> Cached: 50340 kB
It's not the file cache that ate the memory.
> SwapCached: 2728 kB
And it's not the swap caching.
> Active(anon): 11068 kB
> Inactive(anon): 3696 kB
Memory consumption cannot be attributed to tmpfs.
I know you've posted 'df' output earlier, but that does not take mount
namespaces into account.
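To rule out tmpfs hiding in other mount namespaces, one can walk each process's mount namespace and run 'df' inside it. A minimal sketch, assuming root and util-linux's nsenter(1); the namespace link targets from /proc/PID/ns/mnt are used to deduplicate:

```shell
#!/bin/sh
# List tmpfs usage in every distinct mount namespace on the host.
# Needs root; deduplicates namespaces by their ns:[inode] link target.
seen=""
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
    ns=$(readlink "/proc/$pid/ns/mnt" 2>/dev/null) || continue
    case " $seen " in *" $ns "*) continue ;; esac
    seen="$seen $ns"
    echo "== mount namespace $ns (via pid $pid) =="
    nsenter -t "$pid" -m df -t tmpfs 2>/dev/null
done
```

On a host with no containers this prints a single namespace, matching the plain 'df' output already posted.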
> Mapped: 19904 kB
Much to my disappointment, the problem cannot be explained by
excessive use of the mmap(2) syscall. That would have been easy.
> Shmem: 1120 kB
It's not the shared memory segments.
> Slab: 90744 kB
> SReclaimable: 13100 kB
> SUnreclaim: 77644 kB
And it's not the dentry cache (I've seen that thing grow once or
twice; it was ugly).
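Still, with SUnreclaim at ~76 MiB it's worth seeing which slab cache holds it. 'slabtop' does this interactively; a one-liner against /proc/slabinfo works too (field positions per the slabinfo v2.1 header; needs root):

```shell
# Approximate per-cache memory (num_objs * objsize), largest first.
# Fields: $1 name, $3 num_objs, $4 objsize -- slabinfo v2.1 layout.
awk 'NR>2 { printf "%10d KiB  %s\n", $3*$4/1024, $1 }' /proc/slabinfo \
    | sort -rn | head -20
```

If no cache in that list comes close to the SUnreclaim figure, the memory was allocated outside the slab allocator, which points at direct page allocations by a module.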
> AnonHugePages: 0 kB
> ShmemHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
And last but not least, there are no hugepages in use.
> root@rad-wgv-srv01:~# smem -tm | tail
> /bin/bash 3 358 1076
> /lib/systemd/systemd 3 386 1158
> /lib/x86_64-linux-gnu/libc-2.24.so 33 54 1783
> /usr/lib/x86_64-linux-gnu/libcrypto.so.1 5 386 1933
> /usr/bin/python2.7 1 2220 2220
> /lib/systemd/libsystemd-shared-232.so 5 544 2723
> <anonymous> 33 146 4848
> [heap] 33 304 10060
> -----------------------------------------------------------------
> 179 922 11110 41011
Moreover, no currently running visible process consumes the memory.
I suspect that this host does not utilize them anyway.
In short: I do believe this is happening, but I have never seen
anything like it. I cannot imagine a scenario that could lead to this,
as long as we're talking real hardware, aka big iron.
What I suspect is happening here is runaway memory allocation by a
kernel module (at least one of them), and said kernel module is likely
to be VMWare-specific.
It could be vmxnet3 (network). It could be that LSI kernel module or
whatever they're using for SCSI these days (vmw_pvscsi?).
And that means 'perf top', or better yet, 'perf record'.
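A possible starting point, assuming the linux-perf package is installed on that Stretch box: record call graphs for the kernel allocation tracepoints system-wide, then see which module's functions dominate. kmem:kmalloc and kmem:kmem_cache_alloc are standard tracepoints; the sampling window is arbitrary:

```shell
# Record kernel allocation call graphs for 30 seconds (needs root).
perf record -a -g -e kmem:kmalloc -e kmem:kmem_cache_alloc -- sleep 30
# Summarize; a vmxnet3 or vmw_pvscsi frame high in the tree would
# confirm the VMware-module suspicion.
perf report --stdio | head -40
```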
Reco