
Re: Need help analyzing (kernel?) memory usage and reclaiming RAM (Debian Stretch)



	Hi.

On Mon, Apr 15, 2019 at 04:40:56PM +0200, Martin Schwarz wrote:
> The system from my previous example has already been rebooted, sorry!

Kind of expected. It's useful nevertheless.


> But here's from another system that currently starts showing the same
> problem and has an equally small workload:
> 
> root@rad-wgv-srv01:~# free -thwl

Nothing out of the ordinary here.

> root@rad-wgv-srv01:~# cat /proc/meminfo
> MemTotal:        1010976 kB
> MemFree:           73980 kB
> MemAvailable:      38756 kB
> Buffers:            9964 kB
> Cached:            50340 kB

It's not the file cache that ate the memory.

> SwapCached:         2728 kB

And it's not the swap caching.

> Active(anon):      11068 kB
> Inactive(anon):     3696 kB

Memory consumption cannot be attributed to tmpfs.
I know you posted 'df' output earlier, but it does not take mount
namespaces into account.

> Mapped:            19904 kB

To my great disappointment, the problem cannot be explained by
excessive use of the mmap(2) syscall. It would have been easy otherwise.


> Shmem:              1120 kB

It's not the shared memory segments.


> Slab:              90744 kB
> SReclaimable:      13100 kB
> SUnreclaim:        77644 kB

And it's not the dentry cache (I've seen that thing grow once or twice.
It was ugly).


> AnonHugePages:         0 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0

And last but not least, there are no hugepages in use.


> root@rad-wgv-srv01:~# smem -tm | tail
> /bin/bash                                    3      358     1076 
> /lib/systemd/systemd                         3      386     1158 
> /lib/x86_64-linux-gnu/libc-2.24.so          33       54     1783 
> /usr/lib/x86_64-linux-gnu/libcrypto.so.1     5      386     1933 
> /usr/bin/python2.7                           1     2220     2220 
> /lib/systemd/libsystemd-shared-232.so        5      544     2723 
> <anonymous>                                 33      146     4848 
> [heap]                                      33      304    10060 
> -----------------------------------------------------------------
> 179                                        922    11110    41011 

Moreover, no currently running visible process consumes the memory.
I suspect that this host does not utilize them anyway.


In short: I do believe that this is happening, but I have never seen
anything like it. I cannot imagine a scenario that could lead to this, as
long as we're talking about real hardware, aka big iron.
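
To put a number on the gap, here is a rough sketch that adds up the usual
/proc/meminfo consumers and prints what is left over. The function name
meminfo_gap and the choice of fields are my own assumptions, and the sum
is only an approximation: memory taken by direct page allocations in
drivers or by vmalloc does not show up in any of these fields. For
reference, the fields quoted above (MemFree, Buffers, Cached, SwapCached,
Slab and the two anon lists) add up to roughly a quarter of MemTotal.

```shell
# Sum the /proc/meminfo fields that normally explain where RAM went,
# and print the remainder. Field selection is an approximation.
# Usage: meminfo_gap [file]   (defaults to /proc/meminfo)
meminfo_gap() {
    awk '
        # Each line looks like "MemTotal:        1010976 kB"
        { sub(":", "", $1); kb[$1] = $2 }
        END {
            # Fields missing from the input simply count as 0.
            accounted = kb["MemFree"] + kb["Buffers"] + kb["Cached"] \
                      + kb["SwapCached"] + kb["AnonPages"] + kb["Slab"] \
                      + kb["KernelStack"] + kb["PageTables"]
            printf "total=%d accounted=%d unaccounted=%d\n", \
                   kb["MemTotal"], accounted, kb["MemTotal"] - accounted
        }
    ' "${1:-/proc/meminfo}"
}
```

Run it as 'meminfo_gap' on the affected host, or as 'meminfo_gap saved.txt'
against a saved copy of /proc/meminfo. All values are in kB. A large
"unaccounted" figure points at the kernel rather than at userspace.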

What I suspect is happening here is runaway memory allocation by a
kernel module (at least one of them), and said kernel module is likely
to be VMware-specific.
It could be vmxnet3 (network). It could be that LSI kernel module, or
whatever they're using for SCSI these days (vmw_pvscsi?).


And that means 'perf top', or better yet, 'perf record'.
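
Something along these lines should do as a starting point. The helper
names and the /tmp output paths are made up for illustration, and the
30-second default is arbitrary. I'm assuming the kmem:kmalloc tracepoint
is available on your kernel ('perf list' will tell). A module that
allocates whole pages rather than calling kmalloc will not show up
there, in which case kmem:mm_page_alloc is the event to try instead.
Run these as root.

```shell
# Hypothetical helpers; only the perf subcommands and options are real.

# Sample all CPUs with call graphs for N seconds (default 30),
# then print the hottest modules and symbols.
profile_kernel() {
    perf record -a -g -o /tmp/perf.data -- sleep "${1:-30}" &&
        perf report -i /tmp/perf.data --sort dso,symbol --stdio | head -n 50
}

# Record kernel kmalloc events with call chains for N seconds,
# to see which code paths keep allocating.
trace_kmalloc() {
    perf record -e kmem:kmalloc -a -g -o /tmp/perf-kmalloc.data -- sleep "${1:-30}" &&
        perf report -i /tmp/perf-kmalloc.data --stdio | head -n 50
}
```

If a VMware-specific module dominates the kmalloc call chains while the
box is otherwise idle, that's your culprit.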

Reco

