
unsure how to track down kernel stack traces in debian 9.2 on vmware ESXi



Hello dear Debian folks

 

We run a Debian 9.2 build server on top of a VMware ESXi install on a quite powerful server: a Dell PowerEdge R730 with 2x Xeon E5-2683 v4 (16 cores per CPU, which makes 64 vCPUs with HT enabled). Both installations are fully updated, and the Dell firmware is up to date as well.

 

Now I see stack traces in the Debian /var/log/messages file (attached), but no entries at the corresponding times in the underlying ESXi logs, so I tend to think it’s a Debian (or kernel) problem. The traces occur under heavy load, and the server stops responding.
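To compare the two sides, it helps to have a plain list of the times at which the guest logged a trace. A minimal sketch, assuming the traditional syslog line format (month, day, time, host, `kernel:`) and that each trace contains the usual `Call Trace:` marker; the function name `trace_times` is made up for illustration:

```shell
# trace_times FILE
# Print the syslog timestamp (first three fields) of every kernel
# "Call Trace:" line in FILE, one per line, collapsing adjacent duplicates.
# Assumes classic syslog formatting; adjust the fields for other formats.
trace_times() {
    awk '/kernel:.*Call Trace:/ { print $1, $2, $3 }' "$1" | uniq
}
```

Run as `trace_times /var/log/messages`, this gives the timestamps to look for in the ESXi logs. Keep a possible time zone difference between guest and host logs in mind when comparing.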

 

Unfortunately, we’re only evaluating VMware for this use case, so I cannot open a ticket there, as I’m running in eval mode. It’s the only VM on this physical server. And no, it was not my idea to run it on VMware; I was told to do so.

 

I ran open-vm-tools and also tried the proprietary VMware tools; no difference.

 

Linux hostname 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

 

Any ideas what I can do? Any help would be greatly appreciated.

 

Best regards

 

Tom Stocker

 

More info:

 

root@hostname:# cat /proc/meminfo
MemTotal:       121711504 kB
MemFree:         8760844 kB
MemAvailable:   119331608 kB
Buffers:        11026468 kB
Cached:         93449972 kB
SwapCached:         7924 kB
Active:         16845672 kB
Inactive:       87776128 kB
Active(anon):      73852 kB
Inactive(anon):   140096 kB
Active(file):   16771820 kB
Inactive(file): 87636032 kB
Unevictable:       91916 kB
Mlocked:           91916 kB
SwapTotal:      524287996 kB
SwapFree:       524234744 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:        232092 kB
Mapped:           158928 kB
Shmem:             64996 kB
Slab:            7419836 kB
SReclaimable:    7161340 kB
SUnreclaim:       258496 kB
KernelStack:       14608 kB
PageTables:        30352 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    585143748 kB
Committed_AS:    1587700 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     2940800 kB
DirectMap2M:    104013824 kB
DirectMap1G:    18874368 kB
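For a quick read of these numbers, the following sketch boils /proc/meminfo down to two figures: the share of RAM the kernel reports as available and the amount of swap in use. The function name `mem_summary` is hypothetical, and this only describes the moment the snapshot was taken, not the state under heavy load.

```shell
# mem_summary [FILE]
# Summarise a meminfo-style file (default: /proc/meminfo):
#   - MemAvailable as a percentage of MemTotal
#   - swap in use (SwapTotal - SwapFree) in kB
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total  = $2 }
        $1 == "MemAvailable:" { avail  = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree  = $2 }
        END {
            if (total > 0)
                printf "available: %.1f%% of RAM\n", 100 * avail / total
            printf "swap used: %d kB\n", stotal - sfree
        }
    ' "${1:-/proc/meminfo}"
}
```

With the values above this works out to roughly 98.0% of RAM available and 53252 kB of swap in use, so at the time of this snapshot the machine was not short on memory. Running it in a loop while the build load is active would show whether that changes before a trace appears.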

 

root@hostname:# cat /proc/cpuinfo | less
core id         : 1
cpu cores       : 32
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat
bugs            :
bogomips        : 4199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

[...]

processor       : 63
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
stepping        : 1
microcode       : 0xb000021
cpu MHz         : 2099.078
cache size      : 40960 KB
physical id     : 1
siblings        : 32
core id         : 31
cpu cores       : 32
apicid          : 63
initial apicid  : 63
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat
bugs            :
bogomips        : 4199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:
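Since the full output is long, the topology the guest actually sees can be condensed with a short sketch like the one below. The function name `cpu_topology` is hypothetical; it relies only on the `processor`, `physical id`, `siblings` and `cpu cores` fields and assumes all sockets are configured identically.

```shell
# cpu_topology [FILE]
# Condense a cpuinfo-style file (default: /proc/cpuinfo) into one line:
# logical CPUs, sockets, cores per socket and threads per core.
# Assumes every socket reports the same "siblings" and "cpu cores" values.
cpu_topology() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor"   { cpus++ }
        $1 == "physical id" { sockets[$2 + 0] = 1 }
        $1 == "siblings"    { siblings = $2 + 0 }
        $1 == "cpu cores"   { cores = $2 + 0 }
        END {
            n = 0
            for (s in sockets) n++
            threads = (cores > 0) ? siblings / cores : 0
            printf "%d logical CPUs, %d sockets, %d cores per socket, %d threads per core\n",
                   cpus, n, cores, threads
        }
    ' "${1:-/proc/cpuinfo}"
}
```

In the excerpt above, processor 63 reports physical id 1 with 32 siblings and 32 cpu cores, which suggests the guest is presented with 2 sockets of 32 cores each and no hyperthreading siblings, rather than 16 cores per socket with HT as on the physical host.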

 

Attachment: messages
Description: messages

