
Re: unsure how to track down kernel stack traces in debian 9.2 on vmware ESXi



Tom Stocker wrote:

> Hello dear Debian folks
> 
> We run a Debian 9.2 build server on top of a VMware ESXi install on a
> quite powerful server (Dell PowerEdge R730 with 2x Xeon E5-2683v4; 16
> cores per CPU makes 64 vCPUs with HT enabled). Both installations are fully
> updated. The Dell firmware is also up to date.
> 
> Now I see stack traces in the Debian /var/log/messages file (attached),
> but no entries at the corresponding times in the underlying ESXi logs, so
> I tend to say it's a Debian (or kernel) problem. The traces occur under
> heavy load, and the server stops responding.
> 
> Unfortunately we're evaluating VMware for this use case, so I cannot open a
> ticket there, as I'm running in eval mode. There is only one VM on this
> physical server. And no, it was not my idea to run it on vmware, I was
> told to do so.
> 
> I ran open-vm-tools and also tried the proprietary VMware tools; no
> difference.
> 
> Linux hostname 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64
> GNU/Linux
> 
> Any ideas what I can do? Any help would be greatly appreciated.
> 

IMO what you see is cgroup management getting rid of an overloading process,
which is normal to some extent. If you mean that the VM stops responding,
that needs investigation, or perhaps some tuning of cgroup behavior or the
like.

Nov 27 09:43:03 hostname kernel: [1184306.163531] Hardware name: VMware,
Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
04/05/2016
Nov 27 09:43:03 hostname kernel: [1184306.163536] Workqueue: cgroup_destroy
css_free_work_fn
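
To see which workqueues show up in the traces and how often, the log can be
scanned for "Workqueue:" lines. A minimal sketch in Python, assuming the
syslog format shown above, with each entry on a single line (the wrapping in
this mail would need to be undone first). The name count_workqueues is made
up for illustration.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   ... kernel: [1184306.163536] Workqueue: cgroup_destroy css_free_work_fn
# The second name (the work function) is treated as optional.
WORKQUEUE_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] Workqueue: (\S+)(?: (\S+))?"
)

def count_workqueues(lines):
    """Count (workqueue, work function) pairs seen in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = WORKQUEUE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

It could be fed the file from the report, e.g.
count_workqueues(open("/var/log/messages", errors="replace")), which
normally needs root or membership in the adm group.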


Perhaps you could try kernel 4.12 or 4.13, or find out what is overloading
the system.

See, for example, https://patchwork.kernel.org/patch/9896303/
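
As a first step towards finding out what is overloading the system, it may
help to log how the load average compares to the number of CPUs around the
time of the traces. A rough sketch in Python; the function names and the
threshold of 1.0 per CPU are my own assumptions, not an established rule,
and this only shows whether the box is overloaded, not which process is
responsible.

```python
import os

def load_per_cpu(load_avg, cpu_count):
    """Return the load average normalised by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count

def is_overloaded(load_avg, cpu_count, threshold=1.0):
    """True if the load per CPU exceeds the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold

def current_load_per_cpu():
    """Normalised 1-minute load average of the machine this runs on."""
    one_minute, _five, _fifteen = os.getloadavg()
    # os.cpu_count() can return None; fall back to 1 in that case.
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

Running current_load_per_cpu() from cron every minute and writing the value
to a file would give something to line up against the timestamps in
/var/log/messages.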

regards

