Bug#625217: xen-linux-system-2.6.32-5-xen-amd64: Heavy load on domU causes dom0 to run out of memory

Hi Sebastian,

Thanks for your report.

On Mon, 2011-05-02 at 16:50 +0200, Sebastian Hofmann wrote:
> Package: xen-linux-system-2.6.32-5-xen-amd64
> Version: 2.6.32-31
> Severity: critical
> Justification: breaks the whole system
> Hi,
> I have 64 bit xen kernel from squeeze installed on a dual xeon
> machine. Usually everything runs fine until it comes to heavy load on
> a domU with high I/O and memory consumption.
> This causes the dom0 to run out of memory and to kill several
> processes (see log below). As a consequence of this, the whole system
> becomes unusable.
> I tried several things like assign dedicated memory to dom0, disable
> ballooning, increase scheduler domain weights and assigned dedicated
> CPUs to dom0 as described in 
> http://wiki.xensource.com/xenwiki/XenBestPractices but had no success.
> I think a domU should never break the whole system, so this might be a
> bug. Please let me know if you need further information.
> Thanks
> Sebastian
> May  2 16:05:26 hercules kernel: [ 1768.319877] nrpe invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
> May  2 16:05:26 hercules kernel: [ 1768.319883] nrpe cpuset=/ mems_allowed=0
> May  2 16:05:26 hercules kernel: [ 1768.319886] Pid: 2118, comm: nrpe Not tainted 2.6.32-5-xen-amd64 #1

Am I right that nrpe is part of nagios? (It's probably just the unlucky
process that happened to trigger the OOM killer, so it tells us nothing
really.)

What sort of load are the domUs experiencing (i.e. CPU, network, disk)?

What does your storage stack look like? (Are you using LVM, iSCSI, DRBD,
SW RAID, filesystems etc.?)

Are you running anything interesting in domain 0 other than the Xen
toolstack, nagios, sshd etc?

What does /proc/meminfo look like after a fresh boot?

Lastly, please can you provide a dmesg log of the initial bootup.
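
If it helps when comparing /proc/meminfo after a fresh boot with a later
snapshot taken under load, here is a rough sketch of a parser for that
file. The function names parse_meminfo and summarize, and the choice of
fields to print, are just my assumptions for illustration; attaching the
raw file is equally fine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts and carry no unit.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line, skip it
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def summarize(text, keys=("MemTotal", "MemFree", "Buffers", "Cached", "Slab")):
    """Return one 'Name: value' line per requested field ('n/a' if absent)."""
    info = parse_meminfo(text)
    return "\n".join("%s: %s" % (key, info.get(key, "n/a")) for key in keys)
```

Something like print(summarize(open("/proc/meminfo").read())) run in
dom0 right after boot, and again while the domU is under load, should
give two short lists that are easy to compare.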


