
Bug#689861: Issues with Xen when all CPUs are available to dom0



Hello Ian,

On 10/08/2012 05:31 PM, Ian Campbell wrote:
I'm afraid I don't have any particularly dazzling insights here. One
thing you could try is asking on the upstream xen-users@ list in case
someone else has seen this, although it doesn't ring any bells for me.

Another experiment might be to try the wheezy hypervisor and/or kernel
packages.

The stolen time thing is weird, since that is time spent where the VCPU
could run but is not because another VCPU is scheduled -- but if you
can't start any guests then there is nothing to compete against. It
might be interesting to investigate a little where all the CPU time is
going: first using top to check for rogue processes in dom0, and then
xentop to look for rogue VCPUs. Pressing 'd' on the Xen debug console a
few times ("statistical sampling") might also give a clue about where
the physical CPUs are spending all of their time.
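As a cross-check on the %st column that top reports, steal time can also be read directly from /proc/stat. The following is a minimal sketch (names and field handling are my own, not from this thread); it assumes the standard /proc/stat layout, where the eighth counter after the CPU name is 'steal':

```python
# Sketch: compute per-CPU steal-time percentage from two /proc/stat samples.
# /proc/stat per-CPU fields: user nice system idle iowait irq softirq steal ...

def cpu_counters(stat_text):
    """Return {cpu_name: [counters...]} for the per-CPU lines of /proc/stat."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(f) for f in fields[1:]]
    return counters

def steal_percent(before, after):
    """Steal time as a percentage of all time elapsed, per CPU."""
    result = {}
    for cpu, old in before.items():
        delta = [n - o for n, o in zip(after[cpu], old)]
        total = sum(delta)
        steal = delta[7] if len(delta) > 7 else 0  # field 8 is 'steal'
        result[cpu] = 100.0 * steal / total if total else 0.0
    return result
```

To use it, read /proc/stat twice a second or so apart and pass both texts through cpu_counters(); steal consistently above 70-90% with no guests running would match the top output shown later in this report.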

How many physical CPUs do you have?


Hang on, this shows:
         server1:~# xm vcpu-list 0
         Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
         Domain-0                             0     0     0   r--    1568.2 0
         Domain-0                             0     1     -   --p     129.3 0
         Domain-0                             0     2     -   --p     132.1 0
         Domain-0                             0     3     -   --p     134.8 0
This is just after I booted dom0 with limit to one CPU.
IOW you have 4 dom0 VCPUs, but they are all constrained to run on
physical CPU0; that would lead precisely to loads of stolen time!

What pinning options are you using to achieve this? It might be useful
to provide your full command lines (both hypervisor and kernel) and
config files etc. A boot log wouldn't go amiss either.

Contrast with my system here:
         root@calder:~# xm vcpu-list
         Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
         Domain-0                             0     0     0   -b-    1628.5 any cpu
         Domain-0                             0     1     1   r--    1539.1 any cpu

Here you see that my 2 dom0 VCPUs are free to run on any pCPU. Even
with pinning I would expect VCPU0->PCPU0 and VCPU1->PCPU1.
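The bad pattern is easy to spot mechanically: every dom0 VCPU reporting the same single-CPU affinity. A minimal sketch (my own helper names, assuming the `xm vcpu-list` column layout shown above) that flags it:

```python
# Sketch: scan `xm vcpu-list`-style output and report whether every dom0
# VCPU has its affinity restricted to one and the same physical CPU.

def restricted_affinities(vcpu_list_text):
    """Return the set of affinity strings seen on Domain-0 data rows."""
    affinities = set()
    for line in vcpu_list_text.splitlines():
        fields = line.split()
        # data rows: Name ID VCPU CPU State Time(s) Affinity
        if len(fields) >= 7 and fields[0] == "Domain-0":
            affinities.add(fields[6])
    return affinities

def pinned_to_one_pcpu(vcpu_list_text):
    """True when all dom0 VCPUs share a single numeric CPU affinity."""
    affinities = restricted_affinities(vcpu_list_text)
    # 'any cpu' splits into two fields, so fields[6] is 'any' (not a digit)
    return bool(affinities) and len(affinities) == 1 \
        and all(a.isdigit() for a in affinities)
```

Fed the output from the problem system it returns True (every row ends in affinity `0`); fed the healthy 'any cpu' output it returns False.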

Ian.

These are outputs showing the situation:

top - 00:48:28 up 4 min,  1 user,  load average: 3.97, 1.66, 0.62
Tasks: 257 total,  12 running, 241 sleeping,   0 stopped,   4 zombie
Cpu0  :  0.0%us,  1.5%sy,  0.0%ni, 24.9%id,  0.0%wa,  0.0%hi,  0.0%si, 73.6%st
Cpu1  :  0.0%us,  0.7%sy,  0.0%ni, 23.3%id,  0.5%wa,  0.0%hi,  0.0%si, 75.6%st
Cpu2  :  0.3%us,  4.8%sy,  0.0%ni,  8.1%id,  0.0%wa,  0.0%hi,  0.0%si, 86.7%st
Cpu3  :  0.0%us,  0.4%sy,  0.0%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.4%si, 77.4%st
Cpu4  :  0.7%us,  1.0%sy,  0.0%ni,  1.3%id,  0.0%wa,  0.0%hi,  0.3%si, 96.7%st
Cpu5  :  0.4%us,  2.8%sy,  0.0%ni,  1.1%id,  0.0%wa,  0.0%hi,  0.0%si, 95.8%st
Mem:    765788k total,   360872k used,   404916k free,    59444k buffers
Swap:   974840k total,        0k used,   974840k free,    49796k cached


server2:~# xm vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0     0   -b-      80.6 0
Domain-0                             0     1     0   ---      78.0 0
Domain-0                             0     2     0   -b-      79.5 0
Domain-0                             0     3     0   -b-      78.6 0
Domain-0                             0     4     0   ---      77.6 0
Domain-0                             0     5     0   ---      79.5 0


The output of 'xm vcpu-list' took approx. 5 minutes to finish. I just realized it shows the 'wrong' CPU affinity you mentioned.
The system was booted with this configuration:

grub.cfg

        multiboot       /xen-4.0-amd64.gz placeholder dom0_mem=756M acpi=on numa=on console=tty0 sync_console console_to_ring com2=11520,8n1 console=com2

        module  /vmlinuz-2.6.32-5-xen-amd64 placeholder root=/dev/mapper/system_xen-root ro root=/dev/mapper/system_xen-root ro quiet console=hvc0 earlyprintk=xen nomodeset

xend-config.sxp

       (dom0-cpus 0)
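As an alternative worth testing (assuming the Xen 4.0 hypervisor build accepts these options; this is my suggestion, not something from the thread), dom0's VCPU count and pinning can be set on the hypervisor command line instead of via dom0-cpus and post-boot xm vcpu-pin calls:

```text
# grub.cfg sketch (hypothetical): cap dom0 at one VCPU and pin VCPUs
# one-to-one to pCPUs, rather than piling several VCPUs onto CPU0.
multiboot /xen-4.0-amd64.gz placeholder dom0_mem=756M dom0_max_vcpus=1 dom0_vcpus_pin console=tty0 console=com2
```

With something like this in place, `xm vcpu-list 0` should show a single dom0 VCPU with a matching one-to-one affinity, and no VCPUs left in the paused state.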


If I set the value of dom0-cpus to '1', all the VCPUs except the first one are in the 'paused' state, as shown before:

server1:~# xm vcpu-list 0
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0     0   r--    1568.2 0
Domain-0                             0     1     -   --p     129.3 0
Domain-0                             0     2     -   --p     132.1 0
Domain-0                             0     3     -   --p     134.8 0


I have a mostly default Xen config. One of the affected systems is single-CPU and the second one is dual-socket with only one processor installed. I just found that I put this into rc.local (with a modification date of 13 July 2008):

xm vcpu-pin 0 all 0

This explains the CPU affinity and where the issue is coming from.
Anyway, this was working before (I don't know exactly when this issue arose). Could it be related to the move to pv_ops kernels?

Best regards,
--
Peter Viskup

