
Bug#504805: Work around



Eventually I stumbled on a way to keep my machines from restarting. It's
not a great solution, but it saves me from having to deal with the
failure on a daily basis, and I think it will work for anyone else who
is hitting this problem. Obviously this is not the right fix, but it
works until we can get one.

First I made sure this was set:

/etc/xen/xend-config.sxp: (dom0-cpus 0)

Then I pinned individual physical CPUs to specific domUs. Once they are
pinned, the problem stops.

What does that mean? Well, Xen does this wacky thing where it creates
virtual CPUs (VCPUs). Each domU has one of them by default (but you can
have more), and the scheduler then moves those VCPUs between physical
CPUs depending on need.


So let's say you have four CPUs and a domU. That domU has one VCPU by
default. That VCPU could be running on physical CPU 0, 1, 2 or 3 at any
given moment, and the scheduler is free to move it from one to another.
I found somewhere that this can be a performance hit, because of the
extra work of migrating the VCPU and switching contexts. I also read
that it could cause some instability (!), and pinning things so they
don't move around seemed to solve this.

The pinning does not stick across reboots, so it has to be done again
every time the system is rebooted. I don't think it is really possible
to set this in a startup script.
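
If a startup script does turn out to be workable, a minimal sketch might
look like the following. It is untested against this bug. The function
name apply_pins and the XM variable are made up for illustration; the
only real command involved is 'xm vcpu-pin'. It assumes the domUs are
already running when it is called, and it uses domain names rather than
IDs, since IDs may differ after a reboot. By default it only prints the
commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: re-apply a fixed pinning plan after a reboot.
# XM defaults to "echo xm", so nothing is executed unless you override it
# (for example: XM=xm sh this-script).
XM="${XM:-echo xm}"

# apply_pins reads "<domain> <vcpu> <cpu>" triples from stdin, the same
# three arguments 'xm vcpu-pin' takes. Blank lines and lines starting
# with '#' are skipped.
apply_pins() {
    while read -r dom vcpu cpu; do
        case "$dom" in
            ''|'#'*) continue ;;
        esac
        # $XM is left unquoted on purpose so "echo xm" splits into two words.
        $XM vcpu-pin "$dom" "$vcpu" "$cpu" || return 1
    done
}

# The plan from this mail, written with domain names.
apply_pins <<'EOF'
# domain   vcpu  cpu
Domain-0   all   0
kite       0     1
murrelet   0     2
test       0     3
EOF
```

Run as is, this just echoes the four 'xm vcpu-pin' lines so the plan can
be checked before anything is changed.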

So how do you do this? If you look at 'xm vcpu-list' (which annoyingly
isn't listed in 'xm help') you will see the CPU column populated with
whichever CPU the scheduler happened to pick, and the CPU Affinity
column saying 'any cpu' on every line. This means that any VCPU can end
up on any physical CPU, and will, depending on the scheduling. Once you
pin things, the individual domUs have specific CPU affinities, so their
VCPUs no longer 'travel' between CPUs, and magically the crash stops.

So an example:

root@shoveler:~# xm vcpu-list
Name                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0             0     0     1   -b-  283688.8 any cpu
Domain-0             0     1     1   ---   39666.3 any cpu
Domain-0             0     2     1   r--   49224.4 any cpu
Domain-0             0     3     1   -b-   75591.1 any cpu
kite                 1     0     3   -b-   71411.8 any cpu
murrelet             2     0     0   -b-  472222.2 any cpu
test                 3     0     0   r--  342182.3 any cpu
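
To pick out the lines that still need attention without reading the
table by eye, something like the sketch below could be used. The
function name list_unpinned is made up for illustration. It assumes the
column layout shown above, where an unpinned VCPU ends in the two words
'any cpu'; other Xen versions may print the table differently.

```shell
# list_unpinned reads 'xm vcpu-list' output on stdin and prints
# "<name> <vcpu>" for every VCPU whose affinity is still "any cpu".
# The header line (NR == 1) is skipped.
list_unpinned() {
    awk 'NR > 1 && $(NF-1) == "any" && $NF == "cpu" { print $1, $3 }'
}
```

Used as 'xm vcpu-list | list_unpinned', it should print nothing once
every VCPU has been pinned.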

So we want to fix that final column using 'xm vcpu-pin' (also a command
not listed in 'xm help'):

Usage: xm vcpu-pin <Domain> <VCPU|all> <CPUs|all>

Set which CPUs a VCPU can use.

root@shoveler:~# xm vcpu-pin 0 0 0
root@shoveler:~# xm vcpu-pin 0 1 0
root@shoveler:~# xm vcpu-pin 0 2 0
root@shoveler:~# xm vcpu-pin 0 3 0
root@shoveler:~# xm vcpu-pin 1 0 1
root@shoveler:~# xm vcpu-pin 2 0 2
root@shoveler:~# xm vcpu-pin 3 0 3

root@shoveler:~# xm vcpu-list
Name                 ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0              0     0     1   -b-  283700.3 0
Domain-0              0     1     1   r--   39669.6 0
Domain-0              0     2     1   -b-   49227.4 0
Domain-0              0     3     1   -b-   75596.2 0
kite                  1     0     3   -b-   71415.3 1
murrelet              2     0     0   -b-  472237.8 2
test                  3     0     0   r--  342182.3 3
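
The assignment used here follows a simple pattern: every Domain-0 VCPU
goes on physical CPU 0, and each domU gets the next CPU to itself. A
rough sketch that prints such a plan for a list of domU names is below.
The function name print_pin_plan is made up for illustration, it assumes
one VCPU per domU, and it does not check that the machine actually has
enough physical CPUs. It only prints commands; it does not run them.

```shell
# print_pin_plan prints 'xm vcpu-pin' commands that put all Domain-0
# VCPUs on physical CPU 0 and give each named domU (VCPU 0 only) its own
# physical CPU, counting up from 1.
print_pin_plan() {
    echo "xm vcpu-pin Domain-0 all 0"
    cpu=1
    for dom in "$@"; do
        echo "xm vcpu-pin $dom 0 $cpu"
        cpu=$((cpu + 1))
    done
}
```

Calling 'print_pin_plan kite murrelet test' prints the same pinning as
the commands I typed by hand, except that Domain-0 is handled in one
line with 'all' and the domains are given by name instead of ID.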


And voila, no more crashes... :P

micah


