On 2010-04-06, micah anderson wrote:

> On Thu, Apr 01, 2010 at 03:40:40PM -0400, Micah Anderson wrote:
> > "Nikita V. Youshchenko" <yoush@debian.org> writes:
> >
> > >> We have had to carry that patch without any upstream support (or sharing
> > >> with Novell, which eventually released SLES 11 with 2.6.27). As a
> > >> result, the xen-flavour kernels for lenny are very buggy, particularly
> > >> for domains with multiple vCPUs (though that *may* be fixed now).
> > >
> > > Unfortunately it is not fixed.
> > >
> > > We here once migrated to xen and now rely on it, and that gives lots of
> > > frustration. For any loaded domain we still have to run etch kernel,
> > > because lenny kernel constantly crashes after several days of heavy load.
> > > Dom0's run lenny kernel - and with a fix for #542250 they don't crash, but
> > > those are almost unloaded.
> >
> > I was having problems with multiple vCPUs also, under moderate load I
> > would regularly get crashes. I reported my findings in #504805. I
> > swapped out machines, didn't work. When the fix for the xen_spin_wait()
> > came out, I eagerly switched to that, but it didn't fix my problem. I
> > even tried my hardest to switch to the latest upstream Xen kernel to see
> > if that would fix things, but it was way too unstable and I couldn't get
> > it to work at all.
> >
>
> What do you exactly mean with the 'upstream Xen kernel' ?

The latest xen-kernel, not from Debian. There are a number of them, the
Novell/OpenSuse forward-port of the old-style xenlinux patches, and the
pvops dom0 git tree. The pvops one was the one I was trying to get
working since the other versions of the kernel aren't receiving any
attention and all dev is happening in the pvops.

> > Eventually I stumbled on a way to keep my machines from restarting, its
> > not a great solution, but it stops me from having to deal with the
> > failure on a daily basis. I think that anyone else who is having this
> > problem can do this and it will work. Obviously this is not the right
> > solution, but it works until we can get a fix.
> >
> > First I made sure this was set:
> >
> > /etc/xen/xend-config.sxp: (dom0-cpus 0)
> >
> > Then I pinned individual physical CPUs to specific domU's, once pinned,
> > the problem stops.
> >
>
> vcpu pinning is not required for a properly working kernel..

It shouldn't be, I agree... but it seems like it is required to keep the
kernel from a daily panic.

m
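
For anyone who wants to try the pinning workaround described above, here is a minimal sketch of one way to generate the commands. It is an illustration only, not the exact procedure used in this thread: the domain names ("web", "db") and the CPU numbers are hypothetical, and it assumes the xm toolstack with its `xm vcpu-pin <domain> <vcpu> <cpu>` subcommand. The helper only builds the command strings; it does not run them.

```python
def build_pin_commands(pin_map):
    """Build `xm vcpu-pin` command lines from a {domain: [physical CPUs]} map.

    vCPU n of each domain is pinned to the n-th physical CPU listed for it.
    Raises ValueError if the same physical CPU is given to two vCPUs,
    since the point of the workaround is a dedicated CPU per vCPU.
    """
    seen = {}       # physical CPU -> domain that already claimed it
    commands = []
    for domain, cpus in pin_map.items():
        for vcpu, cpu in enumerate(cpus):
            if cpu in seen:
                raise ValueError(
                    f"physical CPU {cpu} assigned to both "
                    f"{seen[cpu]} and {domain}"
                )
            seen[cpu] = domain
            commands.append(f"xm vcpu-pin {domain} {vcpu} {cpu}")
    return commands


if __name__ == "__main__":
    # Hypothetical layout: "web" has two vCPUs, "db" has one.
    for line in build_pin_commands({"web": [2, 3], "db": [4]}):
        print(line)
```

Pins set this way with xm apply to the running domain only, so the commands would need to be run again after a domU is restarted.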