
Bug#592428: Fix 2.6.32 XEN guest on old buggy RHEL5/EC2 hypervisor(XSAVE)



On Aug 11, 2010, at 10:55, Jeremy Fitzhardinge wrote:
> On 08/11/2010 01:53 AM, Ian Campbell wrote:
>> On Wed, 2010-08-11 at 03:31 +0100, Ben Hutchings wrote:
>>> On Mon, 2010-08-09 at 19:29 -0400, Kyle Moffett wrote:
>>>> Would it be possible to apply the attached Fedora/Ubuntu kernel patch
>>>> to Debian as well?  The Fedora link is:
>>>> http://cvs.fedoraproject.org/viewvc/F-13/kernel/fix_xen_guest_on_old_EC2.patch
>>>> 
>>>> And the Ubuntu link:
>>>> http://kernel.ubuntu.com/git?p=rtg/ubuntu-maverick.git;a=commit;h=1a30f99
>>>> 
>>>> As far as I can tell, no released version of Xen currently supports
>>>> XSAVE, so this change is effectively a NOP on all newer hypervisors, but
>>>> it allows functionality on older hypervisors (such as RHEL5, or when
>>>> running on Amazon's EC2 service).
>>> [...]
>>> 
>>> The comment says that 'There is only potential for guest performance
>>> loss on upstream Xen' which implies that XSAVE is supported now.

I spent some time searching and could not find any reference to XSAVE support in upstream Xen.  There are some email threads that discuss potential patches, but the comments all seem to indicate that every proposed method of supporting XSAVE fails catastrophically during instance migration.


>>> Ian, what's your take on this?  Is it worth trying to use XSAVE, and if
>>> so is there a way to detect the broken HV versions before doing so?
>> 
>> The following commit seems to be in v2.6.31-rc1, is it not sufficient to
>> allow correct operation on these older hypervisors? If not it would be
>> nice to know why.
> 
> The patch referred to by those two links says that old versions of Xen 
> will simply kill the domain if they try to set CR4 bits the hypervisor 
> doesn't understand, so this patch will not work.
> 
>>     xen: mask XSAVE from cpuid
>> 
>>     Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
>>     to be set.  This confuses the kernel and it ends up crashing on
>>     an xsetbv instruction.

I directly tested the Debian 2.6.32-5-amd64 pvops kernel on the Amazon EC2 service (which uses one of the old buggy hypervisors).  When I use the unmodified Debian kernel (which includes the "xen: mask XSAVE from cpuid" patch), my instance reboots before logging any output.  When I use the same kernel patched with "fix_xen_guest_on_old_EC2.patch", it boots and runs correctly.
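
To illustrate the general idea of keeping an unsupported CR4 bit away from the hypervisor, here is a rough sketch. It is not a copy of fix_xen_guest_on_old_EC2.patch: the function name is hypothetical, and it assumes that the only problematic bit is CR4.OSXSAVE (bit 18) and that the filtering happens just before the guest's CR4 write is issued.

```c
#include <stdint.h>

/* CR4.OSXSAVE is bit 18 of the x86 CR4 control register. */
#define CR4_OSXSAVE (UINT64_C(1) << 18)

/*
 * Hypothetical filter: clear OSXSAVE from the value the guest wants to
 * load into CR4, so that a hypervisor which does not understand the bit
 * never sees it set. Every other requested bit is left untouched.
 */
static uint64_t filter_guest_cr4(uint64_t requested_cr4)
{
    return requested_cr4 & ~CR4_OSXSAVE;
}
```

On a hypervisor that does support XSAVE, a filter like this would simply leave the feature unused in the guest. As far as I can tell, that is the trade-off the patch comment quoted above is describing.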


>> The kernel can take a "noxsave" on the command line which I imagine
>> would also workaround the issue.

I tried this too, but it does not help.  It would have made my life a lot easier if it did.


>> If the hypervisor is old-but-not-too-old you may also have the option of
>> masking the xsave bit in cpuid via the domain config file.

Unfortunately many virtual hosting platforms don't give you the option of messing with the domain config file. :-(

Thanks for all your help!

Cheers,
Kyle Moffett




