
Bug#696735: pu: package xen/4.0.1-5.5 -> -5.6 (fix for Xen clock bug: #599161)



Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: pu

I have prepared an update of Xen for Stable, with the prior agreement
of Guido, who takes care of Xen updates in Debian Stable (Waldi doesn't
seem to care about Debian Stable). The debdiff is here:
http://archive.gplhost.com/debian/pool/squeeze/main/x/xen/xen_4.0.1-5.6.debdiff

This fixes #599161, which has been open in our BTS for two years.

I have also attached it to this mail for quicker reference. All built
binaries and sources are available in the same Debian (source) repository.

Let me briefly explain what the problem is (or was, since I have the fix).
If you don't care, or have no time, you may skip the explanations, which
aren't that important (after all, the fix is only 3 lines of ASM...).

--- comments start ---
Root of the problem:
My understanding is that, due to compiler optimization, the assembly code
that was inlined in Xen was wrong in the case of two back-to-back calls,
leading to the Xen guests having a clock offset relative to the Xen dom0.
See the patch description for more info.
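
For those who don't read x86 assembly, here is a rough portable C sketch
of what I understand the "mul ; shrd $32" pair in the patch to compute:
a wide multiply of delta by mul_frac, shifted right by 32 bits. The
function name scale_delta_portable and the uint32_t type for mul_frac are
assumptions made for this illustration only. This is not the code that
goes into Xen.

```c
#include <stdint.h>

/* Illustrative only: portable equivalent of "mul ; shrd $32", i.e.
 * (delta * mul_frac) >> 32, with mul_frac read as a 32-bit binary
 * fraction. The name and the uint32_t type are assumptions. */
static uint64_t scale_delta_portable(uint64_t delta, uint32_t mul_frac)
{
    uint64_t lo = (uint32_t)delta;   /* low 32 bits of delta */
    uint64_t hi = delta >> 32;       /* high 32 bits of delta */

    /* delta * mul_frac = hi * mul_frac * 2^32 + lo * mul_frac, so
     * shifting right by 32 yields the sum below. Each product has
     * both factors below 2^32, so nothing overflows 64 bits. */
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

With mul_frac = 0x80000000 (one half), a delta of 1000 scales to 500.
The real asm relies on the CPU leaving the high half of the product in
%rdx, which is why the compiler has to be told that %rdx is overwritten.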

Workaround that does not work:
Setting up the ntp daemon in the domU is unfortunately useless in Squeeze,
because the only available clock source is "Xen" and there is no support
for "independent wallclock". So, after the ntp daemon starts, it may
simply crash, and sooner or later the domU clock returns to its
original offset (which wasn't really predictable from server to server,
but seemed to be consistent after rebooting dom0 and domU).

Rebooting a Xen server dom0 and all domUs doesn't fix it either.
Surprisingly, all domUs get back to their original clock offsets.

Tests I did:
I have installed the built binaries on my test server (the one which
hosts the guest OS from which I always upload to SID, with the
clock being 10 minutes early in that domU...), and as far as I can
tell, the issue is gone on this server. I haven't upgraded all
GPLHost servers with this patch yet, but so far it is working well, and
it has also fixed the issue on various other servers which had the problem.

Finally:
So, in the end, the only way to fix this (very long-standing) bug is to
apply the upstream patch which is shown in the debdiff attached to this bug.

As you may imagine, having a correct "virtualized hardware clock source"
is essential for any Xen user. So I believe this patch is very
important.
--- comments end ---

Please let me know whether I may upload this to
squeeze-proposed-updates.

Cheers,

Thomas Goirand (zigo)
diff -Nru xen-4.0.1/debian/changelog xen-4.0.1/debian/changelog
--- xen-4.0.1/debian/changelog	2012-12-06 15:50:48.000000000 +0000
+++ xen-4.0.1/debian/changelog	2012-12-26 13:49:06.000000000 +0000
@@ -1,3 +1,13 @@
+xen (4.0.1-5.6) stable-proposed-updates; urgency=low
+
+  * Non-maintainer upload, previously discussed with Guido.
+  * Fixes Xen clock long standing issue, eg: fix scale_delta() inline assembly,
+  causing domU offset and possibly leading to crashes (Closes: #599161). Thanks
+  to Ian Campbell <ijc@hellion.org.uk> for forwarding the patch to the Debian
+  BTS, and Jan Beulich <jbeulich@suse.com> for working on an upstream patch.
+
+ -- Thomas Goirand <zigo@debian.org>  Wed, 26 Dec 2012 13:18:34 +0000
+
 xen (4.0.1-5.5) stable-security; urgency=high
 
   * Apply fix for Xen Security Advisory 5 (CVE-2011-3131)
diff -Nru xen-4.0.1/debian/control.md5sum xen-4.0.1/debian/control.md5sum
--- xen-4.0.1/debian/control.md5sum	2012-12-06 15:54:45.000000000 +0000
+++ xen-4.0.1/debian/control.md5sum	2012-12-26 13:50:53.000000000 +0000
@@ -1,4 +1,4 @@
-468e1c871ad35052319caa1f5d159124  debian/changelog
+ec687758337647fba126272a85e6ab09  debian/changelog
 24f2598a23e30264aea4a983d5d19eec  debian/bin/gencontrol.py
 ee1ccd7bf0932a81ca221cab08347614  debian/templates/control.hypervisor.in
 e4335ab10e217a12328cdf123473ed37  debian/templates/control.main.in
diff -Nru xen-4.0.1/debian/patches/series xen-4.0.1/debian/patches/series
--- xen-4.0.1/debian/patches/series	2012-12-06 15:47:19.000000000 +0000
+++ xen-4.0.1/debian/patches/series	2012-12-26 13:26:09.000000000 +0000
@@ -87,3 +87,5 @@
 CVE-2012-5513
 CVE-2012-5514
 CVE-2012-5515
+
+x86-time-scale-asm.patch
diff -Nru xen-4.0.1/debian/patches/x86-time-scale-asm.patch xen-4.0.1/debian/patches/x86-time-scale-asm.patch
--- xen-4.0.1/debian/patches/x86-time-scale-asm.patch	1970-01-01 00:00:00.000000000 +0000
+++ xen-4.0.1/debian/patches/x86-time-scale-asm.patch	2012-12-26 13:49:25.000000000 +0000
@@ -0,0 +1,27 @@
+Desctiption: fix scale_delta() inline assembly clock problem
+ The way it was coded, it clobbered %rdx without telling the compiler.
+ This generally didn't cause any problems except when there are two back
+ to back invocations (as in plt_overflow()), as in that case the
+ compiler may validly assume that it can re-use for the second instance
+ the value loaded into %rdx before the first one.
+ .
+ Once at it, also properly relax the second operand of "mul" (there's no
+ need for it to be in %rdx, or a register at all), and switch away from
+ using explicit register names in the instruction operands.
+Author: Jan Beulich <jbeulich@suse.com>
+Origin: Upstream
+
+--- xen-4.0.1-5.5/xen/arch/x86/time.c	2012-12-26 13:24:42.000000000 +0000
++++ xen-4.0.1-5.6/xen/arch/x86/time.c	2012-12-26 13:25:10.000000000 +0000
+@@ -139,8 +139,9 @@
+         : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (scale->mul_frac) );
+ #else
+     asm (
+-        "mul %%rdx ; shrd $32,%%rdx,%%rax"
+-        : "=a" (product) : "0" (delta), "d" ((u64)scale->mul_frac) );
++        "mul %2 ; shrd $32,%1,%0"
++        : "=a" (product), "=d" (delta)
++        : "rm" (delta), "0" ((u64)scale->mul_frac) );
+ #endif
+ 
+     return product;
