On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
> This clock jump by 2999 seconds also happened here, so per:
> we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
> the dom0. This seemed to have avoided the problem, but since then, the clock
> jumps started happening like this:
> Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
> In addition, now I checked what the said machine thinks is its clocksource:
> % cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
> So there's neither pit nor tsc in the available list :)
A PV kernel will (or should) always use "xen" as its clocksource. This
is a PV timesource based around the TSC + correction factors (to account
for drift and PCPU migration).
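To make "TSC + correction factors" a bit more concrete, here is a rough sketch of the kind of scaling a PV clock does. The parameter names follow the per-VCPU time info fields as I remember them (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift), so treat them as assumptions rather than the exact kernel code. The real thing also re-reads a version counter to get a consistent snapshot, which is left out here.

```python
def pvclock_ns(tsc_now, tsc_timestamp, system_time_ns, tsc_to_system_mul, tsc_shift):
    """Convert a raw TSC reading into nanoseconds of system time.

    All parameter names are assumptions modelled on the PV time info record:
      tsc_timestamp      TSC value when the hypervisor last updated the record
      system_time_ns     system time (ns) at that same moment
      tsc_to_system_mul  32.32 fixed-point multiplier (ns per scaled TSC tick)
      tsc_shift          power-of-two pre-scale applied to the TSC delta
    """
    delta = tsc_now - tsc_timestamp
    # Pre-scale the delta so the multiplier keeps enough precision.
    if tsc_shift >= 0:
        delta <<= tsc_shift
    else:
        delta >>= -tsc_shift
    # Fixed-point multiply: drop the 32 fractional bits afterwards.
    return system_time_ns + ((delta * tsc_to_system_mul) >> 32)


if __name__ == "__main__":
    # A 2 GHz TSC advances 0.5 ns per tick, i.e. a multiplier of 2**31 in 32.32.
    print(pvclock_ns(2_000_000_000, 0, 0, 1 << 31, 0))
```

Because the hypervisor refreshes the record whenever the VCPU moves to another PCPU, the guest only ever extrapolates over a short interval, which is what is supposed to hide drift and skew.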
The clocksource=pit on the hypervisor command line controls the
hypervisor's own timesource and not the dom0 kernel's. I'm not sure how
you query the hypervisor for its timesource but I guess it'll be in "xl
dmesg" somewhere ("Platform timer is ...").
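If it helps, something like the following could pull that line out of saved "xl dmesg" output. This is only a sketch: it assumes the message really contains the text "Platform timer is", and the sample line in the demo is made up for illustration, not taken from a real log.

```python
import re


def find_platform_timer(dmesg_text):
    """Return whatever follows "Platform timer is" in the log, or None."""
    for line in dmesg_text.splitlines():
        match = re.search(r"Platform timer is (.+)", line)
        if match:
            return match.group(1).strip()
    return None


if __name__ == "__main__":
    # Hypothetical sample; feed in the real output of "xl dmesg" instead.
    sample = "(XEN) some earlier line\n(XEN) Platform timer is 14.318MHz HPET\n"
    print(find_platform_timer(sample))
```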
The message you quote above says *tsc* unstable. Prior to that was the
system actually using the tsc clocksource? It really shouldn't have
been... Before that message did available_clocksource contain TSC? What
about current_clocksource? ("Before" here ~= on a freshly booted system)
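For capturing that state right after boot, a small script along these lines might be handy. It is a minimal sketch that reads the same two sysfs files quoted above; the function name and the `base` parameter are just my own choices so it can be pointed at a copy of the directory.

```python
from pathlib import Path

# Directory taken from the "cat" command quoted earlier in the thread.
SYSFS_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) where available is a list of names."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:  ", current)
    print("available:", " ".join(available))
    print("tsc listed:", "tsc" in available)
```

Running it once on a freshly booted system and again after the "unstable" message shows up would answer both questions.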
What are your exact hypervisor and kernel command lines? Other than
clocksource=pit are you overriding anything else in this regard?
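A quick way to spot any time-related overrides is to filter the command line for the options mentioned in this thread. The sketch below only looks for clocksource= and tsc=; the function name and the example command line are made up for illustration.

```python
def time_related_params(cmdline, keys=("clocksource", "tsc")):
    """Pick out the time-related options from a command line string.

    Returns a dict mapping option name to its value, or to None when the
    option appears as a bare flag without "=value".
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in keys:
            found[name] = value if sep else None
    return found


if __name__ == "__main__":
    # Hypothetical example; for the running dom0 kernel read /proc/cmdline.
    print(time_related_params("root=/dev/sda1 ro clocksource=pit console=hvc0"))
```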
Can you press the 's' hypervisor debug key and report the resulting text
from "xl dmesg"? (To press a debug key, either run "xl debug-key s"
followed by "xl dmesg", or press Ctrl-A three times on the serial console
and then press 's'.)
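If you want to script the non-serial variant, a sketch like this would do. It only wraps the two xl commands given above; the helper names are my own, and it needs to run as root on the dom0.

```python
import subprocess


def debug_key_commands(key):
    """Build the two xl invocations: send the debug key, then dump the log."""
    return [["xl", "debug-key", key], ["xl", "dmesg"]]


def press_debug_key(key="s"):
    """Run both commands and return the text printed by "xl dmesg"."""
    output = ""
    for cmd in debug_key_commands(key):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        output = result.stdout  # keep the output of the last command only
    return output


if __name__ == "__main__":
    print(press_debug_key("s"))
```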
It seems odd that the only reports we see of this issue are with Debian
Squeeze. It's possible that the snapshot of pvops which made it into
squeeze had some issue but I've just looked over the diff between that
and the current xen 2.6.32 pvops kernel and don't see anything obviously
time related. Perhaps this is a bug in Xen 4.0.x rather than the kernel?
If someone who can reproduce could try (separately) a new kernel and new
hypervisor that might help narrow it down.
Another option instead of clocksource= might be to try
tsc=[unstable|skewed]. Quoth the comment:
* tsc=unstable: Override all tests; assume TSC is unreliable.
* tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.