double-speed system clock

 I just noticed that on my Opteron cluster, the nodes that are running 64-bit
kernels have their clocks ticking at double speed.  This happens with
Linux 2.4.26, and 2.4.27-pre2, compiled with gcc 3.3.3 (Debian 20040401) in a
Debian pure64 chroot. Linux 2.4.25, compiled on Debian Woody + bi-arch gcc
3.3.2 20030908, does _not_ have the problem.  The config options were pretty
much the same for all kernels, and all the kernels are plain vanilla flavour
from www.ca.kernel.org.

 If I run ntpdate to set the clock, then 10 seconds later it will be 10
seconds fast.  According to date(1), the system time advances 20 seconds in
10 seconds of real time.  (I haven't done anything weird with adjtimex(8).)
"time sleep 10" finishes after 5 seconds of real time, but bash reports its
real time as 10 seconds.
The timer interrupt counter is increasing at a rate of 200/real second, so
it seems like the system is getting timer interrupts twice as fast as it
should.  (With 2.4.25, it is 100/sec, same as HZ).
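
 For reference, here is a minimal sketch of one way to measure that rate from
/proc/interrupts.  The helper names timer_count and irq_rate are made up for
illustration, and the sketch assumes the timer is on IRQ 0, as in the output
further down.  The interval has to be timed against an outside reference
(a wristwatch, another machine), because sleep and date run off the same
fast clock.

```shell
# Print the IRQ 0 ("timer") count summed across CPUs, reading
# /proc/interrupts-style text from stdin.  (Hypothetical helper name.)
timer_count() {
    awk '$1 == "0:" {
        n = 0
        # Sum the per-CPU columns; stop at the first non-numeric field.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) n += $i
        print n
    }'
}

# Interrupts per second from two counts and the elapsed REAL seconds.
# Usage: irq_rate BEFORE AFTER SECONDS   (hypothetical helper name)
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# On a live system, something like:
#   before=$(timer_count < /proc/interrupts)
#   ... wait 10 seconds by an external clock ...
#   after=$(timer_count < /proc/interrupts)
#   irq_rate "$before" "$after" 10
```

 On an affected node this should print roughly 200, and roughly 100 (= HZ) on
a good one.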

 Linux says it is using the PIT and TSC timers.  I have HPET enabled in my
Linux config, but I guess Tyan's S2880 mobo doesn't have one.  This is a
dual-Opteron 240 machine, BTW.

 i386 Linux on the same machines has no problems with timekeeping.  (But I
haven't tested versions later than 2.4.25 in legacy mode.)

 I spent some time poking around the timer code that increments xtime, but I
guess the fact that the timer irqs are coming at double speed indicates that
the problem lies elsewhere.  Maybe the code that sets up the timer?

$ uname -a
Linux node6.cs.dal.ca 2.4.26 #2 SMP Fri May 14 14:46:42 ADT 2004 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/interrupts
           CPU0       CPU1
  0:    4415908          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  9:          0          0   IO-APIC-level  acpi
 14:      17861          1    IO-APIC-edge  ide0
 19:          0          0   IO-APIC-level  usb-ohci, usb-ohci
 24:     563942          0   IO-APIC-level  eth0
 25:     564331          0   IO-APIC-level  eth1
NMI:      19097      19097
LOC:    2211090    2211095
ERR:          0
MIS:          0

 Only CPU0 is getting the timer interrupt, so at least we know the doubling
isn't caused by both CPUs taking it.  (Both CPUs get 100 LOC: (local APIC)
interrupts/sec, but that happens on the non-buggy 2.4.25, too.)
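
 The counters above can be cross-checked without trusting the system clock at
all.  This is a rough sketch that assumes both counters have been running
since boot and that LOC really ticks at 100/sec on each CPU, as stated above.
The variable names are only for illustration.

```shell
timer=4415908   # IRQ 0 count on CPU0, copied from the output above
loc=2211090     # LOC count on CPU0, copied from the output above
loc_hz=100      # local APIC interrupts/sec per CPU (assumed, per the text)

# Ratio of timer interrupts to local APIC interrupts since boot.
ratio=$(awk -v t="$timer" -v l="$loc" 'BEGIN { printf "%.3f", t / l }')

# Implied timer interrupt rate if LOC is a reliable 100/sec reference.
timer_hz=$(awk -v t="$timer" -v l="$loc" -v hz="$loc_hz" \
    'BEGIN { printf "%.0f", t / l * hz }')

echo "timer/LOC ratio: $ratio, estimated timer rate: $timer_hz per second"
```

 With these numbers the ratio comes out at 1.997, i.e. about 200 timer
interrupts per second, which matches the rate observed by hand.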

 Thanks for any help,

#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC
