Help understanding NTP behaviour
Hello list,
I'm not sure if this is necessarily Debian-specific (could be, I don't know of any implementation differences) but I'm hoping someone here has been in the same boat before.
One of our wireless subscriber networks has a bunch of Canopy gear from Motorola. In a firmware update a year or so ago they added the ability for their CMMs (GPS receiver/switches/POE injectors all in one) to act as nice GPS-powered NTP servers. I am not sure of their exact performance.
I got it into my head that it would be nice to serve time to our network from our own little network of GPS receivers. I've built two Debian servers for this function, but there are problems. Since these devices are wireless, I understand that there will be jitter. What I don't get is the offset varying so much! Frequently you can check in on either of the two servers and see wild numbers for the GPS receivers. I restarted ntpd on both hosts a few minutes ago, so the numbers will be all crazy for a while. What confuses me is the offsets. Whenever I check in on these, it seems random which receivers are within a few ms, which are around +1000 ms and which are around -1000 ms:
TIME-SRV-A:/etc# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 172.19.10.36    .STEP.          16 u  112  128     0   0.000    0.000   0.000
 172.20.8.10     .GPS.            1 u   20   64     1   6.485   -0.335   0.000
 172.20.8.20     .GPS.            1 u   20   64     1   6.445   -0.331   0.431
 172.20.8.30     .GPS.            1 u   32   64     1   5.190    0.313   0.000
 172.20.12.5     .GPS.            1 u   20   64     1   6.900   -0.569   0.579
 172.20.12.10    .GPS.            1 u   19   64     1   5.578    0.120   0.095
 172.20.12.20    .GPS.            1 u   18   64     1   3.156  -998.66 1000.07
 172.20.12.30    .GPS.            1 u   17   64     1  14.958   -4.571  14.381
 172.20.12.35    .GPS.            1 u   16   64     1   6.506   -0.339   0.005
 172.20.12.40    .GPS.            1 u    1   64     1   3.634    1.071 999.984
 172.20.12.50    .GPS.            1 u    1   64     1   6.595   -0.393 1000.20
 172.20.16.10    .GPS.            1 u    2   64     1   7.462   -0.833   1.190
 216.234.161.11  69.25.96.13      2 u   14   64     1  53.982  222.394   1.554
 216.194.70.2    132.163.4.103    2 u    -   64     1  51.324  220.696   5.650
 67.212.74.220   64.90.182.55     2 u    7   64     1  34.935  231.701   7.908
 74.3.161.36     140.142.16.34    2 u    -   64     1  70.567  239.736   5.535
 127.127.1.0     .LOCL.          10 l    -   64     0   0.000    0.000   0.000
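A quick way to pull the odd rows out of output like this is to parse each peer line and test whether its offset lies close to a whole number of seconds. The following is a minimal Python sketch under stated assumptions: `parse_ntpq_row`, `near_whole_second` and the 50 ms tolerance are hypothetical names and values chosen for illustration, and the sample rows are copied from the TIME-SRV-A table above.

```python
# Tally characters that ntpq may print in the first column of a peer row.
TALLY_CHARS = "*+-x#o."


def parse_ntpq_row(line):
    """Split one peer row of `ntpq -pn` output into named fields."""
    line = line.strip()
    tally = ""
    if line and line[0] in TALLY_CHARS:
        tally, line = line[0], line[1:]
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("not a peer row: %r" % line)
    remote, refid, st, t, when, poll, reach, delay, offset, jitter = fields
    return {
        "tally": tally,
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "type": t,
        "when": when,    # kept as text because ntpq prints "-" when unset
        "poll": int(poll),
        "reach": reach,  # octal string exactly as printed
        "delay": float(delay),
        "offset": float(offset),
        "jitter": float(jitter),
    }


def near_whole_second(offset_ms, tolerance_ms=50.0):
    """True if the offset is within tolerance of a non-zero multiple of 1000 ms."""
    nearest = round(offset_ms / 1000.0) * 1000.0
    return nearest != 0 and abs(offset_ms - nearest) <= tolerance_ms


# Three rows copied from the TIME-SRV-A output above.
SAMPLE = """\
 172.20.8.10 .GPS. 1 u 20 64 1 6.485 -0.335 0.000
 172.20.12.20 .GPS. 1 u 18 64 1 3.156 -998.66 1000.07
 216.234.161.11 69.25.96.13 2 u 14 64 1 53.982 222.394 1.554
"""

rows = [parse_ntpq_row(l) for l in SAMPLE.splitlines() if l.strip()]
suspects = [r["remote"] for r in rows if near_whole_second(r["offset"])]
print(suspects)
```

On these three rows only 172.20.12.20 is picked out. The pool server at roughly +222 ms is far from any whole second, so it is left alone.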
TIME-SRV-B:/etc# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 172.19.10.4     172.20.8.30      2 u   35   64     0   0.000    0.000   0.000
+172.20.8.10     .GPS.            1 u    5   64     7   6.670    0.746 655.461
*172.20.8.20     .GPS.            1 u    5   64     7   6.666    0.737 378.067
+172.20.8.30     .GPS.            1 u    2   64     7   5.411    1.414 534.611
x172.20.12.5     .GPS.            1 u   65   64     7   6.588  -999.19 755.154
x172.20.12.10    .GPS.            1 u   65   64     7   7.054  -999.40 756.159
+172.20.12.20    .GPS.            1 u   63   64     7   3.601    2.290 654.686
+172.20.12.30    .GPS.            1 u   63   64     7   8.712   -0.287 534.525
x172.20.12.35    .GPS.            1 u   39   64    37   6.475  -999.13 925.761
x172.20.12.40    .GPS.            1 u   39   64    37   4.294  -998.05 845.099
+172.20.12.50    .GPS.            1 u   47   64    17  10.292   -1.035 654.064
+172.20.16.10    .GPS.            1 u   50   64    17   6.817    0.703 654.670
-205.189.158.228 209.87.233.53    3 u   39   64    37  28.000   56.084  31.056
-184.107.229.26  209.51.161.238   2 u   49   64    17  27.789   69.598  30.689
-208.69.56.110   209.51.161.238   2 u   51   64    17  34.166   64.142  30.792
#199.85.124.148  209.87.233.53    3 u   36   64    37  36.244   67.480  31.339
 127.127.1.0     .LOCL.          10 l  391   64   100   0.000    0.000   0.000
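The jitter column in the TIME-SRV-B table clusters around a handful of values (about 378, 534, 654, 755, 845 and 925 ms). One possible reading, offered as a sketch rather than a diagnosis: if ntpd reports peer jitter as the RMS difference between the selected sample and the other samples in an eight-stage clock filter (an assumption about this ntpd build, not something verified here), then each value would correspond to a count of filter samples sitting about one second away from the selected one. The function names and constants below are hypothetical.

```python
import math

FILTER_STAGES = 8   # assumed number of samples kept per peer
STEP_MS = 1000.0    # the suspicious offsets differ by about one second


def expected_jitter(outliers, stages=FILTER_STAGES, step_ms=STEP_MS):
    """RMS jitter if `outliers` of the other samples are step_ms away."""
    return math.sqrt(outliers * step_ms ** 2 / (stages - 1))


def closest_outlier_count(jitter_ms, stages=FILTER_STAGES, step_ms=STEP_MS):
    """Outlier count whose expected jitter is nearest to the observed value."""
    return min(
        range(stages),
        key=lambda k: abs(expected_jitter(k, stages, step_ms) - jitter_ms),
    )


# Jitter values copied from the TIME-SRV-B table above.
for observed in (378.067, 534.611, 654.686, 755.154, 845.099, 925.761):
    k = closest_outlier_count(observed)
    print("%8.3f ms ~ %d of %d samples off by one second (model: %.3f ms)"
          % (observed, k, FILTER_STAGES - 1, expected_jitter(k)))
```

Under this model, one through six outlying samples give roughly 378, 535, 655, 756, 845 and 926 ms, which is close to the values in the table. If the assumption holds, the large jitter figures would come from the same one-second jumps as the offsets rather than from ordinary wireless jitter.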
The two boxes were originally identical, but in troubleshooting I've changed some settings to no avail:
TIME-SRV-A:
IBM HS20 blade
ACPI timing only
P4-based Xeon, 2 GB Reg. ECC DDR2, flat configuration
LSI SAS hardware RAID1
Debian 6.0.3, Linux 2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011 x86_64 GNU/Linux
Dual bnx2 NICs (tg3 driver?), only single Gbit Ethernet enabled
*** NTP.CONF FROM TIME-SRV-A ***
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/ntp.keys
trustedkey 9
peer TIME-SRV-B key 9 iburst
# CMMs in Wireless Land
server 172.20.8.10 iburst
server 172.20.8.20 iburst
server 172.20.8.30 iburst
server 172.20.12.5 iburst
server 172.20.12.10 iburst
server 172.20.12.20 iburst
server 172.20.12.30 iburst
server 172.20.12.35 iburst
server 172.20.12.40 iburst
server 172.20.12.50 iburst
server 172.20.16.10 iburst
# Regular NTP servers for backup
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
# Local backup clock
server 127.127.1.0
fudge 127.127.1.0 stratum 10
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict 172.16.0.0 mask 255.240.0.0 nomodify notrap
restrict 192.168.0.0 mask 255.255.0.0 nomodify notrap
TIME-SRV-B:
IBM HS21 blade
HPET enabled, and same BIOS options we use for ESXi hosts
Xeon 5140, 4 GB Reg. ECC DDR2, "sparing" config so only 2 GB usable
LSI SAS hardware RAID1
Debian 6.0.4, Linux 2.6.32-5-amd64 #1 SMP Sat May 5 01:12:59 UTC 2012 x86_64 GNU/Linux
Dual bnx2 NICs (tg3 driver?), LACP bonding enabled
Followed the suggestions at http://www.math.ucla.edu/~jimc/documents/bugfix/12-ntp-wont-sync.html
*** NTP.CONF FROM TIME-SRV-B ***
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/ntp.keys
trustedkey 9
peer TIME-SRV-A key 9 iburst
# CMMs in Wireless Land
server 172.20.8.10 iburst
server 172.20.8.20 iburst
server 172.20.8.30 iburst
server 172.20.12.5 iburst
server 172.20.12.10 iburst
server 172.20.12.20 iburst
server 172.20.12.30 iburst
server 172.20.12.35 iburst
server 172.20.12.40 iburst
server 172.20.12.50 iburst
server 172.20.16.10 iburst
# Regular NTP servers for backup
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
# Local backup clock
server 127.127.1.0
fudge 127.127.1.0 stratum 10
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict 172.16.0.0 mask 255.240.0.0 nomodify notrap
restrict 192.168.0.0 mask 255.255.0.0 nomodify notrap
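As a sanity check on the `restrict` lines, the sketch below uses Python's `ipaddress` module to see which of the two non-default `restrict` networks, if any, contains a given address. It only tests simple network membership and does not model how ntpd itself orders or matches restrict entries. `matching_restrict` is a hypothetical helper name, and the addresses are taken from the configs and `ntpq` output above.

```python
import ipaddress

# The two non-default restrict networks from ntp.conf, in address/netmask form.
RESTRICT_NETS = [
    ipaddress.ip_network("172.16.0.0/255.240.0.0"),
    ipaddress.ip_network("192.168.0.0/255.255.0.0"),
]

# CMM addresses from the "server" lines.
CMMS = [
    "172.20.8.10", "172.20.8.20", "172.20.8.30",
    "172.20.12.5", "172.20.12.10", "172.20.12.20", "172.20.12.30",
    "172.20.12.35", "172.20.12.40", "172.20.12.50", "172.20.16.10",
]


def matching_restrict(address):
    """Return the restrict network containing address, or None for the default."""
    ip = ipaddress.ip_address(address)
    for net in RESTRICT_NETS:
        if ip in net:
            return net
    return None


# The peer addresses and one pool server are taken from the ntpq output.
for addr in CMMS + ["172.19.10.4", "172.19.10.36", "216.234.161.11"]:
    print(addr, matching_restrict(addr))
```

All of the CMMs and both peer addresses fall inside 172.16.0.0/12, so they match the relaxed `nomodify notrap` entry, while the pool servers fall through to the default entries.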
I can't find any network-related explanation for this. The GPS units sit behind different wireless backhauls on different frequencies, all fed from a fiber-fed system. Our network is running MPLS and I don't see any congestion or load balancing that could explain something like this.
Any suggestions or advice would be greatly appreciated!
Thanks
---
Ross Halliday
Network Operations
WTC Communications
Office: 613-547-6939 x203
Helpdesk: 866-547-6939 option 2
http://www.wtccommunications.ca
Before I hit send, here are two more examples from TIME-SRV-B:
TIME-SRV-B:/etc# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 172.19.10.4     LOCAL(0)        11 u   38   64     1   5.717  -174.29  16.433
x172.20.8.10     .GPS.            1 u   46   64   177   6.779  -999.27 925.589
+172.20.8.20     .GPS.            1 u   48   64   177  10.386   -1.088 533.666
*172.20.8.30     .GPS.            1 u   40   64   177   5.412    1.420 378.064
x172.20.12.5     .GPS.            1 u   45   64   377   9.923  -1000.8 926.336
+172.20.12.10    .GPS.            1 u   46   64   377   6.531    0.841 534.398
+172.20.12.20    .GPS.            1 u   42   64   377   3.479    2.380 534.558
x172.20.12.30    .GPS.            1 u   37   64   377   6.826  -999.31 911.511
x172.20.12.35    .GPS.            1 u   16   64   377   5.750  -998.79 844.180
+172.20.12.40    .GPS.            1 u   15   64   377   5.676    1.247 654.208
+172.20.12.50    .GPS.            1 u   25   64   377   6.731    0.747 534.527
+172.20.16.10    .GPS.            1 u   28   64   377   9.888   -0.849 653.950
-205.189.158.228 209.87.233.53    3 u   12   64   377  28.576   95.607  35.210
-184.107.229.26  209.51.161.238   2 u   23   64   377  27.660  110.014  36.308
-208.69.56.110   209.51.161.238   2 u   24   64   377  35.119  104.465  35.787
#199.85.124.148  209.87.233.53    3 u   10   64   377  36.519  107.890  36.206
 127.127.1.0     .LOCL.          10 l  703   64     0   0.000    0.000   0.000
TIME-SRV-B:/etc# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
#172.19.10.4     LOCAL(0)        11 u   49   64     6   5.729  -189.66  25.368
+172.20.8.10     .GPS.            1 u   50   64   377   8.411   -0.079 534.256
+172.20.8.20     .GPS.            1 u   49   64   377   8.182    0.014 377.755
+172.20.8.30     .GPS.            1 u   43   64   377   5.020    1.607 534.943
+172.20.12.5     .GPS.            1 u   47   64   377   6.364    0.913 535.021
+172.20.12.10    .GPS.            1 u   48   64   377   5.307    1.454 534.725
x172.20.12.20    .GPS.            1 u   42   64   377   3.599  -997.68 925.621
*172.20.12.30    .GPS.            1 u   38   64   377   5.414    1.399 378.568
+172.20.12.35    .GPS.            1 u   17   64   377   5.545    1.343 654.863
+172.20.12.40    .GPS.            1 u   17   64   377   4.283    1.965 534.544
x172.20.12.50    .GPS.            1 u   28   64   377   5.678  -998.73 844.263
+172.20.16.10    .GPS.            1 u   30   64   377   6.112    1.048 378.634
#205.189.158.228 209.87.233.53    3 u   16   64   377  27.839  111.925  35.568
#184.107.229.26  209.51.161.238   2 u   26   64   377  27.297  125.874  36.176
-208.69.56.110   209.51.161.238   2 u   27   64   377  34.639  120.508  35.971
#199.85.124.148  209.87.233.53    3 u   12   64   377  36.132  123.903  36.155
 127.127.1.0     .LOCL.          10 l  840   64     0   0.000    0.000   0.000
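To check whether the same receivers are rejected each time, the two snapshots above can be compared mechanically. In `ntpq` output a leading `x` marks a peer that the selection algorithm has discarded as a falseticker. This is a minimal Python sketch: `falsetickers` is a hypothetical helper, and the rows are the GPS peers copied from the two TIME-SRV-B tables above.

```python
def falsetickers(ntpq_rows):
    """Return the remotes whose tally code is 'x' (falseticker)."""
    flagged = set()
    for line in ntpq_rows.splitlines():
        line = line.strip()
        if line.startswith("x"):
            flagged.add(line[1:].split()[0])
    return flagged


FIRST = """\
x172.20.8.10 .GPS. 1 u 46 64 177 6.779 -999.27 925.589
+172.20.8.20 .GPS. 1 u 48 64 177 10.386 -1.088 533.666
*172.20.8.30 .GPS. 1 u 40 64 177 5.412 1.420 378.064
x172.20.12.5 .GPS. 1 u 45 64 377 9.923 -1000.8 926.336
+172.20.12.10 .GPS. 1 u 46 64 377 6.531 0.841 534.398
+172.20.12.20 .GPS. 1 u 42 64 377 3.479 2.380 534.558
x172.20.12.30 .GPS. 1 u 37 64 377 6.826 -999.31 911.511
x172.20.12.35 .GPS. 1 u 16 64 377 5.750 -998.79 844.180
+172.20.12.40 .GPS. 1 u 15 64 377 5.676 1.247 654.208
+172.20.12.50 .GPS. 1 u 25 64 377 6.731 0.747 534.527
+172.20.16.10 .GPS. 1 u 28 64 377 9.888 -0.849 653.950
"""

SECOND = """\
+172.20.8.10 .GPS. 1 u 50 64 377 8.411 -0.079 534.256
+172.20.8.20 .GPS. 1 u 49 64 377 8.182 0.014 377.755
+172.20.8.30 .GPS. 1 u 43 64 377 5.020 1.607 534.943
+172.20.12.5 .GPS. 1 u 47 64 377 6.364 0.913 535.021
+172.20.12.10 .GPS. 1 u 48 64 377 5.307 1.454 534.725
x172.20.12.20 .GPS. 1 u 42 64 377 3.599 -997.68 925.621
*172.20.12.30 .GPS. 1 u 38 64 377 5.414 1.399 378.568
+172.20.12.35 .GPS. 1 u 17 64 377 5.545 1.343 654.863
+172.20.12.40 .GPS. 1 u 17 64 377 4.283 1.965 534.544
x172.20.12.50 .GPS. 1 u 28 64 377 5.678 -998.73 844.263
+172.20.16.10 .GPS. 1 u 30 64 377 6.112 1.048 378.634
"""

first, second = falsetickers(FIRST), falsetickers(SECOND)
print("first snapshot: ", sorted(first))
print("second snapshot:", sorted(second))
print("flagged in both:", sorted(first & second))
```

With these two snapshots the sets do not overlap at all: four receivers are flagged in the first and two different ones in the second. That matches the impression that the affected receivers change from one check to the next.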