
Re: Unknown Server Failure, Logs and openntpd



On Fri, May 30, 2008 at 09:27:51AM +0300, Volkan YAZICI wrote:
> This morning one of our R&D servers stopped responding (no ssh, no
> http), and because some tests were urgent I had to hardware-reset it.
> After the machine came back up, I first checked /var/log/messages:
> 
[snip most]
>   May 30 08:09:47 arge -- MARK --
>   May 30 08:29:47 arge -- MARK --
>   May 30 08:44:36 arge kernel: e100: eth1: e100_watchdog: link down
>   May 30 08:44:38 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
>   May 30 08:44:42 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex

>   May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
>   May 30 08:38:11 arge syslogd 1.4.1#18: restart.
> 
> As the "kernel: e100: eth1: ..." lines show, I first suspected a
> connection failure and tried fiddling with the network cable socket.
> But the logs say that wasn't the problem. Moreover, /var/log/syslog
> suggests that the system was working properly just before 08:44:36:
>   
 
[snip]
> I checked every log file under /var/log for entries between 08:00:00
> and 08:38:00, but found nothing useful. OTOH, look at these lines from
> /var/log/messages:
> 
>   May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
>   May 30 08:38:11 arge syslogd 1.4.1#18: restart.
> 
> It seems that openntpd somehow failed to synchronize the hardware
> clock with the time it got from the NTP servers, and after the reboot
> the system switched back to a time in the past. Is this expected? If
> not, how can I fix it?
> 
> To summarize, what else should I check to figure out the cause of this
> problem? (I'll try to log in from a terminal the next time such a
> failure occurs.)

I don't know what caused the freeze.  The hard reset would have kept the
shutdown scripts from writing the system time back to the hardware
clock.  On restart, did ntpd eventually get a network connection and fix
the time?
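
If you want to see how far the clock stepped back, a small script can
scan the log for timestamps that go backwards.  This is only a rough
sketch: the function name find_backward_jumps is made up for
illustration, it assumes the classic "Mon DD HH:MM:SS" syslog prefix
with English month names, and since syslog lines carry no year, the
year is passed in as an assumption.

```python
from datetime import datetime


def find_backward_jumps(lines, year=2008):
    """Return (previous_line, line, step_back) for each backwards timestamp.

    The year is an assumption: classic syslog timestamps do not include one.
    """
    jumps = []
    prev_time = None
    prev_line = None
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # too short to carry a timestamp
        stamp_text = "{} {} {} {}".format(year, fields[0], fields[1], fields[2])
        try:
            stamp = datetime.strptime(stamp_text, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog-style line, skip it
        if prev_time is not None and stamp < prev_time:
            jumps.append((prev_line, line, prev_time - stamp))
        prev_time, prev_line = stamp, line
    return jumps


# The last lines of /var/log/messages quoted above.
sample = [
    "May 30 08:44:42 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex",
    "May 30 08:45:14 arge shutdown[7450]: shutting down for system halt",
    "May 30 08:38:11 arge syslogd 1.4.1#18: restart.",
]

for before, after, step_back in find_backward_jumps(sample):
    print("clock stepped back by", step_back)
    print("  before:", before)
    print("  after: ", after)
```

For the lines you quoted, this reports a step back of about seven
minutes between the shutdown entry and the syslogd restart.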

It may not have been a freeze at all, just a networking problem that
wasn't found by fiddling with the cable.

Logging in from a VT or serial terminal would have been helpful.  If you
are concerned that this may happen again, you may even want to connect
up a serial console to another box (or a real serial VT) and watch that
as well.

Doug.

