Re: Diagnosing occasional random reboots



On Tue, Oct 31, 2006 at 05:29:29PM +0000, Dougie Nisbet wrote:
> A server which has been running steadily for years is beginning to 
> reboot. To the best of my knowledge, nothing has changed. It is a 
> dual-processor PIII. It runs stable.
> 
> It is tucked away in the loft and usually has no monitor attached so 
> tracking this down is difficult. However even if I brought it into a 
> more convenient area, short of sitting staring at the screen waiting for 
> a crash or reboot, I'm not sure it would help much.
> 
> I've tried rebuilding a newer kernel from backports.org. And trimmed it 
> right down as much as possible. There is nothing useful in syslog. A 
> typical series of reboots looks  like:
> 
> dougie   pts/0        tbird2xp:0.0     Tue Oct 31 17:15   still logged in
> runlevel (to lvl 2)   2.6.17           Tue Oct 31 17:12 - 17:21  (00:08)
> reboot   system boot  2.6.17           Tue Oct 31 17:12          (00:08)
> dougie   pts/0        tbird2xp:0.0     Tue Oct 31 17:09 - crash  (00:02)
> runlevel (to lvl 2)   2.6.17           Tue Oct 31 16:59 - 17:12  (00:12)
> reboot   system boot  2.6.17           Tue Oct 31 16:59          (00:21)
> dougie   pts/0        tbird2xp:0.0     Tue Oct 31 16:05 - crash  (00:54)
> runlevel (to lvl 2)   2.6.17           Tue Oct 31 15:16 - 16:59  (01:43)
> reboot   system boot  2.6.17           Tue Oct 31 15:16          (02:04)
> date     new time                      Sun Oct 29 07:11
> date     old time                      Sun Oct 29 07:12
> root     pts/3        kitchens         Sun Oct 29 07:11 - crash (2+08:04)
> dougie   pts/2        kitchens         Sat Oct 28 20:29 - crash (2+19:46)
> dougie   pts/1        kitchens         Sat Oct 28 11:37 - 16:04 (1+05:27)
> dougie   pts/0        tbird2xp:0.0     Fri Oct 27 13:16 - crash (4+03:00)
> 
> 
> And the syslog shows nothing notable around the time. Usually just lines 
> from postfix as it processes the mail queue, then:
> 
> Oct 31 17:12:22 nick syslogd 1.4.1#17: restart (remote reception).
> Oct 31 17:12:22 nick kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
> Oct 31 17:12:23 nick kernel: Inspecting /boot/System.map-2.6.17
> Oct 31 17:12:23 nick kernel: Loaded 21314 symbols from /boot/System.map-2.6.17.
> 
> I'm not sure how to go about tracking this down. My searching of the 
> archives shows that these symptoms could describe a faulty physical 
> component, such as memory or PSU. So my next step is probably going to 
> be trying to swap the PSU and doing a memtest. One thing about the 
> reboots is that they often appear to be in clusters. For example, around 
> 7AM to 9AM on Oct 24 it looks like it was bouncing for about two hours 
> off and on:
> 
> # last reboot
> reboot   system boot  2.6.8            Wed Oct 25 05:03          (06:50)
> reboot   system boot  2.6.8            Wed Oct 25 04:31          (07:22)
> reboot   system boot  2.6.8            Tue Oct 24 11:09         (1+00:44)
> reboot   system boot  2.6.8            Tue Oct 24 10:59          (00:06)
> reboot   system boot  2.6.8            Tue Oct 24 09:52          (01:01)
> reboot   system boot  2.6.8            Tue Oct 24 09:50          (01:03)
> reboot   system boot  2.6.8            Tue Oct 24 09:49          (01:05)
> reboot   system boot  2.6.8            Tue Oct 24 09:37          (01:17)
> reboot   system boot  2.6.8            Tue Oct 24 09:05          (01:49)
> reboot   system boot  2.6.8            Tue Oct 24 08:53          (02:00)
> reboot   system boot  2.6.8            Tue Oct 24 08:51          (02:03)
> reboot   system boot  2.6.8            Tue Oct 24 07:28          (03:26)
> reboot   system boot  2.6.8            Tue Oct 24 07:26          (03:27)
> reboot   system boot  2.6.8            Tue Oct 24 07:24          (03:29)
> reboot   system boot  2.6.8            Tue Oct 24 07:01          (03:52)
> reboot   system boot  2.6.8            Tue Oct 24 06:18          (04:36)
> 
> I'm a bit stumped on how to solve this and would appreciate any thoughts 
> on strategy.

"Tucked away in the loft", you say.  Is dust building up somewhere 
along your power supply line?  In a multiple-socket extension, 
perhaps.  A long shot, but I once had this problem.  I think the 
dust caused momentary short circuits, not long enough to blow a fuse 
but long enough to cut the power to the computer, while the dust 
burnt away - but I'm no electrician.
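
Since you mention the reboots come in clusters, it might help to group
the timestamps and see whether the bursts line up with anything else
in the house (heating, a fridge compressor, someone using that
extension). Here is a rough sketch in Python, not a finished tool. It
assumes the `last reboot` line layout shown in your mail, assumes the
year is 2006 (the output carries no year), and the 90-minute gap is an
arbitrary choice of mine. The function names are made up for
illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the "Tue Oct 24 09:52" part of a `last` line (assumed layout).
STAMP = re.compile(r"\b[A-Z][a-z]{2} ([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}):(\d{2})\b")


def parse_reboots(text, year=2006):
    """Return sorted boot times from the 'reboot' lines of `last` output.

    The year is an assumption, because `last` does not print one here.
    """
    times = []
    for line in text.splitlines():
        if not line.startswith("reboot"):
            continue  # skip login, runlevel and date lines
        m = STAMP.search(line)
        if m:
            month, day, hh, mm = m.groups()
            stamp = f"{year} {month} {day} {hh}:{mm}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M"))
    return sorted(times)


def cluster(times, max_gap=timedelta(minutes=90)):
    """Group sorted times; a new group starts when the gap exceeds max_gap."""
    groups = []
    for t in times:
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups
```

Paste the output of `last reboot` into a string, pass it to
parse_reboots(), then hand the result to cluster(); each group holds
the boot times of one burst, so its first and last entries give the
window to compare against whatever else was switched on at the time.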

Cheers,
David

-- 
David Jardine

"Running Debian GNU/Linux and
loving every minute of it."  -L. von Sacher-M.(1835-1895)