Re: Diagnosing occasional random reboots
On Tue, Oct 31, 2006 at 05:29:29PM +0000, Dougie Nisbet wrote:
> A server which has been running steadily for years is beginning to
> reboot. To the best of my knowledge, nothing has changed. It is a
> dual-processor PIII. It runs Debian stable.
>
> It is tucked away in the loft and usually has no monitor attached so
> tracking this down is difficult. However even if I brought it into a
> more convenient area, short of sitting staring at the screen waiting for
> a crash or reboot, I'm not sure it would help much.
>
> I've tried building a newer kernel from backports.org and trimmed it
> right down as much as possible. There is nothing useful in syslog. A
> typical series of reboots looks like:
>
> dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:15 still logged in
> runlevel (to lvl 2) 2.6.17 Tue Oct 31 17:12 - 17:21 (00:08)
> reboot system boot 2.6.17 Tue Oct 31 17:12 (00:08)
> dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:09 - crash (00:02)
> runlevel (to lvl 2) 2.6.17 Tue Oct 31 16:59 - 17:12 (00:12)
> reboot system boot 2.6.17 Tue Oct 31 16:59 (00:21)
> dougie pts/0 tbird2xp:0.0 Tue Oct 31 16:05 - crash (00:54)
> runlevel (to lvl 2) 2.6.17 Tue Oct 31 15:16 - 16:59 (01:43)
> reboot system boot 2.6.17 Tue Oct 31 15:16 (02:04)
> date new time Sun Oct 29 07:11
> date old time Sun Oct 29 07:12
> root pts/3 kitchens Sun Oct 29 07:11 - crash (2+08:04)
> dougie pts/2 kitchens Sat Oct 28 20:29 - crash (2+19:46)
> dougie pts/1 kitchens Sat Oct 28 11:37 - 16:04 (1+05:27)
> dougie pts/0 tbird2xp:0.0 Fri Oct 27 13:16 - crash (4+03:00)
>
>
> And the syslog shows nothing notable around the time. Usually just lines
> from postfix as it processes the mail queue, then:
>
> Oct 31 17:12:22 nick syslogd 1.4.1#17: restart (remote reception).
> Oct 31 17:12:22 nick kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
> Oct 31 17:12:23 nick kernel: Inspecting /boot/System.map-2.6.17
> Oct 31 17:12:23 nick kernel: Loaded 21314 symbols from /boot/System.map-2.6.17.
>
> I'm not sure how to go about tracking this down. My searching of the
> archives shows that these symptoms could describe a faulty physical
> component, such as memory or PSU. So my next step is probably going to
> be trying to swap the PSU and doing a memtest. One thing about the
> reboots is that they often appear to be in clusters. For example, around
> 7AM to 9AM on Oct 24 it looks like it was bouncing for about two hours
> off and on:
>
> # last reboot
> reboot system boot 2.6.8 Wed Oct 25 05:03 (06:50)
> reboot system boot 2.6.8 Wed Oct 25 04:31 (07:22)
> reboot system boot 2.6.8 Tue Oct 24 11:09 (1+00:44)
> reboot system boot 2.6.8 Tue Oct 24 10:59 (00:06)
> reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
> reboot system boot 2.6.8 Tue Oct 24 09:50 (01:03)
> reboot system boot 2.6.8 Tue Oct 24 09:49 (01:05)
> reboot system boot 2.6.8 Tue Oct 24 09:37 (01:17)
> reboot system boot 2.6.8 Tue Oct 24 09:05 (01:49)
> reboot system boot 2.6.8 Tue Oct 24 08:53 (02:00)
> reboot system boot 2.6.8 Tue Oct 24 08:51 (02:03)
> reboot system boot 2.6.8 Tue Oct 24 07:28 (03:26)
> reboot system boot 2.6.8 Tue Oct 24 07:26 (03:27)
> reboot system boot 2.6.8 Tue Oct 24 07:24 (03:29)
> reboot system boot 2.6.8 Tue Oct 24 07:01 (03:52)
> reboot system boot 2.6.8 Tue Oct 24 06:18 (04:36)
>
> I'm a bit stumped on how to solve this and would appreciate any thoughts
> on strategy.
"Tucked away in the loft", you say. Is dust building up somewhere
along your power supply line? In a multiple-socket extension,
perhaps. A long shot, but I once had this problem. I think the
dust caused momentary short circuits, not long enough to blow a fuse
but long enough to cut the power to the computer, while the dust
burnt away - but I'm no electrician.
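
Since you mention the reboots come in clusters, it may be worth
checking that against the "last reboot" listing rather than judging by
eye. Below is a rough Python sketch that groups boot times into
clusters whenever two consecutive boots are no more than a chosen gap
apart. Several things here are my own assumptions: the function names
(parse_boots, cluster_boots), the one-hour default gap, and the year,
which has to be passed in because "last" does not print it. The sample
data is copied from your listing above.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations as printed by "last" in an English locale (assumption).
MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
#   reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
BOOT_RE = re.compile(
    r"^reboot\s+system boot\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2})")


def parse_boots(text, year):
    """Return boot times from 'last reboot' output, oldest first.

    The year is supplied by the caller because 'last' omits it.
    """
    boots = []
    for line in text.splitlines():
        match = BOOT_RE.match(line.strip())
        if match:
            mon, day, hour, minute = match.groups()
            boots.append(
                datetime(year, MONTHS[mon], int(day), int(hour), int(minute)))
    return sorted(boots)


def cluster_boots(boots, max_gap=timedelta(hours=1)):
    """Group sorted boot times; a gap longer than max_gap starts a new cluster."""
    clusters = []
    for boot in boots:
        if clusters and boot - clusters[-1][-1] <= max_gap:
            clusters[-1].append(boot)
        else:
            clusters.append([boot])
    return clusters


# Sample copied from the "last reboot" listing quoted above.
SAMPLE = """\
reboot system boot 2.6.8 Wed Oct 25 05:03 (06:50)
reboot system boot 2.6.8 Wed Oct 25 04:31 (07:22)
reboot system boot 2.6.8 Tue Oct 24 11:09 (1+00:44)
reboot system boot 2.6.8 Tue Oct 24 10:59 (00:06)
reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
reboot system boot 2.6.8 Tue Oct 24 09:50 (01:03)
reboot system boot 2.6.8 Tue Oct 24 09:49 (01:05)
reboot system boot 2.6.8 Tue Oct 24 09:37 (01:17)
reboot system boot 2.6.8 Tue Oct 24 09:05 (01:49)
reboot system boot 2.6.8 Tue Oct 24 08:53 (02:00)
reboot system boot 2.6.8 Tue Oct 24 08:51 (02:03)
reboot system boot 2.6.8 Tue Oct 24 07:28 (03:26)
reboot system boot 2.6.8 Tue Oct 24 07:26 (03:27)
reboot system boot 2.6.8 Tue Oct 24 07:24 (03:29)
reboot system boot 2.6.8 Tue Oct 24 07:01 (03:52)
reboot system boot 2.6.8 Tue Oct 24 06:18 (04:36)
"""

for cluster in cluster_boots(parse_boots(SAMPLE, 2006)):
    print(f"{cluster[0]:%Y-%m-%d %H:%M} to {cluster[-1]:%H:%M}: "
          f"{len(cluster)} boots")
```

If the clusters line up with something regular (a cron job, the
heating coming on, a particular time of day), that would point towards
load or mains trouble rather than a purely random fault. The gap is
only a guess, so try a few values for max_gap.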
Cheers,
David
--
David Jardine
"Running Debian GNU/Linux and
loving every minute of it." -L. von Sacher-M.(1835-1895)