
Weird server mystery: self-reset, mostly



Never seen this before -- all daemons and all user processes killed. Zap. It happened around 23:17 Sunday, Chicago time (that's when /var/log/* abruptly stopped). Any idea what might cause this?


I was ssh'd in to my Debian server and... disconnected. No problem, I was using screen to vim some Catalyst modules, so I'll just reconnect and reattach... connection refused.

Wha?

Tried telnet to port 22, no sign of life. Tried telnet to port 80, no sign of life.

Went to the server room, logged in on the console:

will@darth:~$ uptime
 23:58:11 up 583 days,  3:03,  6 users,  load average: 0.00, 0.02, 0.08

So the server hadn't had a hard reset; it was still up 583 days. In /var/log/syslog there are the usual cron logs up to about 23:17 and then... nothing.

will@darth:~$ ps afx
  PID TTY      STAT   TIME COMMAND
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     1:13  \_ [migration/0]
    4 ?        S<    29:21  \_ [ksoftirqd/0]
    5 ?        S<     0:32  \_ [watchdog/0]
    6 ?        S<     1:12  \_ [migration/1]
    7 ?        S<    77:19  \_ [ksoftirqd/1]
    8 ?        S<     0:02  \_ [watchdog/1]
    9 ?        S<    44:52  \_ [events/0]
   10 ?        S<    78:24  \_ [events/1]
   11 ?        S<     0:00  \_ [khelper]
   44 ?        S<    13:20  \_ [kblockd/0]
   45 ?        S<     0:40  \_ [kblockd/1]
   47 ?        S<     0:00  \_ [kacpid]
   48 ?        S<     0:00  \_ [kacpi_notify]
  121 ?        S<     0:00  \_ [kseriod]
  161 ?        S<    19:53  \_ [kswapd0]
  162 ?        S<     0:00  \_ [aio/0]
  163 ?        S<     0:00  \_ [aio/1]
  642 ?        S<     0:00  \_ [ksuspend_usbd]
  647 ?        S<     0:00  \_ [khubd]
  761 ?        S<     0:00  \_ [ata/0]
  764 ?        S<     0:00  \_ [ata/1]
  765 ?        S<     0:00  \_ [ata_aux]
  774 ?        S<     0:00  \_ [scsi_eh_0]
  775 ?        S<     0:00  \_ [scsi_eh_1]
  877 ?        S<    42:46  \_ [kjournald]
 1301 ?        S<    17:22  \_ [edac-poller]
 1384 ?        S<     0:00  \_ [kpsmoused]
 1640 ?        S<     0:00  \_ [kstriped]
 1654 ?        S<     0:00  \_ [ksnapd]
 1681 ?        S<    76:13  \_ [kjournald]
 1682 ?        S<   126:18  \_ [kjournald]
12642 ?        S      0:09  \_ [pdflush]
19987 ?        S      0:00  \_ [pdflush]
    1 ?        Ss    10:04 init [2]         
11064 tty2     Ss+    0:00 /sbin/getty 38400 tty2
11065 tty3     Ss+    0:00 /sbin/getty 38400 tty3
11066 tty4     Ss+    0:00 /sbin/getty 38400 tty4
11067 tty5     Ss+    0:00 /sbin/getty 38400 tty5
11068 tty6     Ss+    0:00 /sbin/getty 38400 tty6
12995 tty1     Ss     0:00 /bin/login --     
13077 tty1     S      0:00  \_ -bash
13107 tty1     R+     0:00      \_ ps afx

Freaky: init, process #1, isn't at the top? And all daemons except getty were gone. All user processes, including my screen and vim sessions, were gone.

Checking 'last' didn't show any suspicious activity.

In kern.log there's only:
Jan 23 23:04:59 darth kernel: [64084756.601774] exploit[25161]: segfault at 10c00b ip 00000000 sp deadc01d error 6
Jan 23 23:05:08 darth kernel: [64084765.528734] NET: Registered protocol family 5

After a quick
$ sudo bash
# cd /etc/rc2.d
# for x in S*; do sh "$x" start; done

the server was back up and serving... and then the saddest sight of all, of course:

will@darth:~$ screen -ls
There is a screen on:
        26279.pts-3.darth       (06/19/09 21:54:31)     (Dead ???)
Remove dead screens with 'screen -wipe'.
1 Socket in /var/run/screen/S-will.

:(
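
For anyone wanting to repeat the restart loop above a little more defensively, here is a minimal sketch. The function name start_runlevel and the DRY_RUN variable are made up for illustration and are not part of Debian's init tooling; the only assumption about the system is that the runlevel directory holds S* scripts that accept a "start" argument, as in the loop above.

```shell
# Hypothetical helper: start every S* script in a runlevel directory, in glob order.
# start_runlevel and DRY_RUN are illustrative names, not Debian conventions.
start_runlevel() {
    rc_dir=${1:-/etc/rc2.d}
    for script in "$rc_dir"/S*; do
        [ -e "$script" ] || continue          # glob matched nothing, or dangling symlink
        if [ -n "${DRY_RUN:-}" ]; then
            echo "would run: $script start"   # preview only, nothing is started
        else
            # Report a failing script but keep going with the rest.
            sh "$script" start || echo "failed: $script" >&2
        fi
    done
}
```

Running it as root with DRY_RUN=1 set first shows what would be started without touching anything; the quoting keeps odd file names from being split by the shell.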

$ tail /var/log/messages
Jan 23 22:56:26 darth -- MARK --
Jan 23 23:04:59 darth kernel: [64084756.601774] exploit[25161]: segfault at 10c00b ip 00000000 sp deadc01d error 6
Jan 23 23:05:08 darth kernel: [64084765.528734] NET: Registered protocol family 5
Jan 23 23:16:26 darth -- MARK --
Jan 23 23:47:02 darth syslogd 1.5.0#5: restart.

So everything crapped out after 23:16, and I restarted it at 23:47.
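
A rough way to pin down when logging stopped across all of /var/log is to print the timestamp of the last line of each file. This is only a sketch: last_log_times is a made-up name, it assumes the classic 15-character syslog prefix ("Jan 23 23:16:26") seen in the output above, and it makes no attempt to handle compressed rotated logs or files with a different format.

```shell
# Hypothetical helper: show the timestamp of the last line in each log file.
# Assumes the classic syslog prefix "Mon DD HH:MM:SS" (15 characters).
last_log_times() {
    log_dir=${1:-/var/log}
    for f in "$log_dir"/*; do
        # Only regular, non-empty files; skip directories and empty logs.
        [ -f "$f" ] && [ -s "$f" ] || continue
        printf '%s  %s\n' "$(tail -n 1 "$f" | cut -c1-15)" "${f##*/}"
    done
}
```

If most files end within the same minute, that minute is a reasonable estimate for when the daemons died.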

Anybody got a clue as to what might have happened to kill all daemons and user processes in one fell swoop? This has been a rock-solid Debian server for years...

will@darth:~$ cat /etc/debian_version 
5.0.4

--
The first step towards getting somewhere is to decide that you are not going to stay where you are.  -- J.P.Morgan

