
Re: Debian crashes: sw, hw or malicious hacker/virus problems?



on Mon, Oct 23, 2000 at 08:57:59AM +0200, Jean Orloff (anonquestions@free.fr) wrote:
> Hello, dear debian fellows!
> 
> Please forgive my paranoid anonymity, in view of the last section of
> this message.
> 
> 1) My problem:
> 
> I have happily used debian since 1995 (0.93R6 if I recall?). But since I
> installed 2.1 on my new PC at work, about a year ago, that machine
> undergoes about a crash per month in average. Nothing to scare a
> windblows user, of course, but unbearable for someone who knows this
> should not be so. Especially as these crashes are unrecoverable: screen
> frozen, mouse/keyboard frozen (no vt switching nor clean reboot
> possible) and even no access from outside through the network. Thus no
> alternative to the brutal power switchoff, with subsequent fsck'ing of
> the whole disk.
> 
> When does this happen? Always with a heavy load (2-3 users on a 128Mb
> pentium 400, each with several windows, netscape, emacs etc + some
> compilation or latex2html going on); always with at least one remote
> ssh login. I also sometimes had the impression of the mouse freezing
> temporarily before the total crash, but you know how short time
> causality can be violated in the human brain.

I've seen bad interactions between particular kernels and modules,
mostly in the 2.2.10 - 2.2.15 series.  You might want to check whether
yours falls in that range.  2.2.17 and 2.2.4pre8 have done well by me
so far.

There are also reports of motherboards which are simply flaky.  See
recent posts to the misc-OpenBSD mailing list for an example, and search
the web for others.  http://www.openbsd.org/.

> 2) Software problems?
> 
> In the beginning, I attributed this to the network interface card
> (3C905C=Tornado) that was not officially supported by Donald Becker's
> 3C59x driver. 

<snip>

See above.

> 3) Hardware problems?
> 
> Bored with switching kernels, I followed the hardware problem track.
> Despite a successful memory test at boot, 

Means *absolutely nothing*.  The BIOS memory count at boot is far too
cursory to catch marginal RAM.

> maybe one of the 2 memory bars was bad? I ran during the summer on
> half memory, but it ended up by a crash again. I switched the memory
> bar: problem again a month later.  Maybe the NIC slot was bad? I
> switched with the soundcard last week. No crash yet, but I have
> reasons to believe it won't help.

Try a memory testing program, and/or boot your system with various
memory configurations at the LILO prompt -- you may be able to isolate
the point at which memory does (or doesn't) affect system stability.
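
A minimal sketch of the LILO approach, assuming an image label of
"linux" (yours may differ, check /etc/lilo.conf) and an arbitrary 64M
cap.  The helper name mem_total_kb is made up for illustration; it only
reads back what the kernel reports after booting.

```shell
#!/bin/bash
# At the LILO boot prompt (not in a shell), cap usable RAM with the
# kernel's mem= parameter.  "linux" is an assumed image label:
#
#   boot: linux mem=64M
#
# After booting, confirm how much memory the kernel actually sees.
# Prints the MemTotal value (in kB) from a meminfo-style file;
# defaults to /proc/meminfo when no file is given.
mem_total_kb () {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Usage, once the system is up:
#   mem_total_kb
```

Note that mem= only keeps the low part of RAM in use, so this narrows
down faults in the upper address range; it does not test each stick in
isolation.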

> 4) Hacker/virus problems?
> 
> During the very first hour of the very first install, I got port-scan
> attacked (see log below). Bad point for debian, I thought: what is the
> probability of a PC being attacked in the first hour of its connection
> to the net? Looks more probable that the attack was triggered by the
> install process!

Unlikely.

> Anyway, the ports for telnet (23) and ftp (??) were filtered by the
> local router (except for local machines), I was not running any
> daemon, so I was not too scared. After watching the logs for about a
> week, I opened the machine to full internet exclusively through ssh
> connections which are not filtered by the local router.

This data isn't too helpful w/o an externally run scan on your system.
'nmap -sS' or 'nmap -O' should produce useful results.  Also netstat
output (though I don't know specific command line switches which you
should use).
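
Something along these lines might do as a starting point.  It's a
sketch, not a recipe: target_host is a placeholder, the netstat switches
shown (-t TCP, -l listening, -n numeric) are my suggestion from the
net-tools netstat, and listening_ports is a made-up helper that just
pulls port numbers out of that output.

```shell
#!/bin/bash
# From a *different* machine (both scans need root):
#
#   nmap -sS target_host     # TCP SYN scan
#   nmap -O  target_host     # OS detection
#
# On the box itself, list listening TCP sockets:
#
#   netstat -tln | listening_ports

# Read `netstat -tln`-style output on stdin and print the unique
# listening TCP port numbers, one per line, in numeric order.
listening_ports () {
    awk '$1 ~ /^tcp/ && $NF == "LISTEN" { n = split($4, a, ":"); print a[n] }' |
        sort -n -u
}
```

Comparing the external scan against the local list shows which open
ports actually get past the router.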

> Until last week, I had no reason to think of hacker origin to my
> crashes. But last week, I got 2 crashes. And I noticed something very
> curious in the accounting logs. Among the last processes that finished
> less than 5 minutes before the crash, there was a bunch of NAMELESS root
> processes, that started at 0 unix time (Jan 1 1970) and lasted 0 second
> (!?). E.g: 
> 
> # lastcomm
> 
> S20acct           S     root     ??         0.01 secs Thu Oct 19 19:40
> accton            S     root     ??         0.00 secs Thu Oct 19 19:40
> ---> reboot
>                         root     ??         0.00 secs Thu Jan  1 01:00
>                         root     ??         0.00 secs Thu Jan  1 01:00
>                         root     ??         0.00 secs Thu Jan  1 01:00

<snip>

> Suspicious, no? 

No idea.  Not familiar with lastcomm and its output.  I'd try booting a
known good system and checking its initial output.  It's possible that
some kernel pseudo-processes appear w/o names.  It's also possible
(probable?) that this is an indication of bad data in internal process
tables, which could well be a precursor of a crash.
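
To make that comparison less of an eyeball job, a rough filter like the
following could count the nameless records.  It assumes, going only by
the output quoted above, that a nameless record shows up as a line
starting with whitespace; nameless_records is a made-up helper name.

```shell
#!/bin/bash
# Count records in lastcomm-style output (read from stdin) whose
# command-name column is empty, i.e. non-blank lines that begin
# with a space or tab.
nameless_records () {
    awk '/^[ \t]+[^ \t]/ { n++ } END { print n + 0 }'
}

# Usage on the suspect box and on a known good one:
#   lastcomm | nameless_records
```

If a freshly booted, known good system also reports a few, they are
probably harmless.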

> But even more curious: my previous machine (call it PC2), with the
> same install, but a totally different (older) hardware. Users from my
> machine (call it PC1) often ssh-log to PC2 and vice-versa.
> Furthermore, during the portscan-attacked install, the new PC1 was
> bearing the name and address the previously existing PC2. Anyway, for
> the first time last week, PC2 endured 2 crashes too. On one of these
> crashes, there were nameless-timeless root processes just before the
> crash also. But no sign of remote login the full day: looks more like
> a virus than a hacker? 

If you've found a GNU/Linux virus, you're the second person to have
done so.  Extremely unlikely.

> 5) Questions:


> 2) What tools could I use to help pinpointing the problem? E.g: a
> process accounting that would log the beginning (instead of
> ending) processes...

> 3) Can a network driver really freeze the full kernel?

Almost certainly yes.

> 4) How can the kernel be frozen? Is there a kernel bug that propagated
> through 2.2.13-17?

Shit happens?

> Many thanks for any help!
> 
> PS: you can privately reply to this mail.
> 
> Annex 1: Portscan attack (november 99)
> 
> 9:07:13 tcplogd: port 1114 connection attempt from
> unknown@some.foreign.host [123.4.576.89]
> 9:07:13 tcplogd: port 1116 (idem)
> 9:07:15 tcplogd: port 1171 "
> 9:07:18 tcplogd: port 1174 "

...not in itself unusual or noxious, unless followed by an attack.  What
ports do you have open on this box?  To the Internet?

I'm leaning heavily toward a hardware explanation.

Best option at this point would be a wipe and rebuild of the system.
Save out your selected packages (dpkg --get-selections), archive
userspace, wipe everything, and rebuild.  If you can get a *known good*
set of MD5 checksums, you might also want to run debsums *from a known
good boot disk*.  If you still have problems, start isolating hardware
components.
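
The save/restore part might look roughly like this.  File names are
placeholders and the function names are made up; the restore step
assumes apt is what you install with.

```shell
#!/bin/bash
# Before the wipe: save the package selection list.
# The default output file name is a placeholder.
save_selections () {
    dpkg --get-selections > "${1:-selections.txt}"
}

# After the fresh install: feed the list back to dpkg and let apt
# install whatever is marked for installation.
restore_selections () {
    dpkg --set-selections < "${1:-selections.txt}" &&
        apt-get dselect-upgrade
}

# Checksum verification, run from the known good boot disk:
#   debsums -s        # report only files that fail the check
```

Keep the saved list somewhere off the machine, along with the userspace
archive.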

Any kernel log messages of note?  Oopses and panics can sometimes point
to the driver or subsystem that is failing.

Also attaching a script which is useful for generating kernel bug
reports.  Read it before running.  Note that you should fill in
additional information (marked 'n/a' by default) manually before sending
a kernel bug report.  I'd written a watcher script to generate reports
at the time of oopses on one sick system I was administering.
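
That watcher isn't attached, but the idea can be sketched as below.
This is a reconstruction under assumptions, not the original: the log
path, the report script path (./kernel-bug-report) and the polling
interval are all placeholders.

```shell
#!/bin/bash
# Count lines containing "Oops" in a kernel log file.
oops_count () {
    awk '/Oops/ { n++ } END { print n + 0 }' "$1"
}

# Poll the log; whenever the Oops count rises, run the report
# generator and save its output to a timestamped file.
# Arguments (all placeholders): log file, report script, interval in seconds.
watch_oops () {
    local log=${1:-/var/log/kern.log}
    local report=${2:-./kernel-bug-report}
    local interval=${3:-60}
    local seen now
    seen=$(oops_count "$log")
    while sleep "$interval"; do
        now=$(oops_count "$log")
        if [ "$now" -gt "$seen" ]; then
            "$report" > "oops-report-$(date +%Y%m%d-%H%M%S).txt"
        fi
        seen=$now    # also resets the baseline after log rotation
    done
}

# Usage:
#   watch_oops /var/log/kern.log ./kernel-bug-report 60 &
```

This only helps with oopses that leave the machine running; a hard
freeze takes the watcher down with everything else.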

-- 
Karsten M. Self <kmself@ix.netcom.com>     http://www.netcom.com/~kmself
 Evangelist, Opensales, Inc.                    http://www.opensales.org
  What part of "Gestalt" don't you understand?      There is no K5 cabal
   http://gestalt-system.sourceforge.net/        http://www.kuro5hin.org
GPG fingerprint: F932 8B25 5FDD 2528 D595 DC61 3847 889F 55F2 B9B0
#!/bin/bash

# Kernel bug report generator script
# Script generated from prior bug report form by Karsten M. Self
# $Revision: 1.3 $ $Date: 2000/05/13 07:48:36 $ $Author: root $


# ------------------------------------------------------------------------
# [Some of this is taken from Frohwalt Egerer's original linux-kernel FAQ]
# 
#      What follows is a suggested procedure for reporting Linux bugs. You
# aren't obliged to use the bug reporting format, it is provided as a guide
# to the kind of information that can be useful to developers - no more.
# 
#      If the failure includes an "OOPS:" type message in your log or on
# screen please read "Documentation/oops-tracing.txt" before posting your
# bug report. This explains what you should do with the "Oops" information
# to make it useful to the recipient.
# 
#       Send the output to the maintainer of the kernel area that seems to
# be involved with the problem. Don't worry too much about getting the
# wrong person. If you are unsure send it to the person responsible for the
# code relevant to what you were doing. If it occurs repeatably try and
# describe how to recreate it. That is worth even more than the oops itself.
# The list of maintainers is in the MAINTAINERS file in this directory.
# 
#       If you are totally stumped as to whom to send the report, send it to
# linux-kernel@vger.rutgers.edu. (For more information on the linux-kernel
# mailing list see http://www.tux.org/lkml/).
# 
# This is a suggested format for a bug report sent to the Linux kernel mailing 
# list. Having a standardized bug report form makes it easier  for you not to 
# overlook things, and easier for the developers to find the pieces of 
# information they're really interested in. Don't feel you have to follow it.
# 
#    First run the ver_linux script included as scripts/ver_linux or
# at <URL:ftp://ftp.sai.msu.su/pub/Linux/ver_linux> It checks out
# the version of some important subsystems.  Run it with the command
# "sh scripts/ver_linux"
# 
# Use that information to fill in all fields of the bug report form, and
# post it to the mailing list with a subject of "PROBLEM: <one line
# summary from [1.]>" for easy identification by the developers    
# ------------------------------------------------------------------------

# indent by one tabstop
function tabout () { sed -e '/^/s//	/'; }

kversion=$( uname -r )
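# command that supplies the kernel messages to scan
dmesg=dmesg
# dmesg="cat /var/log/kern.log"	# debugging override: read the log file instead
# last Oops code and last EIP line, keeping only the text after the final colon
oops_number=$( $dmesg | grep Oops | tail -1 | sed -e '/^.*:/s///' )
oops_module=$( $dmesg | grep EIP | tail -1 | sed -e '/^.*:/s///' )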

cat <<EOF

This is a script-generated kernel bug report.  

The system administrator/developer should provide additional information 
where appropriate.

kernel-bug-report: $Revision: 1.3 $ $Date: 2000/05/13 07:48:36 $ $Author: root $

[1.] One line summary of the problem:    

	PROBLEM:  $1 oops $oops_number in $oops_module, $kversion kernel

[2.] Full description of the problem/report:

	n/a

[3.] Keywords (i.e., modules, networking, kernel):

	linux kernel $kversion oops $oops_number $oops_module

[4.] Kernel version (from /proc/version):

$( cat /proc/version | tabout )

[5.] Output of Oops.. message (if applicable) with symbolic information 
     resolved (see Documentation/oops-tracing.txt)

$( $dmesg | ksymoops -k /proc/ksyms | tabout )

[6.] A small shell script or example program which triggers the
     problem (if possible)

	n/a

[7.] Environment

$( set | tabout )

[7.1.] Software (add the output of the ver_linux script here)

$( sh -f /usr/src/linux/scripts/ver_linux | tabout )

[7.2.] Processor information (from /proc/cpuinfo):

$( cat /proc/cpuinfo | tabout )

[7.3.] Module information (from /proc/modules):

$( cat /proc/modules | tabout )

[7.4.] SCSI information (from /proc/scsi/scsi)

$( cat /proc/scsi/scsi | tabout )

[7.5.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

	System memory (at time of oops):
$( cat /proc/meminfo | tabout )

	System uptime:
$( uptime | tabout )

[X.] Other notes, patches, fixes, workarounds:
EOF
