
System woes..unkillable processes and more!



I've had intermittent problems with one of my
Debian systems. It's running potato with
the 2.2.10 kernel-image package from potato.

In the middle of June, I went on vacation and
left the box up so I could dial-in and check
faxes/emails. About a day after I came back,
it crashed (yes, Linux crashed!) with
kmod looping on "Can't locate
module binfmt-0000". Any attempt to
run a program gave me "Can't execute
binary file".

I rebooted, and was forced to run
e2fsck on all of my filesystems manually.
It took quite a while and fixed lots
of problems, and also performed
a bad-sector scan. I rebooted again, and
found that my lost+found directories were
full of files, and I began to discover
that I was missing lots of files. I lost
a 10MB mail folder, the /var/lib/dpkg/*
databases, etc. Cursing myself for not
having backed up recently, I reinstalled
and mounted /home over NFS from my other
system (thank god, it's had no problems).
I also formatted the drive under Windows
and discovered that it had 1 MB of bad
sectors out of 6 GB.

I've been running along happily until now.
My system had been up about 7 days when
I discovered an unkillable process
(gzip -9f /var/log/kern.log.0). After reading
up on Deja News, I tried killing it
with various signals and attaching to it
with gdb and strace. None of that worked. I couldn't
log in anymore, either (was the kernel hung
in a blocking I/O call?). So I
rebooted with Ctrl-Alt-Del. Init
said it was rebooting, but didn't do anything.
I rebooted manually, and during the
"your filesystems have errors" fsck check,
I got the binfmt-0000 message again.
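
For anyone trying to confirm the blocking-I/O guess above, here is a
minimal sketch (not something I ran on this box) that reads the state
field from /proc/<pid>/stat and lists processes in state "D", which is
uninterruptible sleep. A process in that state does not act on signals
until the kernel call it is waiting in returns, which would match a gzip
that ignores every kill. The function names are made up for
illustration, and the script assumes /proc is mounted.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so cut at the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def is_uninterruptible(state):
    """True for state 'D' (uninterruptible sleep, usually waiting on I/O)."""
    return state == "D"


def stuck_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if is_uninterruptible(state):
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in stuck_processes():
        print(pid, comm)
```

If the gzip shows up in that list, no signal will get rid of it. The
useful question then is which device or filesystem it is waiting on.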

My filesystems have lots of errors now,
and I don't want this reinstallation to
be a regular procedure. What should I do?
Has anyone had problems like these before?
-- 
Stephen Pitts
smpitts@midsouth.rr.com
webmaster - http://www.mschess.org

