favourite crashes - and workarounds



Hi,

there are two particular crashes that are a nuisance to me:

1. Apt To Crash
The symptoms are always the same: apt-get install/source will output
its first line "Reading Package Lists" and then everything stops.
apt-cache can also cause this (no output in this case). I strongly
suspect that accessing the package information cache is the problem,
but I'm at a loss how this can kill vital parts of the system. Note
that this does not usually happen the first time apt is used, but more
like the 10th to 15th time. Issuing the exact same command after a
reboot will normally work perfectly.

2. The Mad Translator
Quite often in the course of a Hurd session, one of the translators
will run wild. "ps AF hurd" will show quickly increasing memory
consumption and a rising "Th" (I guess that's for threads) number. The
machine starts to thrash, becoming unresponsive, and if you're not
fast enough at identifying and killing the problematic process,
you'll run out of swap ... Sometimes the Hurd will kill the problem
child itself and the situation normalizes again, but usually
collateral damage will force a reboot anyway. pflocal and term seem
to be the main culprits, but this could be because they are the
most heavily used.

For #2 I have a workaround that seems to work quite well. It's the
following perl script "reaper":

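#!/usr/bin/perl
while (1) {
    open P, "ps AF hurd w|" or die "cannot run ps: $!";
    while (<P>) {
        my ($pid, $uid, $th, $mem, $rss, $utime, $stime, @name) = split;
        if ($th =~ /^\d+$/ && $th > 100 && $name[0] !~ /ext2fs/) {  # non-numeric Th skips the header
            printf "%4d %3d %5s %s\n", $pid, $th, $rss, join " ", @name;
            kill 'TERM', $pid;   # ask politely first
            sleep 1;
            kill 'KILL', $pid;   # then insist
        }
    }
    close P;                     # wait for the ps child to finish
    sleep 1;
}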

It kills processes with a high thread count. The "if" test may need
adapting: it currently targets processes with Th > 100 that are not
named *ext2fs* (ext2fs seems very stable, and about 120 threads
seems to be a normal working condition for it).
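
If the test needs adapting often, it may be easier to pull it out into
a small function with the limits kept in one place. What follows is
only a sketch: the names victim, $DEFAULT_LIMIT and @OVERRIDES are made
up for illustration, the column order is the one the script above
assumes, and the sample lines are invented, not real ps output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Thread limit for everything not listed below (value from the script above).
my $DEFAULT_LIMIT = 100;

# Hypothetical per-name overrides: [ pattern, limit ].
# A limit of undef means "never kill this one".
my @OVERRIDES = (
    [ qr/ext2fs/, undef ],
);

# Look at one line of "ps AF hurd w" output and return the pid if the
# process should be killed, or an empty list otherwise.
sub victim {
    my ($line) = @_;
    my ($pid, $uid, $th, $mem, $rss, $utime, $stime, @name) = split ' ', $line;
    # Header lines and short lines have no numeric thread count.
    return unless defined $th && $th =~ /^\d+$/ && @name;
    my $limit = $DEFAULT_LIMIT;
    for my $o (@OVERRIDES) {
        my ($re, $l) = @$o;
        if ($name[0] =~ $re) { $limit = $l; last; }
    }
    return unless defined $limit;
    return $th > $limit ? $pid : ();
}

# Invented sample lines, only to show the selection logic.
my @sample = (
    "PID UID Th Vmem RSS User System Args",
    "  3   0 122 140M  2M 0:01.00 0:02.00 /hurd/ext2fs /dev/hd0s1",
    " 57   0 214 300M 90M 0:10.00 0:20.00 /hurd/pflocal",
    " 60   0   4  20M  1M 0:00.10 0:00.20 /hurd/term /dev/tty1",
);
print join(",", map { victim($_) } @sample), "\n";   # prints "57"
```

Giving ext2fs a numeric limit somewhat above its normal 120 instead of
undef would let the reaper catch it too, should it ever run wild.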

Since I started using the script, I have not experienced any more
crashes of the mad translator type. The script itself segfaults at
times ... I just restart the thing.
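
The restarting could be automated with a small wrapper. This is a
minimal sketch, not something tested on the Hurd: the function name
supervise and the path ./reaper are assumptions, and it uses nothing
beyond Perl's built-in system(), $? and sleep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run @cmd over and over, starting it again whenever it exits.
# $max is the number of runs (undef = forever), $delay the pause in
# seconds between runs. Returns the number of runs made.
sub supervise {
    my ($max, $delay, @cmd) = @_;
    my $runs = 0;
    while (!defined $max || $runs < $max) {
        my $rc = system(@cmd);
        $runs++;
        if ($rc == -1) {
            warn "could not start $cmd[0]: $!\n";
        } elsif ($? & 127) {
            # Low 7 bits of $? hold the signal that killed the child.
            warn sprintf "%s died of signal %d\n", $cmd[0], $? & 127;
        } else {
            # High byte of $? holds the exit status.
            warn sprintf "%s exited with status %d\n", $cmd[0], $? >> 8;
        }
        sleep $delay if $delay;
    }
    return $runs;
}
```

Calling supervise(undef, 1, "./reaper") would keep the reaper running
indefinitely, with a one-second pause so that a reaper that dies
immediately does not turn the wrapper into a busy loop.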

I'd appreciate any further workarounds, fixes, debugging tips, or your
own favourite crash scenarios.

-- 
Robbe
