
Re: oom-killer vs system not reachable



On Sun, Dec 19, 2004 at 10:28:28AM +0100, Robert Ian Smit wrote:
> The points concerning the mbox file are taken (in fact I knew as
> much). Do you think Linux might stop responding, or get stuck for a
> while, because of the big file?

One problem would be when mutt rewrites the file on exit (or when you
press $): that is a lot of IO to do.  On Linux 2.4 at least, if one
process is doing a lot of IO, the other processes seem to grind to a
halt.  I think this is being fixed in 2.6, so perhaps you could try
upgrading to 2.6 if you're still on 2.4?

> In other words, would the symptoms of an oom-killer event, and of the
> resource shortage leading up to it, be noticeable for a longer period?

As I understand it, there are two ways the computer can run out of
virtual memory.  First, if the system is fairly idle and one process
quickly allocates a large amount of memory that goes over the limit,
that process will be killed, and probably no other programs will die.
This is relatively painless.
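To tell whether the OOM killer actually fired (rather than the box just
being very slow), look for its kills in the kernel log.  A minimal
sketch, assuming the messages contain "Out of Memory", "oom-killer" or
"Killed process" (the exact wording differs between kernel versions);
oom_events is a made-up helper name:

```shell
#!/bin/sh
# oom_events (hypothetical helper): read a kernel log on stdin and print
# only the lines that look like OOM-killer activity.  The patterns are an
# assumption; the exact message text varies between kernel versions.
oom_events() {
	grep -i -E 'out of memory|oom-killer|killed process'
}
```

Use it as `dmesg | oom_events` for the current boot, or feed it your
syslog's kernel log (often /var/log/kern.log, but the path varies by
distribution) to see kills from before a reboot, since dmesg only holds
the current boot's ring buffer.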

The other situation is that several processes are competing for memory,
CPU and IO, and you don't have enough RAM to hold all the active pages
at once.  Then the system "thrashes", constantly swapping.  If your
active set keeps growing gradually, the system gets very, very slow.
If it is on its way to running out of memory, it could take a long time
before the memory-hog processes are killed, and other "innocent"
processes may well be slain into the bargain.

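Thrashing shows up as a high, sustained rate of pages being swapped in
and out.  A rough sketch, assuming a 2.6-or-later kernel that exposes
the pswpin/pswpout counters in /proc/vmstat (running `vmstat 5` and
watching the si/so columns gives much the same picture); swap_pages and
swap_rate are made-up helper names:

```shell
#!/bin/sh
# swap_pages (hypothetical helper): print "pswpin pswpout", the pages
# swapped in/out since boot.  Reads /proc/vmstat unless a file is given.
swap_pages() {
	awk '$1 == "pswpin" { i = $2 } $1 == "pswpout" { o = $2 }
	     END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# swap_rate (hypothetical helper): sample the counters twice, INTERVAL
# seconds apart (default 5), and print the average pages per second.
swap_rate() {
	interval=${1:-5}
	set -- $(swap_pages)
	in1=$1 out1=$2
	sleep "$interval"
	set -- $(swap_pages)
	echo "swap-in/s: $(( ($1 - in1) / interval ))" \
	     "swap-out/s: $(( ($2 - out1) / interval ))"
}
```

A few hundred pages per second that never drops off while the machine
feels stuck is a good sign you are in the second situation rather than
the first.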
The weirdest thing I saw with swapping was once when trying to install
Red Hat on a relatively small machine: its graphical installer used up
all the RAM, and swap wasn't set up yet, so the only way it could free
more RAM was to "swap out" the executables back to the CD-ROM.  The
thing was actually thrashing on the CD-ROM; it was almost unusable!

Anyway, I think the best way to find out what is happening on your box
is to monitor or log what happens when the system dies like this:
either watch it with top, or log it with something like:

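# Note: many systems clean /tmp at boot; pick a directory that
# survives a reboot if you need the snapshots afterwards.
mkdir -p /tmp/death.log
while true; do
	D=$(date +%Y%m%d.%H%M%S)
	( w; echo; free; echo; ps auxf ) > /tmp/death.log/$D
	sync	# flush to disk so the last snapshot survives a hard reset
	sleep 15
done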
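After the next hang and reboot, the newest snapshots are the interesting
ones, and the %Y%m%d.%H%M%S file names sort in time order.  A small
sketch, assuming the directory survived the reboot; last_snapshots is a
made-up helper name:

```shell
#!/bin/sh
# last_snapshots (hypothetical helper): list the newest N snapshot files
# (default 3) in DIR (default /tmp/death.log), oldest of them first.
last_snapshots() {
	dir=${1:-/tmp/death.log}
	n=${2:-3}
	ls "$dir" | sort | tail -n "$n"
}
```

For example, `last_snapshots /tmp/death.log 4` shows roughly the last
minute before the box died, at one snapshot every 15 seconds.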


