
Trying to get logs on the 'swap to death' issue



Hello all

I'm at work right now, so I don't have much info with me; anyway, I will
describe what is happening now and post the logs later (i.e. at around
11:30 PM GMT).

I have been having those problems with pflocal. I wrote down the commands
that Marcus posted to get some debug info, and even turned some of them into
shell scripts to save time. I don't know why, but yesterday I managed to get
5 reboots due to swapping in about 1 hour... and I think I can more or less
reproduce them. Here is what happens:

* I log in to GNU.
* I do a 'ping www.gnu.org' to see if the network is working. I'm having the
same problem again: the passive translator is properly set (I can see that
with showtrans) but it doesn't work; I must set it again, just the passive
one with the same args (no need for an active one), and then the network
works.
* The ping fails.
* I do a 'settrans -fg /servers/socket/2 etc.'
* I ping again and it works.
* I do some ls, startx, apt-get.
* It starts to swap, sometimes just after the successful ping, other times
much later, after doing some work, and even when I haven't touched it for
minutes...

Anyway, it might be something specific to my box, which is strange since it's
pretty standard: apt and dselect have been working fine without requiring
any --force.

When I detect the swapping I run 'ps -F hurd-long -a -x > mem.txt' (if there
is anything wrong here, it's a typo). It takes some time, but I was able to
do it 5 times; pflocal has around 6000/7000 threads at that point (probably
more after a while). I then run 'portinfo PID | wc -l': when the system is
normal it generally outputs 70/72, and the one time I did get it to output
something during the swapping, the value was much higher (something like
5000/7000, IIRC).
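Since the snapshot has to be taken before the machine locks up, it may help to have the whole sequence behind a single command, in the spirit of the shell scripts mentioned earlier. This is a minimal sketch, assuming Hurd's ps accepts '-F hurd-long' as used above and that pflocal's PID has been looked up beforehand (e.g. from ps) and is passed in by hand; the function name snapshot_pflocal and the output directory are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of the Hurd): take one snapshot of the
# process table and of pflocal's ports.
# Usage: snapshot_pflocal PFLOCAL_PID OUTDIR
snapshot_pflocal() {
    pid=$1
    outdir=$2
    # Timestamp in the file names so successive snapshots do not overwrite each other.
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1
    # Same ps invocation as above: all processes, long Hurd format.
    ps -F hurd-long -a -x > "$outdir/ps-$stamp.txt"
    # Keep the full port list; its line count is what 'portinfo PID | wc -l' reports.
    portinfo "$pid" > "$outdir/ports-$stamp.txt"
    count=$(wc -l < "$outdir/ports-$stamp.txt" | tr -d ' ')
    # One summary line per snapshot: "<timestamp> <port-list lines>".
    echo "$stamp $count" >> "$outdir/port-counts.txt"
    # Flush to disk so the logs have a better chance of surviving the forced reboot.
    sync
}
```

Started right after login, for example as 'while true; do snapshot_pflocal PID /var/tmp/pflocal-logs; sleep 30; done &' (PID being pflocal's process ID), this would leave a series of ps logs plus a port-count history (normally 70/72 on this box) without having to type anything once the swapping starts. Whether the loop itself still gets scheduled under that memory pressure is not guaranteed.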

That one time I was also going to run 'gdb /hurd/pflocal', do 'set
noninvasive on' and 'attach PID', and then 'info threads' on it, but I
couldn't: by the time I can start gdb the system is already pretty much
blocked (you will see in my logs later that pflocal, at the beginning of the
crisis, takes 60% of the CPU and enormous amounts of RAM).
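Typing the gdb commands one by one takes too long once the system starts blocking, so a one-shot, non-interactive run could be prepared in advance. This is a sketch assuming a GDB built for the Hurd (where the 'set noninvasive on' setting exists) and recent enough to support the -batch and -ex options; the function name pflocal_threads and the output file are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: dump pflocal's thread list without an interactive gdb session.
# Usage: pflocal_threads PFLOCAL_PID OUTFILE
pflocal_threads() {
    pid=$1
    out=$2
    # 'set noninvasive on' (Hurd-specific GDB setting) leaves the task running
    # and does not intercept its exceptions or signals while gdb is attached.
    # -batch makes gdb run the -ex commands in order and then exit.
    gdb -batch \
        -ex 'set noninvasive on' \
        -ex "attach $pid" \
        -ex 'info threads' \
        -ex 'detach' \
        /hurd/pflocal > "$out" 2>&1
    # Flush the output file to disk before a possible reboot.
    sync
}
```

Running it as 'pflocal_threads PID /var/tmp/pflocal-threads.txt' as soon as the swapping is noticed replaces four interactive gdb commands with one; gdb still has to compete with pflocal for CPU and memory, so it may also arrive too late.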

I recall reading something about using the crash server to help with this...
I don't have that message with me right now, but I would like further
instructions on how to obtain as much information as possible, to help the
people actually doing the coding.

As stated, I will post the 5 ps logs later on.

Best Regards,

Frederico S. Muñoz
fsmunoz@sdf.lonestar.org





