Roman Gelfand wrote:
> For couple of months, now, I have this postfix smtp gateway on debian
> wheezy during which I had no problems with connectivity. Now, after
> couple of minutes I get disconnected from putty ssh session. The
> issue is not only there. Apache web server self updating cgi site
> dies after a while.
>
> How can I troubleshoot this?
Start by becoming familiar with the /var/log/* log files. Look
through them and see if you find anything that gives clues to the
problem. Start with these files:
/var/log/syslog
/var/log/kern.log
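As a rough sketch of what "look through them" can mean in practice, the Python below filters log lines for a few substrings that often accompany the problems listed further down. The pattern list and the function names are my own assumptions for illustration, not an exhaustive or authoritative set; kernel message wording varies between versions.

```python
import re

# Assumed substrings worth a second look; extend or adjust for your kernel.
PATTERNS = re.compile(r"oom-killer|out of memory|i/o error|segfault",
                      re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that contain any of the assumed patterns."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def scan_file(path):
    """Scan one log file; reading /var/log/* usually needs root or group adm."""
    with open(path, errors="replace") as f:
        return suspicious_lines(f)
```

Calling scan_file("/var/log/syslog") and scan_file("/var/log/kern.log") and printing the results gives a short list to start reading from. An empty result does not prove the logs are clean.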
What you are describing sounds like something not specific to any one
program but across all of them. Therefore I suspect one of several
possibilities.
* bad memory DIMM, causing memory errors, causing process death
* bad motherboard, causing general errors, causing process death
* kernel bug, hitting processes
* not enough memory, causing the Linux out-of-memory (OOM) killer to
be activated and kill your active processes
* cable problems in your system, disk drive cable causing I/O
transfer corruption between storage and system
* possibly a failing disk drive
* an endless list of other possibilities
Those are just ideas. To check a system I try to look for
specific problems. Run 'memtest86' or 'memtest86+' to look for RAM
problems. Being a hardware guy, I will disassemble the machine and
re-assemble it, because connectors tend to be unreliable. Carefully
unplugging and plugging back in connectors will scrub them a little
bit and can improve the connection.
I will look to see how much memory is available. I like the 'htop'
program for this. It gives a nice bar graph that spatially shows the
amount of memory used and where. If there is still a significant
amount of memory used for file system buffer cache then life is good.
If not then the file system buffer cache suffers. The possible
problem for you, though, is that if there isn't enough virtual memory
then the Linux kernel will invoke the out-of-memory killer, which will
start killing off active processes. Ensure that you have enough
memory to avoid the OOM killer. (Or disable it entirely. I have
ranted about turning off the OOM killer before.)
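If you want the same numbers without htop, they come from /proc/meminfo. Here is a minimal sketch in Python that reads the fields I would look at. The function names and the summary layout are my own invention for illustration; the field names (MemTotal, MemFree, Buffers, Cached, SwapFree) are the ones the kernel reports, in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Pick out total, free, cache (Buffers + Cached) and free swap, in kB."""
    return {
        "total_kb": info.get("MemTotal", 0),
        "free_kb": info.get("MemFree", 0),
        "cache_kb": info.get("Buffers", 0) + info.get("Cached", 0),
        "swap_free_kb": info.get("SwapFree", 0),
    }


def read_summary(path="/proc/meminfo"):
    """Read and summarize the live values on a Linux system."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

If cache_kb is a healthy share of total_kb, memory is probably not your problem. If free, cache, and free swap are all near zero, the OOM killer becomes a likely suspect.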
I would check the disk drive with smartctl. Is it logging errors?
Run SMART tests and check the results. SMART isn't a good predictor
of failure but sometimes it does confirm failure.
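For what it is worth, here is a small Python sketch of the smartctl calls I have in mind, wrapped so they can be run in one go. The function names and the choice of a short self-test are assumptions of mine; the device name is whatever your drive actually is, and smartctl needs root.

```python
import shutil
import subprocess


def smart_commands(device):
    """Return the smartctl invocations for a basic check of one drive."""
    return [
        ["smartctl", "-H", device],              # overall health assessment
        ["smartctl", "-l", "error", device],     # errors the drive has logged
        ["smartctl", "-t", "short", device],     # start a short self-test
        ["smartctl", "-l", "selftest", device],  # self-test log
    ]


def run_checks(device):
    """Run each command and print its output. Requires root."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install smartmontools")
    for cmd in smart_commands(device):
        # smartctl uses its exit status as a bitmask of findings, so a
        # non-zero status is not treated as a failure of the call here.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout)
```

The short self-test runs in the background on the drive, so the self-test log printed right after starting it shows earlier results. Read the log again once the test has had time to finish.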
Hopefully those ideas help. Good luck!
Bob