
Re: Debian Sarge server frozen (em64t-p4-smp)



Simon wrote:
On 4/6/06, Stephen Woodbridge <woodbri@swoodbridge.com> wrote:

Simon,

Have you checked your 3ware logs? Are you getting any disk errors? Which
controller are you running? Have you run fsck on all the partitions? I'm not
sure any of these will help; as Lennart said, it does sound like some
hardware is starting to fail.

-Steve

Lennart Sorensen wrote:

On Wed, Apr 05, 2006 at 07:02:48PM +1200, Simon wrote:


Hi there, we have a mail server (postfix>amavis>dbmail) running Debian
Sarge (2.6.8-9-em64t-p4-smp) on dual Xeons with a Tyan
motherboard and 3ware hardware RAID 5. Everything had been running fine
until now, but in the last week it has frozen twice. It gets completely
stuck and needs a hard reboot to restart (Ctrl-Alt-Del doesn't work). Here is a
screenshot of the stuck screen (no ping, nothing at this point):

http://gremin.orcon.net.nz/console.jpg

Would someone be able to take a look and give me a clue here?


I wonder what autoremove_wake_function is...

journal_commit_transaction looks related to one of the journaling
filesystems, specifically ext3 I believe.

Can you do a shift+pgup to see more of the output?  How about setting up
a serial console or remote logging so you can capture the full error
messages?
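For the remote-logging route, the receiving side can be very small. Below is a minimal Python sketch, not a finished tool. It assumes the frozen server has been set up to send its kernel console messages as plain-text UDP datagrams (for example with the netconsole module, whose default target port is 6666, or with syslog forwarding), and the names `open_listener` and `log_datagrams` are hypothetical, made up for illustration. It appends every datagram, with a timestamp and the sender's address, to a file on a second machine, flushing after each line so the last messages before a hang are kept.

```python
import socket
import time


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket to receive console messages.

    6666 is assumed here because it is netconsole's default target port;
    pass a different port if the server forwards somewhere else.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_datagrams(sock, logfile, max_messages=None):
    """Append each received datagram to logfile with a local timestamp.

    Loops forever when max_messages is None; otherwise stops after that
    many datagrams and returns the number written.
    """
    count = 0
    with open(logfile, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (sender, _port) = sock.recvfrom(4096)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} {sender} {text}\n")
            out.flush()  # write through immediately so nothing is lost in a buffer
            count += 1
    return count
```

On the second machine you would run something like `log_datagrams(open_listener(), "console-capture.log")` and leave it going until the next freeze. Netconsole sends from inside the kernel, so it has a better chance than a userspace syslogd of getting the final messages out before a hard lock.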

Of course it really sounds like either you upgraded something recently
that is broken, or the hardware is starting to fail.


We did have a disk spit itself out of the 3ware hardware RAID a couple of
weeks ago; the spare kicked in and everything was OK. After a restart all
seemed fine again... but now it seems like too much of a coincidence to
ignore. Replace the drive? I'm guessing yes :)

Unfortunately I can't scroll up, as the box was really stuck; nothing but a
hardware reset worked.

Well given this, I would do the following:

1) run fsck on all partitions just to be safe
2) get the 3ware controller to run a verify background task and watch the 3ware logs to see if you are getting sector repairs or errors on any of the disks. You might also have a "run diagnostics" background task that could give some insight into whether the controller is sane or not.
3) set up a serial console to capture the console logging on another system so you can see what the kernel messages are, as Lennart suggested.
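For step 1, `fsck -A` already walks `/etc/fstab` for you: it uses the sixth field (`fs_passno`), checks the root filesystem first, skips entries whose pass number is 0, and tries to check entries sharing a pass number in parallel. As a rough illustration of which partitions that covers, here is a Python sketch; `fsck_order` is a hypothetical helper, not part of any tool, and it only reports what `fsck -A` would visit. The checks themselves should be run on unmounted filesystems, for instance from single-user mode or a rescue disk.

```python
def fsck_order(fstab_text):
    """Group fstab devices by their fs_passno field (the 6th column).

    Entries with pass 0, or with the field omitted, are never checked.
    Returns a sorted list of (pass_number, [devices]) tuples.
    """
    groups = {}
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # fs_passno omitted: treated as 0, so skipped
        passno = int(fields[5])
        if passno > 0:
            groups.setdefault(passno, []).append(fields[0])
    return sorted(groups.items())
```

With a typical layout (root on pass 1, other ext3 partitions on pass 2, swap and proc on 0), this returns the root device alone in pass 1 followed by the remaining ext3 partitions in pass 2.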

-Steve W.


