
server crash, what to check?



Hello,
I have an AMD64 (dual Xeon) etch box running the 2.6.18 kernel, with 8 GB of
memory but no swap space (I forgot to turn it on).
The filesystems are ext3 on mdadm RAID10.
I have noticed that 2 of my 6 drives report some SMART errors, but
/proc/mdstat looks fine so far.
The box hung completely and I had to hard reboot it.
Here is the output from /var/log/messages. I'm not sure what happened.

Nov 13 06:48:17 pgss -- MARK --
Nov 13 07:08:17 pgss -- MARK --
Nov 13 07:28:18 pgss -- MARK --
Nov 13 07:48:18 pgss -- MARK --
Nov 13 08:08:18 pgss -- MARK --
Nov 13 08:28:18 pgss -- MARK --
Nov 13 08:48:18 pgss -- MARK --
Nov 13 08:49:04 pgss kernel: PGD 18c7d2067 PUD 166fb8067 PMD 0
Nov 13 08:49:04 pgss kernel: CPU 0
Nov 13 08:49:04 pgss kernel: Modules linked in: quota_v2 xt_state iptable_mangle ipt_MASQUERADE xt_tcpudp iptable_nat iptable_filter ip_tables x_tables button ac battery tun dm_snapshot dm_mirror dm_mod ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack_tftp ip_conntrack nfnetlink loop shpchp psmouse i2c_i801 pci_hotplug serio_raw i2c_core pcspkr evdev ext3 jbd mbcache raid456 xor raid10 raid1 md_mod ide_generic sd_mod usbhid piix ahci generic 3c59x mii ide_core libata scsi_mod ehci_hcd tg3 uhci_hcd e1000 thermal processor fan
Nov 13 08:49:04 pgss kernel: Pid: 9892, comm: smbd Not tainted 2.6.18-6-amd64 #1
Nov 13 08:49:04 pgss kernel: RIP: 0010:[<ffffffff80287af9>] [<ffffffff80287af9>] free_uid+0x37/0x7e
Nov 13 08:49:04 pgss kernel: RSP: 0018:ffff81018b791d28  EFLAGS: 00010002
Nov 13 08:49:04 pgss kernel: RAX: 0000000000200200 RBX: ffff8101ddca02c0 RCX: ffff8101ddca02f8
Nov 13 08:49:04 pgss kernel: RDX: 0000000000100100 RSI: 0000000000000086 RDI: ffffffff80451cf0
Nov 13 08:49:04 pgss kernel: RBP: 0000000000000086 R08: ffff81018b790000 R09: 0000000000000027
Nov 13 08:49:04 pgss kernel: R10: ffff810257aab8c0 R11: 0000000000000246 R12: ffff81018b791e78
Nov 13 08:49:04 pgss kernel: R13: 0000000000000000 R14: 0000000000000009 R15: 000000000000000a
Nov 13 08:49:04 pgss kernel: FS:  00002af7d5779e80(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
Nov 13 08:49:04 pgss kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 13 08:49:04 pgss kernel: CR2: 0000000000100108 CR3: 000000010ea64000 CR4: 00000000000006e0
Nov 13 08:49:04 pgss kernel: Process smbd (pid: 9892, threadinfo ffff81018b790000, task ffff81003386d770)
Nov 13 08:49:04 pgss kernel: Stack:  ffff810186e7ea90 ffff810186e7e590 ffff810256ce5c28 ffffffff80287f0c
Nov 13 08:49:04 pgss kernel:  ffff810186e7e590 ffffffff80288497 0000000000000027 0000000000000000
Nov 13 08:49:04 pgss kernel:  ffff81018b791e78 ffff81003386d770 ffff81003386dd18 ffff81018b791ef8
Nov 13 08:49:04 pgss kernel: Call Trace:
Nov 13 08:49:04 pgss kernel:  [<ffffffff80287f0c>] __sigqueue_free+0x24/0x36
Nov 13 08:49:04 pgss kernel:  [<ffffffff80288497>] __dequeue_signal+0x130/0x19b
Nov 13 08:49:04 pgss kernel:  [<ffffffff802894df>] dequeue_signal+0x3c/0xbc
Nov 13 08:49:04 pgss kernel:  [<ffffffff80229271>] get_signal_to_deliver+0x165/0x49d
Nov 13 08:49:04 pgss kernel:  [<ffffffff80227fe1>] do_signal+0x55/0x751
Nov 13 08:49:04 pgss kernel:  [<ffffffff8027b37b>] __wake_up_common+0x3e/0x68
Nov 13 08:49:04 pgss kernel:  [<ffffffff8025c35e>] thread_return+0x0/0xe7
Nov 13 08:49:04 pgss kernel:  [<ffffffff80257c9f>] sysret_signal+0x1c/0x27
Nov 13 08:49:04 pgss kernel:  [<ffffffff80257f23>] ptregscall_common+0x67/0xac
Nov 13 08:49:04 pgss kernel:
Nov 13 08:49:04 pgss kernel:
Nov 13 08:49:04 pgss kernel: Code: 48 89 42 08 48 89 10 48 c7 41 08 00 02 20 00 48 c7 43 38 00
Nov 13 08:49:04 pgss kernel:  RSP <ffff81018b791d28>
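
In case it helps anyone reading the trace, here is a minimal Python sketch that
pulls the headline fields out of an oops like the one above: the faulting
function, the process, and any register whose value sits just below the
faulting address in CR2. The function names (parse_oops, registers_near_fault)
and the 16-byte slack are my own assumptions for illustration, not part of any
kernel tool, and the regexes are only tuned to this 2.6.18 x86_64 layout.

```python
import re

# Register lines look like "RAX: 0000000000200200"; RSP and RIP carry a
# segment prefix ("0018:...") and are deliberately not matched here.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b")
# "RIP: 0010:[<addr>] [<addr>] symbol+0xoff/0xsize", tolerating mail wrapping.
RIP_RE = re.compile(r"RIP:.*?\]\s+(\w+)\+(0x[0-9a-f]+)/0x[0-9a-f]+", re.DOTALL)
PROC_RE = re.compile(r"Pid: (\d+), comm: (\S+)")


def parse_oops(text):
    """Return the faulting function, process and registers from an oops dump."""
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    rip = RIP_RE.search(text)
    proc = PROC_RE.search(text)
    return {
        "function": rip.group(1) if rip else None,
        "offset": int(rip.group(2), 16) if rip else None,
        "pid": int(proc.group(1)) if proc else None,
        "comm": proc.group(2) if proc else None,
        "regs": regs,
    }


def registers_near_fault(regs, slack=16):
    """Map register name -> (CR2 - value) for registers just below CR2.

    A small positive distance hints that the fault was a dereference of
    that register plus a structure offset. The slack value is arbitrary.
    """
    cr2 = regs.get("CR2")
    if cr2 is None:
        return {}
    return {
        name: cr2 - value
        for name, value in regs.items()
        if not name.startswith("CR") and 0 <= cr2 - value <= slack
    }


# Excerpt copied from the log above (lines unwrapped).
SAMPLE = """\
Nov 13 08:49:04 pgss kernel: Pid: 9892, comm: smbd Not tainted 2.6.18-6-amd64 #1
Nov 13 08:49:04 pgss kernel: RIP: 0010:[<ffffffff80287af9>] [<ffffffff80287af9>] free_uid+0x37/0x7e
Nov 13 08:49:04 pgss kernel: RAX: 0000000000200200 RBX: ffff8101ddca02c0 RCX: ffff8101ddca02f8
Nov 13 08:49:04 pgss kernel: RDX: 0000000000100100 RSI: 0000000000000086 RDI: ffffffff80451cf0
Nov 13 08:49:04 pgss kernel: CR2: 0000000000100108 CR3: 000000010ea64000 CR4: 00000000000006e0
"""

if __name__ == "__main__":
    info = parse_oops(SAMPLE)
    print(info["comm"], info["pid"], info["function"], hex(info["offset"]))
    print(registers_near_fault(info["regs"]))
```

On this trace the only register close to CR2 is RDX, which is 8 bytes below
the faulting address. The values 0x100100 and 0x200200 in RDX and RAX also look
like the kernel's LIST_POISON1 and LIST_POISON2 markers, which list_del() writes
into an entry it has just unlinked, but treat that reading as a guess rather
than a diagnosis.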

At this point, I hard rebooted the server.
If it would help for me to post any other log info, I'd be happy to.
Any advice would be great! Thanks!
Mike

