
Server freezing under heavy cpu / disk / network load



Hello, I would like to ask for help, since I am feeling a bit clueless here.

I recently bought a low-end dedicated machine that is supposed to host
some services: squid, proftpd and rtorrent.

I installed Debian lenny, immediately upgraded to squeeze and
configured the services. I started rtorrent, but once the machine
reaches heavy load (> 10 MB/s network traffic, maxed-out CPU), it holds
for a while, then all network connections drop and I have to order a
hard reset to bring it back online.

I thought it was a misconfiguration issue, so I tried reconfiguring the
server and then installing Ubuntu 10.04 on it, but I am getting the same
results.

I had a look at /var/log/kernel.log, and on Ubuntu I am seeing some
"Clocksource tsc unstable" messages right before the machine crashes.

I can see the same kind of messages on squeeze as well, just not as
close to the reboots as they were on Ubuntu. Google tells me they might
have something to do with CPU frequency scaling. There are loads of
reports from users like me who are experiencing random freezes, but
there seems to be no clear answer: people solved the issue with video
card driver updates, by replacing bad hardware, by changing the
frequency scaling governor, and so on.
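
For anyone who wants to compare notes, the messages can be pulled out of
a kernel log with something like the sketch below. The function name
find_clocksource_warnings is made up for illustration, and the exact
wording of the message may differ between kernel versions, hence the
loose, case-insensitive pattern.

```shell
#!/bin/sh
# Sketch: list kernel log lines that mention an unstable clocksource.
# find_clocksource_warnings is a hypothetical helper, not a standard tool.
find_clocksource_warnings() {
    logfile="$1"
    # -i: ignore case, since capitalisation varies between kernel versions
    # -n: prefix each hit with its line number so nearby entries are easy to find
    grep -in 'clocksource.*unstable' "$logfile"
}
```

Pass it whichever file your syslog daemon writes kernel messages to; the
line numbers make it easier to see what was logged just before a freeze.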

So far I have only played around with the frequency scaling governor.
Setting it to "performance" seems to freeze the machine quicker than the
default "ondemand" does.
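
A minimal sketch of how the governor can be inspected and switched
through sysfs, assuming the usual cpufreq layout for cpu0. The
CPUFREQ_DIR variable and the two function names are hypothetical and
only there so the path can be overridden; writing the governor needs
root.

```shell
#!/bin/sh
# Sketch: read and change the cpufreq governor for cpu0 via sysfs.
# CPUFREQ_DIR is a hypothetical override; the default is the usual sysfs path.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Print the governor that is currently active.
show_governor() {
    cat "$CPUFREQ_DIR/scaling_governor"
}

# Switch to the named governor, but only if the kernel lists it as available.
set_governor() {
    wanted="$1"
    if [ -z "$wanted" ]; then
        echo "usage: set_governor NAME" >&2
        return 1
    fi
    # The available list is space-separated; split it into one name per line
    # and require an exact, literal match.
    if ! tr ' ' '\n' < "$CPUFREQ_DIR/scaling_available_governors" \
            | grep -qxF -- "$wanted"; then
        echo "governor '$wanted' is not available" >&2
        return 1
    fi
    echo "$wanted" > "$CPUFREQ_DIR/scaling_governor"
}
```

For example, "set_governor performance" followed by "show_governor"
should print "performance" if the write was accepted.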

Here are the CPU specs of the machine:
# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 39
model name      : AMD Athlon(tm) 64 Processor 3700+
stepping        : 1
cpu MHz         : 2200.000
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow up pni lahf_lm
bogomips        : 4398.97
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1000000 1800000 2000000 2200000

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm
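
The clocksource actually in use is exposed next to the available list.
A small sketch that prints both, assuming the standard sysfs clocksource
directory; CLOCKSOURCE_DIR and report_clocksource are hypothetical names
used only for illustration.

```shell
#!/bin/sh
# Sketch: show the clocksource in use alongside the ones the kernel offers.
# CLOCKSOURCE_DIR is a hypothetical override; the default is the usual sysfs path.
CLOCKSOURCE_DIR="${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}"

report_clocksource() {
    current=$(cat "$CLOCKSOURCE_DIR/current_clocksource")
    available=$(cat "$CLOCKSOURCE_DIR/available_clocksource")
    printf 'current: %s\n' "$current"
    printf 'available: %s\n' "$available"
}
```

Running report_clocksource before and after a freeze would show whether
the kernel has switched away from tsc in the meantime.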

I asked the datacenter support to perform a hardware check and they
tested the machine for 8 hours without errors.

Now, how can I find out what is going on with this server? I am pretty
sure it is faulty hardware, but I have no proof to show to the
datacenter support.

I am currently running squeeze with kernel 2.6.32-5-686-bigmem. The
machine has 1024MB of RAM and 2x160GB SATA HDDs. The NIC is a 100Mbit
Realtek one, with the firmware from the firmware-realtek Debian package
installed.

I would love to have some opinions on how to deal with this.
-- 
Davide Mirtillo

