
Re: How to cool my cpu temperature?



On Fri, 2007-01-05 at 21:56 -0800, Andrew Sackville-West wrote:
On Fri, Jan 05, 2007 at 10:26:42PM -0600, Cybe R. Wizard wrote:
> Marc Shapiro <mshapiro_42@yahoo.com>  said:
> > I can claim firsthand experience with exactly that.  Box overheated.  
> > Fried capacitors.  Required new motherboard and CPU.  Fortunately I
> > was able to get a similar MB, only slight upgrade, so I was able to
> > use my old memory and didn't have to replace that, too.  If the
> > problem is the power supply (mine was) it is much cheaper to replace
> > the PS now than the MB in a few weeks.  Other fans and heatsinks are
> > still less expensive than a new CPU (and, possibly, MB).
> 
> I had much the same experience but my loss was total.  That brings up
> the question of how to tell if the PS is going out.  My motherboard had
> fan, temp and voltage sensors that I /finally/ got working but the PS
> didn't seem to be represented in those.  Are there physical hints
> pre-death (enough pre-death for box salvage) for a power supply?

I've had a couple of desktop PSUs fail. Symptoms have been intermittent
hard locks, out-of-spec voltages, and difficulty when rebooting (such as
the power LEDs coming on, but no POST). Nothing definitive, but the
problems have gone away with a new power supply. My understanding is
that PSUs are the most failure-prone item in a computer. I always
consider them first if I'm having any sort of difficult-to-diagnose
problem.

.02

A
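The "out-of-spec voltages" symptom can be checked roughly against the motherboard's voltage sensors. Below is a minimal sh sketch that compares one rail reading with a ±5% window, the tolerance commonly quoted from the ATX12V specification for the +3.3 V, +5 V and +12 V rails. The function name `rail_in_spec` is hypothetical, the readings must be supplied by hand in millivolts (for example, copied from the output of `sensors`), and on-board sensors are themselves only approximate, so treat a failure as a hint rather than a diagnosis.

```sh
# rail_in_spec NOMINAL_MV MEASURED_MV [TOLERANCE_PCT]
# Hypothetical helper: prints "ok" if MEASURED_MV is within
# +/- TOLERANCE_PCT percent (default 5) of NOMINAL_MV, else "out-of-spec".
# Integer millivolts keep the arithmetic within plain shell $(( )).
rail_in_spec() {
    local nominal="$1" measured="$2" tol="${3:-5}"
    local limit diff
    # Largest allowed deviation in millivolts.
    limit=$(( nominal * tol / 100 ))
    diff=$(( measured - nominal ))
    # Use the absolute deviation, so low and high readings are treated alike.
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    if [ "$diff" -le "$limit" ]; then
        echo "ok"
    else
        echo "out-of-spec"
    fi
}
```

For example, `rail_in_spec 12000 11350` prints `out-of-spec` (650 mV low against a 600 mV window), while `rail_in_spec 5000 5240` prints `ok`.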
I searched the Internet and found a mail that describes the symptoms on a laptop from the same family as mine:
"
More data for the problem:
Compaq V4000 (laptop) - Centrino 1.73 GHz. Max CPU temperature: 100 C (from Intel).
Using Ubuntu Dapper. Under high load, it would claim the critical temperature had been reached and halt; dmesg shows the CPU reached 102 C.
Using powersave and kpowersave.

For me this problem started out of the blue - not following any kernel change or particular apt-get updates.

I use "watch -n 1 acpitool -tfc" to keep an eye on the system at the moment. Noteworthy entries are:

  Fan : <not available>
  Throttling control : no
  Limit interface : no
  critical (S5): 100 C
  passive: 95 C: tc1=2 tc2=5 tsp=300 devices=0xdffea660

The first three seem wrong for a Centrino, but I guess that is a problem with the ACPI interface to the BIOS here.

The fan (there is one) responds autonomously, so it is probably BIOS-controlled. Does the above really matter, then?

During something like a kernel compile, I would see the CPU temperature hovering between 80 and 100 C. Passive cooling would kick in every now and again.

polling_interval was set to 2. I changed it to 30 and observed that the CPU temperature sometimes spiked to 102, 105 or 107 C, but for no more than 1 second before immediately dropping back below 100. There was no instability, so could it be a glitch?

Even with the 30-second interval, Linux sometimes reads 100+ and halts.

My conclusions:

The polling is far too rigid. Perhaps it should take an average over a longer interval, or require a sustained critical temperature before shutting the system down (made user-configurable under /proc/acpi/, like the rest). I like polling_interval being 2, but my system would be fine if the kernel only acted on the critical temperature when the CPU stayed at 100+ for more than 3 of these intervals.

The passive trip point could be wrong, but that depends on how the 100+ spikes are interpreted.

I currently avoid the problem by changing the settings as follows:

  echo 5 > /proc/acpi/thermal_zone/THR0/polling_frequency
  echo 5 > /proc/acpi/thermal_zone/THR1/polling_frequency
  echo "110:102:90:60:50:40" > /proc/acpi/thermal_zone/THR0/trip_points

The 110 is meant to ride out the spike (which is rare); the 90 sets the passive trip point, which takes the CPU speed down to 1.3 GHz in the passive region.

Powersave and co. (I tried a few) seemed to be doing their job. (Note: Klaptop is the only thing that can successfully suspend to RAM for me.)

I believe my hardware (over a year old, and it has always worked) is showing some minor cracks with these 100+ temperature spikes, but I also think the kernel could be more forgiving of them.
"
I doubt that the workaround above is a real solution, though: it just moves the threshold.
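For comparison, the quoted author's other idea (acting on the critical temperature only once it has persisted for more than 3 polling intervals, instead of raising the trip point) can be sketched in user space. This is a minimal sh illustration, not kernel code: the function names `zone_temp` and `sustained_over` are hypothetical, `zone_temp` assumes the old `/proc/acpi/thermal_zone/<zone>/temperature` format (a line such as `temperature: 55 C`), and `sustained_over` takes a list of readings rather than polling a sensor itself.

```sh
# zone_temp FILE
# Hypothetical helper: prints the integer temperature from an old-style
# ACPI procfs file whose content looks like "temperature:   55 C".
zone_temp() {
    awk '$1 == "temperature:" { print $2 }' "$1"
}

# sustained_over CRITICAL MAX_POLLS READING...
# Hypothetical helper: prints "halt" as soon as more than MAX_POLLS
# consecutive readings are at or above CRITICAL, otherwise "ok".
# An isolated spike is ignored because the counter resets whenever a
# reading drops below CRITICAL.
sustained_over() {
    local critical="$1" max_polls="$2" count=0 t
    shift 2
    for t in "$@"; do
        if [ "$t" -ge "$critical" ]; then
            count=$(( count + 1 ))
            if [ "$count" -gt "$max_polls" ]; then
                echo "halt"
                return 0
            fi
        else
            # Back below the limit: treat the previous readings as a spike.
            count=0
        fi
    done
    echo "ok"
}
```

With readings like those in the mail, `sustained_over 100 3 95 102 96 105 97` prints `ok`, since every spike is followed by a drop below 100, while `sustained_over 100 3 99 101 102 103 104` prints `halt` on the fourth consecutive reading at or above 100. Newer kernels expose the temperature in millidegrees Celsius under `/sys/class/thermal/thermal_zone*/temp`, so `zone_temp` would need adjusting there.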
Kan

