thermal events (or lack thereof)
Hi,
I've been working steadily over the past few weeks to get my new HP nx6125
working under Debian (amd64 port) and have made significant progress.
However, there is one considerable problem: thermal events don't seem to be
recognised or processed by the kernel (until I do a
cat /proc/acpi/thermal_zone/TZ?/temperature). As soon as I do anything CPU
intensive I really run the risk of frying my laptop :-(
To be more specific, I am running kernel 2.6.14.3
(www.kernel.org vanilla) with the double timer patch applied (see
http://bugzilla.kernel.org/attachment.cgi?id=6061&action=view). When I boot
up, my thermal trip points get set nicely. The first is at 58 *C, then 65*C,
then 75 *C, and 80*C (S5 = 95 *C). When I was testing things I was doing
frequent executions of
cat /proc/acpi/thermal_zone/TZ?/temperature
and I would observe the temp rise to 58 C, then the fan would kick in, the
first trip point would then (automatically) re-set to 50 C and the CPU would
cool through 8 C before the fan turned off (nice, I thought, and very clever
this re-setting of trip points--sorry I'm very new to ACPI). When the fan
turned off, the trip point would again re-set to 58 C. So, I thought all was
working well. However, subsequent tests done by running glxgears and not
executing the above cat command allowed the CPU temp to rise above several
trip points without the fans kicking in! Only when I ran the above cat
command did the fans start!?
So, I stopped acpid and did a
cat /proc/acpi/event
while running glxgears. I waited a while and then did a
cat /proc/acpi/thermal_zone/TZ?/temperature to see that indeed the temp of
TZ1 had exceeded 58C---and immediately /proc/acpi/event received a thermal
event (note: the temp had already exceeded 58 C, my first thermal trip point;
the thermal event only occurred when I did the 'cat'). So, in order for
thermal events to "get through/processed" I need to keep doing
cat /proc/acpi/thermal_zone/TZ?/temperature!!!!
Can anybody shed some light on this behaviour. I don't know much about ACPI,
but it seems (?) like the linux kernel is not processing the thermal events
properly. Incidentally, I am also seeing spurious syslog errors that read
APIC error on CPU0: 40(40) meaning that some interrupts presumably are not
being correctly identified by the interrupt controller (thermal ones? could
there be some correlation here?).
Other info: the HP nx6125 is a Turion 64 based laptop with ATI chipset (yes, I
know). I am running acpid and have just installed powernowd (doesn't fix it).
I have also observed the above behaviour running the standard Debian
2.6.12-1-amd64 kernel (booting with no_timer_check to avoid double timer
interrupts). Any help, suggestions would be greatly appreciated. If anyone
has any ideas why catting /proc/acpi/thermal_zone/TZ?/temperature gets
things to work, I'd be very happy to hear an explanation, too.
Richard
Reply to: