
Bug#945055: great CPU temperature increase from 5.2 to 5.5 ... and when using intel_pstate



Control: tags -1 - patch
Control: notfound -1 3.16.81-1
Control: notfound -1 4.19.87-1
Control: notfound -1 5.4-1~exp1
Control: forwarded -1 https://bugzilla.kernel.org/show_bug.cgi?id=207245


Hey Ben.


On Fri, 2020-04-17 at 18:02 +0100, Ben Hutchings wrote:
> This is now neither "fixed" nor "found" in any 5.5 version.  Please
> update the versions properly.

It took a while until I got the mail that the bug had been unarchived,
so I didn't update everything immediately.


> This is also tagged "patch" but without a direct link to the patch(es)
> that are supposed to fix it.  (Linking to the upstream bug report is
> not specific enough.)

Sorry for the confusion I might have caused. The patch tag and the
found-in version were based on my guess that the problems I have seen
since versions > 5.2 were caused by
https://gitlab.freedesktop.org/drm/intel/issues/614

That bug was a regression, introduced by a security fix, which
prevented the GPU from entering RC6 sleep states.

perf showed me that I was affected by it, so I assumed the fix (which
was introduced in 5.5rc-something) would solve everything.

It didn't, as my further test series, which I've just sent to this
Debian bug as well, showed.


Even with 5.5 I see a tremendous temperature increase.



Unfortunately I'm far from expert enough to really tell where the
problem comes from (I'd say there may even be several different
problems involved)... and I'd also need guidance on what to actually
test, to better nail it down.


When I saw that the problem still occurs with 5.5, I made another test
series and reported it first on lkml:
https://lore.kernel.org/lkml/ce8097694ddfab616616f8f81521495d99c74416.camel@scientia.net/T/#u

When I got no response, I updated my older ticket at intel-drm:
https://gitlab.freedesktop.org/drm/intel/-/issues/953


My tests would indicate that there are a number of temperature
problems, in short:

- GPU intensive stuff (like playing videos)
- GPU stuff which shouldn't be intensive at all (e.g. moving around
windows)

but also:
- supposedly non-GPU intensive stuff (like Alt-Tab-ing between windows,
scrolling up/down in lists in the GUI)
- stuff which doesn't even do graphics at all (see the unhide-brute and
(SHA)-verify tests I've made)



For the GPU-intensive stuff (specifically that I hit 100°C when I play
any videos) there is:
https://gitlab.freedesktop.org/drm/intel/issues/956
(intel-drm folks had asked me to put it in a separate issue)


For the general stuff (e.g. unhide brute or SHA512 verification running
much hotter), there is:
- the post to lkml
- https://bugzilla.kernel.org/show_bug.cgi?id=207245
- and, since it also depends on intel_pstate being enabled, there's
  also:
  https://bugzilla.kernel.org/show_bug.cgi?id=207247


The different tickets also contain descriptions of the symptoms I've
seen, e.g. where temperatures go through the roof even when just moving
windows, Alt-Tab-switching between them, scrolling up/down in a window,
and so on.


See especially the plots in the git repo I've provided, which show how
much higher the temperature is with 5.5 than with 5.2 (and for each of
them with intel_pstate on and off).
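
For anyone who wants to collect comparable numbers on their own
machine, here is a minimal sketch of a temperature sampler. It is not
the tooling behind the plots mentioned above. The sysfs paths
(thermal_zone0, cpu0) are assumptions, as zone numbering and the
available cpufreq files differ between machines, and all function
names are made up for illustration.

```python
import platform
import statistics
import time
from pathlib import Path

# Assumed sysfs locations; check which thermal zone is the CPU package
# on your machine (see the "type" file next to "temp").
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")
SCALING_DRIVER = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver")


def read_celsius(path=THERMAL_ZONE):
    """Return the zone temperature in degrees C (sysfs reports millidegrees)."""
    return int(Path(path).read_text().strip()) / 1000.0


def read_driver(path=SCALING_DRIVER):
    """Return the active cpufreq driver name, or 'unknown' if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "unknown"


def config_label(driver_path=SCALING_DRIVER):
    """Label a run with the running kernel release and cpufreq driver."""
    return f"{platform.release()} / {read_driver(driver_path)}"


def sample(seconds, interval=1.0, path=THERMAL_ZONE):
    """Take one temperature reading per `interval` until `seconds` have passed."""
    readings = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        readings.append(read_celsius(path))
        time.sleep(interval)
    return readings


def summarize(readings):
    """Reduce a run to min/mean/max so that runs can be compared."""
    return {
        "min": min(readings),
        "mean": round(statistics.mean(readings), 1),
        "max": max(readings),
    }
```

Running something like print(config_label(), summarize(sample(60)))
once per kernel and intel_pstate setting, always under the same
workload, would give one comparable line per configuration.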



Any help on what to test would be highly appreciated.


I did some preliminary tests with perf record while e.g. scrolling
up/down in a GUI window (I used the mail list in Evolution), during
which the temperatures went up to ~80°C ...
These indicated that, during the scrolling, the number of events
recorded by perf record grows by an order of magnitude.

I haven't had time yet to make more systematic tests.


Thanks,
Chris.

