Bug#884116: linux-image-4.9.0-4-amd64: screen artifacts then crash
Hi,
I have a similar issue. For me it occurred when upgrading from
kernel package 4.9.0-3 (stable) to 4.9.0-4. With the latter I sometimes
got display corruption, and X always froze or locked up very quickly.
Once or twice I could switch to a console and observed
the "*ERROR* CPU pipe A FIFO underrun" message in the kernel log. Trying
to stop and restart the X session always led to a complete lock-up.
Some other bugs look similar: #859639, #884001. The kernel version
at which people start to experience the issue varies, though.
I tried using the Intel driver instead of modesetting, as it helped
for some people, but it didn't help in my case.
I tried a newer kernel, 4.13.0-0.bpo.1 from backports, and it helped a
bit. Where 4.9.0-4 froze very quickly, sometimes right at the SDDM login,
other times after less than a minute, 4.13 could last several hours and
survive some suspend/resume cycles. But it eventually hit the same issue.
I enabled the IOMMU (intel_iommu=on), and it caught something.
There seems to be an access error before the freeze:
[14312.568400] DMAR: DRHD: handling fault status reg 3
[14312.568406] DMAR: [DMA Write] Request device [00:02.0] fault addr 1197000 [fault reason 23] Unknown
[14319.871599] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1265], reason: Hang on rcs0, action: reset
[14319.871639] drm/i915: Resetting chip after gpu hang
[14327.894140] drm/i915: Resetting chip after gpu hang
[14337.878141] drm/i915: Resetting chip after gpu hang
[14349.878136] drm/i915: Resetting chip after gpu hang
[14448.886146] drm/i915: Resetting chip after gpu hang
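To make the ordering in this excerpt easier to check on other logs, here is a minimal Python sketch that measures the delay between a DMAR fault and the next GPU HANG line. It assumes dmesg-style "[seconds.micros]" timestamps and treats any line without a leading timestamp as a wrapped continuation of the previous record. The helper names (unwrap, fault_to_hang_delays) are made up for illustration, and the regular expressions are only fitted to the messages quoted here, not to every kernel version.

```python
import re

# Sample copied from the excerpt in this report, including the mail line wrapping.
SAMPLE = """\
[14312.568400] DMAR: DRHD: handling fault status reg 3
[14312.568406] DMAR: [DMA Write] Request device [00:02.0] fault addr
1197000 [fault reason 23] Unknown
[14319.871599] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1265],
reason: Hang on rcs0, action: reset
[14319.871639] drm/i915: Resetting chip after gpu hang
"""

# Assumed timestamp shape: "[14312.568400] message" (optionally space padded).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Pattern fitted to the DMAR fault lines quoted in this report.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)"
)


def unwrap(text):
    """Rejoin wrapped records: a line without a timestamp continues the previous one."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def fault_to_hang_delays(text):
    """Return (fault_time, hang_time, delay) for each GPU hang preceded by a DMAR fault."""
    last_fault = None
    delays = []
    for record in unwrap(text):
        match = TIMESTAMP.match(record)
        if not match:
            continue
        when, message = float(match.group(1)), match.group(2)
        if DMAR_FAULT.search(message):
            last_fault = when
        elif "GPU HANG" in message and last_fault is not None:
            delays.append((last_fault, when, round(when - last_fault, 6)))
            last_fault = None
    return delays


if __name__ == "__main__":
    for fault, hang, delay in fault_to_hang_delays(SAMPLE):
        print(f"DMAR fault at {fault:.6f}, GPU hang at {hang:.6f}, {delay:.3f}s later")
```

On the excerpt above this reports the hang roughly 7.3 seconds after the DMA write fault. To try it on a full log, the output of dmesg can be passed to fault_to_hang_delays() instead of SAMPLE.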
So where I previously observed the "CPU pipe A FIFO underrun" message,
I no longer see it; instead there is this IOMMU error (DMAR), followed by
the GPU HANG. When this happens, the X display freezes but I can reliably
get back to a console, even if it typically takes several seconds. If I
try to restart the X session, however, I quickly run into the same
problems. When I can get to a console I see many IOMMU exceptions:
[14457.416489] DMAR: DRHD: handling fault status reg 3
[14457.416496] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown
[14457.416584] DMAR: DRHD: handling fault status reg 2
[14457.416591] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown
[14461.966545] dmar_fault: 544 callbacks suppressed
[14461.966549] DMAR: DRHD: handling fault status reg 3
[14461.966562] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e25000 [fault reason 23] Unknown
[14461.966751] DMAR: DRHD: handling fault status reg 2
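In case it helps anyone comparing logs, the faults in an excerpt like this can be tallied per device and address with a short Python sketch. It is only a rough illustration: the summarize() helper is a made-up name, the patterns are fitted to the lines quoted here, and wrapped lines are handled by simply collapsing all whitespace. The "callbacks suppressed" counter is summed separately, since it indicates that more faults occurred than were printed, so the tally is a lower bound.

```python
import re
from collections import Counter

# Sample copied from the excerpt in this report, including the mail line wrapping.
SAMPLE = """\
[14457.416489] DMAR: DRHD: handling fault status reg 3
[14457.416496] DMAR: [DMA Write] Request device [00:02.0] fault addr
1e20000 [fault reason 23] Unknown
[14457.416584] DMAR: DRHD: handling fault status reg 2
[14457.416591] DMAR: [DMA Write] Request device [00:02.0] fault addr
1e20000 [fault reason 23] Unknown
[14461.966545] dmar_fault: 544 callbacks suppressed
[14461.966549] DMAR: DRHD: handling fault status reg 3
[14461.966562] DMAR: [DMA Write] Request device [00:02.0] fault addr
1e25000 [fault reason 23] Unknown
[14461.966751] DMAR: DRHD: handling fault status reg 2
"""

# Patterns fitted to the messages quoted in this report (an assumption,
# not a complete description of the kernel's DMAR output).
FAULT = re.compile(
    r"\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+) \[fault reason (\d+)\]"
)
SUPPRESSED = re.compile(r"dmar_fault: (\d+) callbacks suppressed")


def summarize(text):
    """Count logged DMAR faults per (device, kind, address, reason) and sum suppressed ones."""
    # Collapse all whitespace so records wrapped across lines still match.
    flat = " ".join(text.split())
    faults = Counter()
    for kind, device, addr, reason in FAULT.findall(flat):
        faults[(device, kind, int(addr, 16), int(reason))] += 1
    suppressed = sum(int(count) for count in SUPPRESSED.findall(flat))
    return faults, suppressed


if __name__ == "__main__":
    faults, suppressed = summarize(SAMPLE)
    for (device, kind, addr, reason), count in sorted(faults.items()):
        print(f"{device} DMA {kind} addr {addr:#x} reason {reason}: {count} logged")
    print(f"{suppressed} further fault messages were suppressed by rate limiting")
```

For the excerpt above this counts two logged write faults at address 1e20000 and one at 1e25000, all from device 00:02.0, plus 544 suppressed messages.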
Even with the IOMMU enabled, the system ends up freezing solid if I
persist. The only reliable way to recover from such an error is a
power cycle anyway.
All this on an up-to-date Debian Stretch 9.3, on a ThinkPad X1 with
a 5th-generation CPU (i5-5200U). I use SDDM with KDE5/Plasma.
For now I'm back on kernel 4.9.0-3, which is the last usable one for
me. That doesn't mean the underlying issue is absent there (bug report
#859639 has the issue starting with 4.9.0-1); maybe some small changes
make the probability of hitting the issue vary widely depending on
system and configuration?
I'm not competent to investigate this further on my own, but if
anyone has suggestions for tests to run to investigate this issue, let me
know.
Thanks