
Bug#884116: linux-image-4.9.0-4-amd64: screen artifacts then crash



    Hi,

    I have a similar issue. For me it started when upgrading the kernel package from 4.9.0-3 (stable) to 4.9.0-4. With the latter I sometimes got display corruption, and X always froze or locked up very quickly. Once or twice I could switch to a console, where I saw the "*ERROR* CPU pipe A FIFO underrun" message in the kernel log. Trying to stop and restart the X session always led to a complete lock-up.

    Some other bugs look similar: #859639, #884001. The kernel version at which people start to see the issue varies, though.

    I tried using the Intel driver instead of modesetting, as it helped for some people, but it didn't help in my case.

    I tried a newer kernel, 4.13.0-0.bpo.1 from backports, and it helped a bit. Where 4.9.0-4 froze very quickly, sometimes right at the SDDM login and other times after less than a minute, 4.13 could last several hours and survive some suspend/resume cycles. But it eventually hit the same issue.

    I enabled the IOMMU (intel_iommu=on), and it caught something. There seems to be an access error before the freeze:

[14312.568400] DMAR: DRHD: handling fault status reg 3
[14312.568406] DMAR: [DMA Write] Request device [00:02.0] fault addr 1197000 [fault reason 23] Unknown
[14319.871599] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1265], reason: Hang on rcs0, action: reset
[14319.871639] drm/i915: Resetting chip after gpu hang
[14327.894140] drm/i915: Resetting chip after gpu hang
[14337.878141] drm/i915: Resetting chip after gpu hang
[14349.878136] drm/i915: Resetting chip after gpu hang
[14448.886146] drm/i915: Resetting chip after gpu hang
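
    To make the ordering in this excerpt easier to see, here is a minimal Python sketch that reads the bracketed timestamps and reports how long after the first DMA write fault the GPU hang was logged. The sample lines are copied from the excerpt; the helper names (parse, first_time) are made up for illustration, and the script only looks at timing, it says nothing about cause.

```python
import re

# Sample lines copied from the log excerpt in this message.
LOG = """\
[14312.568400] DMAR: DRHD: handling fault status reg 3
[14312.568406] DMAR: [DMA Write] Request device [00:02.0] fault addr 1197000 [fault reason 23] Unknown
[14319.871599] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1265], reason: Hang on rcs0, action: reset
[14319.871639] drm/i915: Resetting chip after gpu hang
[14327.894140] drm/i915: Resetting chip after gpu hang
"""

# "[seconds.microseconds] message", as printed in the kernel log.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp, message) tuples (hypothetical helper)."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def first_time(events, needle):
    """Timestamp of the first message containing `needle`, or None."""
    for timestamp, message in events:
        if needle in message:
            return timestamp
    return None


events = parse(LOG)
fault = first_time(events, "[DMA Write]")
hang = first_time(events, "GPU HANG")
delay = hang - fault
print(f"GPU hang logged {delay:.1f} s after the first DMA write fault")
```

    On a live system the same functions could be fed the saved output of dmesg instead of the LOG string.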

    So where I previously observed the "CPU pipe A FIFO underrun" message, I no longer see it; instead there is this IOMMU error (DMAR), followed by the GPU HANG. When this happens, the X display freezes but I can reliably get back to a console, even if it typically takes several seconds. If I try to restart the X session, however, I quickly run into the same problems. When I can get to a console I see many IOMMU exceptions:

[14457.416489] DMAR: DRHD: handling fault status reg 3
[14457.416496] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown
[14457.416584] DMAR: DRHD: handling fault status reg 2
[14457.416591] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown
[14461.966545] dmar_fault: 544 callbacks suppressed
[14461.966549] DMAR: DRHD: handling fault status reg 3
[14461.966562] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e25000 [fault reason 23] Unknown
[14461.966751] DMAR: DRHD: handling fault status reg 2
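
    For anyone comparing logs, a small Python sketch like the following could tally the DMAR fault lines by device, direction, address and reason. It is only an illustration: the regular expression is written against the line format shown in this message and may not match other kernel versions, and summarize_faults is a made-up helper name. Because the kernel rate-limits these messages ("callbacks suppressed"), the counts should be read as lower bounds.

```python
import re
from collections import Counter

# Pattern assumed from the lines quoted in this message.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+) \[fault reason (\d+)\]"
)


def summarize_faults(lines):
    """Count faults per (device, direction, address, reason)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            direction, device, addr, reason = match.groups()
            counts[(device, direction, int(addr, 16), int(reason))] += 1
    return counts


# Sample lines copied from the log excerpt in this message.
SAMPLE = [
    "[14457.416489] DMAR: DRHD: handling fault status reg 3",
    "[14457.416496] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown",
    "[14457.416584] DMAR: DRHD: handling fault status reg 2",
    "[14457.416591] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e20000 [fault reason 23] Unknown",
    "[14461.966545] dmar_fault: 544 callbacks suppressed",
    "[14461.966562] DMAR: [DMA Write] Request device [00:02.0] fault addr 1e25000 [fault reason 23] Unknown",
]

summary = summarize_faults(SAMPLE)
for (device, direction, addr, reason), count in sorted(summary.items()):
    print(f"{device} DMA {direction} addr {addr:#x} reason {reason}: {count}")
```

    In the sample, every fault is a DMA write from device 00:02.0, the same device named in the excerpts.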

    Even with the IOMMU enabled, the system ends up freezing solid if I persist. The only reliable way to recover from this error is a power cycle anyway.

    All this on an up-to-date Debian Stretch 9.3, on a ThinkPad X1 with a 5th-gen CPU (i5-5200U). I use SDDM with KDE5/Plasma.

    For now I'm back on kernel 4.9.0-3, which is the last usable one for me. That doesn't mean the underlying issue is absent there (bug report #859639 has the issue starting with 4.9.0-1); maybe small changes make the probability of hitting the issue vary widely depending on the system and configuration?

    I'm not competent to investigate this further on my own, but if anyone has suggestions for tests that could help investigate this issue, let me know.

Thanks

