Bug#970503: linux-image-5.8.0-1-amd64: using swap makes the machine hang
The same issue occurs with the 4.19.132-1 Linux kernel from stable.
I've done more tests: from a VT shortly after boot,
"memhog 15320M" was fine, but "memhog 15350M" gave errors on
the console after some time, the same as in the dmesg output below.
[ 406.347520] ata1.00: exception Emask 0x0 SAct 0x4000000 SErr 0x40000 action 0x6 frozen
[ 406.364822] ata1: SError: { CommWake }
[ 406.374357] ata1.00: failed command: READ FPDMA QUEUED
[ 406.384633] ata1.00: cmd 60/08:d0:00:09:f5/00:00:1d:00:00/40 tag 26 ncq dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 406.421074] ata1.00: status: { DRDY }
[ 406.433787] ata1: hard resetting link
[ 406.747063] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 406.787976] ata1.00: configured for UDMA/133
[ 406.798082] ata1.00: device reported invalid CHS sector 0
[ 406.798089] sd 0:0:0:0: [sda] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[ 406.798091] sd 0:0:0:0: [sda] tag#26 Sense Key : Illegal Request [current]
[ 406.798094] sd 0:0:0:0: [sda] tag#26 Add. Sense: Unaligned write command
[ 406.798097] sd 0:0:0:0: [sda] tag#26 CDB: Read(10) 28 00 1d f5 09 00 00 00 08 00
[ 406.798099] blk_update_request: I/O error, dev sda, sector 502597888 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 406.826011] ata1: EH complete
[ 484.340739] INFO: task kworker/dying:5 blocked for more than 120 seconds.
[ 484.354535] Tainted: P OE 5.8.0-1-amd64 #1 Debian 5.8.7-1
[ 484.368881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.399712] kworker/dying D 0 5 2 0x80004000
[...]
So it appears that this could be a hardware issue, but I ran both a
short and a long self-test with smartctl and got no errors, and
badblocks reported no errors either.
Comments from https://github.com/openzfs/zfs/issues/10094 suggest that
it may not necessarily be (uniquely) a hardware issue. Note: I do not
use ZFS; I found this report by searching the web for the error
messages[*], and the symptoms described there are similar.
[*] In particular the "Sense: Unaligned write command"
https://github.com/openzfs/zfs/issues/8552 suggests a drive firmware
bug (which could explain the absence of errors with smartctl and
badblocks), but in this case the kernel might have a way to avoid it.
Also I'm wondering why this issue occurs *only* with swap. This could
mean that the problem is on the kernel side (at least partly).
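
For reference, the failing request can be decoded from the "CDB: Read(10)"
line in the dmesg output. This is only a minimal sketch in Python, assuming
512-byte logical sectors (SECTOR_SIZE and the variable names are illustrative,
not taken from the kernel):

```python
# Decode the SCSI Read(10) CDB copied from the dmesg output.
cdb = bytes.fromhex("28 00 1d f5 09 00 00 00 08 00")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

opcode = cdb[0]                             # 0x28 is Read(10)
lba = int.from_bytes(cdb[2:6], "big")       # bytes 2-5: logical block address
nblocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks

print(f"opcode=0x{opcode:02x} lba={lba} blocks={nblocks} "
      f"bytes={nblocks * SECTOR_SIZE}")
# A request is 4 KiB aligned if start and length are multiples of 8 sectors.
print("4 KiB aligned:", lba % 8 == 0 and nblocks % 8 == 0)
```

The decoded LBA is 502597888, which matches the sector in the
blk_update_request line, and the length is 8 blocks (4096 bytes), which
matches "dma 4096 in". Under the 512-byte assumption this is a 4 KiB-aligned
read, so the "Unaligned write command" sense text does not obviously describe
the request that failed.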
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)