
Bug#970503: linux-image-5.8.0-1-amd64: using swap makes the machine hang



There is the same issue with the 4.19.132-1 Linux kernel from stable.

I've done more tests: from a VT shortly after boot,
"memhog 15320M" ran fine, but "memhog 15350M" produced errors on
the console after some time, the same as in the dmesg output below.

[  406.347520] ata1.00: exception Emask 0x0 SAct 0x4000000 SErr 0x40000 action 0x6 frozen
[  406.364822] ata1: SError: { CommWake }
[  406.374357] ata1.00: failed command: READ FPDMA QUEUED
[  406.384633] ata1.00: cmd 60/08:d0:00:09:f5/00:00:1d:00:00/40 tag 26 ncq dma 4096 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  406.421074] ata1.00: status: { DRDY }
[  406.433787] ata1: hard resetting link
[  406.747063] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  406.787976] ata1.00: configured for UDMA/133
[  406.798082] ata1.00: device reported invalid CHS sector 0
[  406.798089] sd 0:0:0:0: [sda] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[  406.798091] sd 0:0:0:0: [sda] tag#26 Sense Key : Illegal Request [current]
[  406.798094] sd 0:0:0:0: [sda] tag#26 Add. Sense: Unaligned write command
[  406.798097] sd 0:0:0:0: [sda] tag#26 CDB: Read(10) 28 00 1d f5 09 00 00 00 08 00
[  406.798099] blk_update_request: I/O error, dev sda, sector 502597888 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  406.826011] ata1: EH complete
[  484.340739] INFO: task kworker/dying:5 blocked for more than 120 seconds.
[  484.354535]       Tainted: P           OE     5.8.0-1-amd64 #1 Debian 5.8.7-1
[  484.368881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.399712] kworker/dying   D    0     5      2 0x80004000
[...]

So it appears that this could be a hardware issue, but I ran both a
short and a long self-test with smartctl and got no errors, and
badblocks reported no errors either.

Comments from https://github.com/openzfs/zfs/issues/10094 suggest that
it is not necessarily (only) a hardware issue. Note: I do not use ZFS;
I found this report by searching the web for the error messages[*],
and the symptoms described there are similar.

[*] In particular the "Sense: Unaligned write command"
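
As a cross-check of the failed command in the dmesg output above, the
CDB can be decoded by hand. The following is a minimal Python sketch,
assuming the standard READ(10) layout (opcode 0x28, 4-byte big-endian
LBA in bytes 2-5, 2-byte transfer length in bytes 7-8) and 512-byte
logical blocks; decode_read10 is a hypothetical helper written for this
illustration, not part of any tool mentioned here.

```python
from typing import Tuple

# Assumption: 512-byte logical blocks, as implied by the LBA matching
# the kernel's 512-byte "sector" number in the log.
BLOCK_SIZE = 512


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDB copied from the "tag#26 CDB: Read(10)" line in the log.
lba, blocks = decode_read10("28 00 1d f5 09 00 00 00 08 00")
print(lba, blocks, blocks * BLOCK_SIZE, lba % 8)
# prints: 502597888 8 4096 0
```

The decoded LBA matches the sector in the blk_update_request line, and
the request is a 4096-byte read starting on a multiple of 8 sectors, so
under these assumptions the request itself looks 4 KiB-aligned.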

https://github.com/openzfs/zfs/issues/8552 suggests a drive firmware
bug (which could explain the absence of errors with smartctl and
badblocks), but in this case the kernel might have a way to avoid it.

Also I'm wondering why this issue occurs *only* with swap. This could
mean that the problem is on the kernel side (at least partly).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

