Bug#998035: linux-image-5.10.0-9-amd64: Debian 11 in Xen PV DomU crashes intel_pmc_core on boot, DomU zombiefies.
Package: linux-image-5.10.0-9-amd64
Version: linux-image-5.10.0-9-amd64 and linux-image-5.14.0-0.bpo.2-amd64
Severity: important
Dear All,
I am reporting this bug mainly to help others with the same problem, to propose
adding a warning to the Debian 11 release notes, and in the hope of an upstream
kernel bugfix.
Description: A Debian 11 Xen PV DomU (RAM < 4 GB) does not shut down correctly
because of a problem in the intel_pmc_core module on an Intel Xeon E3-1230 (and
possibly other Intel CPUs).
https://github.com/QubesOS/qubes-issues/issues/6052 seems to be the same issue.
Workarounds:
* Use a Debian 10 kernel in the DomU, which works
* Allocate 4+ GB RAM to the DomU
* Use PVH instead of PV (needs Xen 4.9+, and is the preferred way since Xen
4.10)
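To check whether an existing DomU config matches the setup described here (PV guest with less than 4 GB RAM), a minimal sketch along these lines may help. It assumes the config uses plain `key = value` lines, that `memory` is given in MiB, and that a missing `type` means PV; the function names are hypothetical and the parser is deliberately naive (it does not handle `#` inside quoted strings or multi-line lists).

```python
import ast

# Threshold taken from this report ("RAM<4GB"), expressed in MiB.
FOUR_GIB_IN_MIB = 4096


def parse_xl_config(text):
    """Parse simple `key = value` lines of an xl config into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        # Naive comment stripping: assumes no '#' inside quoted values.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        try:
            settings[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError):
            # Keep anything that is not a plain literal as raw text.
            settings[key.strip()] = value.strip()
    return settings


def matches_reported_setup(text):
    """Return True for a PV guest with less than 4 GiB of RAM."""
    settings = parse_xl_config(text)
    guest_type = settings.get("type", "pv")  # assumption: PV when unset
    memory = settings.get("memory")
    return (
        guest_type == "pv"
        and isinstance(memory, int)
        and memory < FOUR_GIB_IN_MIB
    )
```

A config for which `matches_reported_setup` returns True is only a candidate for this bug; whether the fault actually occurs may still depend on the CPU and kernel version.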
Please note:
* Backports kernel (linux-image-5.14.0-0.bpo.2-amd64) suffers from the same
problem.
* Debian 10 Dom0 Xen 4.11.4+107-gef32c7afa2-1 behaves the same way
* Debian 9 Dom0 Xen 4.8.5.final+shim4.10.4-1+deb9u12 used PVHv1, which differs
from the PVHv2 used by Xen 4.9+
Test case:
* Install Debian 10 or Debian 11, install Xen, and create a PV config as below.
On startup, "BUG: unable to handle page fault for address" is displayed, and
the DomU later fails to stop with "poweroff".
kernel = "/usr/lib/grub-xen/grub-x86_64-xen.bin"
extra = '(hd1)/boot/grub/grub.cfg'
* Change PV to PVH and it works correctly:
kernel = "/root/xen/images/debian11/vmlinuz-5.10.0-9-amd64"
ramdisk = "/root/xen/images/debian11/initrd.img-5.10.0-9-amd64"
type = 'pvh'
The full bug in my case:
[ 3.088164] BUG: unable to handle page fault for address: ffffc9004049b818
[ 3.088175] #PF: supervisor read access in kernel mode
[ 3.088179] #PF: error_code(0x0000) - not-present page
[ 3.088183] PGD 7fbd9067 P4D 7fbd9067 PUD 5186067 PMD 5303067 PTE 0
[ 3.088191] Oops: 0000 [#1] SMP NOPTI
[ 3.088195] CPU: 0 PID: 201 Comm: systemd-udevd Not tainted 5.10.0-9-amd64 #1 Debian 5.10.70-1
[ 3.088204] RIP: e030:pmc_core_probe+0x136/0x410 [intel_pmc_core]
[ 3.088209] Code: c0 48 c7 c7 48 a6 3c c0 e8 c7 25 d2 c0 48 8b 05 b0 7a 00 00 48 c7 83 88 00 00 00 20 a6 3c c0 48 63 40 50 48 03 05 92 7a 00 00 <8b> 00 48 8b 15 91 7a 00 00 48 c7 c7 e0 54 3c c0 8b 4a 54 ba 01 00
[ 3.088222] RSP: e02b:ffffc9004026fc30 EFLAGS: 00010286
[ 3.088226] RAX: ffffc9004049b818 RBX: ffff88800b028400 RCX: 00000000fe002000
[ 3.088232] RDX: ffffffffc03ca600 RSI: ffffffffc03c41f6 RDI: ffffffffc03ca648
[ 3.088238] RBP: ffff88800b028410 R08: 0000000000000000 R09: 00000000fe001fff
[ 3.088244] R10: 0000000000007ff0 R11: ffff888008e01740 R12: 0000000000000000
[ 3.088249] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[ 3.088260] FS: 00007f94ad4928c0(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[ 3.088267] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.088271] CR2: ffffc9004049b818 CR3: 00000000075d4000 CR4: 0000000000050660
[ 3.088281] Call Trace:
[ 3.088288] platform_drv_probe+0x35/0x80
[ 3.088294] really_probe+0x37b/0x480
[ 3.088299] driver_probe_device+0xe1/0x150
[ 3.088303] ? driver_allows_async_probing+0x50/0x50
[ 3.088308] bus_for_each_drv+0x7e/0xc0
[ 3.088313] __device_attach+0xd8/0x1d0
[ 3.088317] bus_probe_device+0x8e/0xa0
[ 3.088321] device_add+0x399/0x840
[ 3.088325] platform_device_add+0x105/0x230
[ 3.088331] ? 0xffffffffc0327000
[ 3.088351] pmc_core_platform_init+0x78/0x1000 [intel_pmc_core_pltdrv]
[ 3.088358] do_one_initcall+0x44/0x1d0
[ 3.088363] ? do_init_module+0x23/0x260
[ 3.088381] ? kmem_cache_alloc_trace+0xf5/0x200
[ 3.088386] do_init_module+0x5c/0x260
[ 3.088391] __do_sys_finit_module+0xb1/0x110
[ 3.088397] do_syscall_64+0x33/0x80
[ 3.088402] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3.088407] RIP: 0033:0x7f94ad94b9b9
[ 3.088411] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
[ 3.088423] RSP: 002b:00007ffde7aa3158 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 3.088430] RAX: ffffffffffffffda RBX: 0000563dc68d1530 RCX: 00007f94ad94b9b9
[ 3.088435] RDX: 0000000000000000 RSI: 00007f94adad6e2d RDI: 0000000000000018
[ 3.088441] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000563dc689c9d0
[ 3.088447] R10: 0000000000000018 R11: 0000000000000246 R12: 00007f94adad6e2d
[ 3.088453] R13: 0000000000000000 R14: 0000563dc68ce450 R15: 0000563dc68d1530
[ 3.088459] Modules linked in: intel_pmc_core_pltdrv(+) intel_pmc_core ghash_clmulni_intel evdev aesni_intel libaes crypto_simd cryptd glue_helper pcspkr drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic crct10dif_pclmul crct10dif_common crc32_pclmul xen_netfront xen_blkfront crc32c_intel
[ 3.088487] CR2: ffffc9004049b818
[ 3.088491] ---[ end trace dd3aec620db68a1d ]---
[ 3.088496] RIP: e030:pmc_core_probe+0x136/0x410 [intel_pmc_core]
[ 3.088501] Code: c0 48 c7 c7 48 a6 3c c0 e8 c7 25 d2 c0 48 8b 05 b0 7a 00 00 48 c7 83 88 00 00 00 20 a6 3c c0 48 63 40 50 48 03 05 92 7a 00 00 <8b> 00 48 8b 15 91 7a 00 00 48 c7 c7 e0 54 3c c0 8b 4a 54 ba 01 00
[ 3.088514] RSP: e02b:ffffc9004026fc30 EFLAGS: 00010286
[ 3.088519] RAX: ffffc9004049b818 RBX: ffff88800b028400 RCX: 00000000fe002000
[ 3.088524] RDX: ffffffffc03ca600 RSI: ffffffffc03c41f6 RDI: ffffffffc03ca648
[ 3.088530] RBP: ffff88800b028410 R08: 0000000000000000 R09: 00000000fe001fff
[ 3.088536] R10: 0000000000007ff0 R11: ffff888008e01740 R12: 0000000000000000
[ 3.088541] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[ 3.088552] FS: 00007f94ad4928c0(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[ 3.088558] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.088563] CR2: ffffc9004049b818 CR3: 00000000075d4000 CR4: 0000000000050660
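For others who want to check whether their DomU hit the same fault, the following is a minimal sketch that scans captured kernel log text for the two lines that identify it above: the "BUG: unable to handle page fault" line and the first kernel-mode RIP line naming a module. The function name and the returned dictionary layout are hypothetical, and the patterns are based only on the log format shown in this report.

```python
import re

# Matches the fault line and captures the faulting address.
BUG_RE = re.compile(
    r"BUG: unable to handle page fault for address: ([0-9a-f]+)"
)
# Matches a RIP line that names a module in brackets, e.g.
# "RIP: e030:pmc_core_probe+0x136/0x410 [intel_pmc_core]".
# The user-space RIP line has no bracketed module and is skipped.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\S+) \[(\w+)\]")


def find_page_fault_oops(log_text):
    """Return address, symbol and module of the first page-fault oops."""
    bug = BUG_RE.search(log_text)
    if bug is None:
        return None
    # Only look for the RIP line after the BUG line.
    rip = RIP_RE.search(log_text, bug.end())
    return {
        "address": bug.group(1),
        "symbol": rip.group(1) if rip else None,
        "module": rip.group(2) if rip else None,
    }
```

Passing the text of the DomU console or `dmesg` output to `find_page_fault_oops` and comparing the `module` field with `intel_pmc_core` gives a quick indication of whether the same probe failure occurred.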
Please ignore "Other system information" as I have to report this from a
different machine due to network separation.
-- System Information:
Debian Release: 10.11
APT prefers oldstable-updates
APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.19.0-18-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled