
Bug#998035: marked as done (linux-image-5.10.0-9-amd64: Debian 11 in Xen PV DomU crashes intel_pmc_core on boot, DomU zombiefies.)



Your message dated Wed, 19 Feb 2025 16:13:37 +0100 (CET)
with message-id <20250219151337.2F653BE2EE7@eldamar.lan>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #998035,
regarding linux-image-5.10.0-9-amd64: Debian 11 in Xen PV DomU crashes intel_pmc_core on boot, DomU zombiefies.
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
998035: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998035
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-5.10.0-9-amd64
Version: linux-image-5.10.0-9-amd64 and linux-image-5.14.0-0.bpo.2-amd64
Severity: important

Dear All,

I am reporting this bug mostly to help others with the same problem, proposing
adding a warning to the Debian 11 release notes and hoping for an upstream
kernel bugfix.

Description: A Debian 11 Xen PV DomU (RAM < 4 GB) does not shut down correctly
because of a problem in the intel_pmc_core module on an Intel Xeon E3-1230 (and
possibly other Intel CPUs).

https://github.com/QubesOS/qubes-issues/issues/6052 seems to be the same issue.

Workarounds:
* Use a Debian 10 kernel in the DomU, which works
* Allocate 4+ GB RAM to the DomU
* Use PVH instead of PV (needs Xen 4.9+, and is the preferred way since Xen
4.10)

Please note:
* Backports kernel (linux-image-5.14.0-0.bpo.2-amd64) suffers from the same
problem.
* Debian 10 Dom0 Xen 4.11.4+107-gef32c7afa2-1 behaves the same way
* Debian 9 Dom0 Xen 4.8.5.final+shim4.10.4-1+deb9u12 used PVHv1, which differs
from PVHv2 used by Xen 4.9+

Test case:
* Install Debian 10 or Debian 11, install Xen, and create a PV config as below.
On startup the DomU prints "BUG: unable to handle page fault for address", and
a later "poweroff" fails to stop it.
kernel = "/usr/lib/grub-xen/grub-x86_64-xen.bin"
extra = '(hd1)/boot/grub/grub.cfg'
* Change PV to PVH and it works correctly:
kernel = "/root/xen/images/debian11/vmlinuz-5.10.0-9-amd64"
ramdisk = "/root/xen/images/debian11/initrd.img-5.10.0-9-amd64"
type = 'pvh'
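
For reference, a fuller PVH guest config along these lines might look like the
sketch below. Only kernel, ramdisk and type come from the test case above; the
name, memory, vcpus, root, disk and vif values are placeholders assumed for
illustration and need adjusting to the local setup.

```
# Sketch of an xl PVH guest config (direct kernel boot).
# kernel, ramdisk and type are taken from the test case above;
# every other value is an assumed placeholder.
name    = "debian11-pvh"                  # assumed guest name
type    = 'pvh'
kernel  = "/root/xen/images/debian11/vmlinuz-5.10.0-9-amd64"
ramdisk = "/root/xen/images/debian11/initrd.img-5.10.0-9-amd64"
root    = "/dev/xvda1 ro"                 # assumed root device
memory  = 2048                            # assumed; below the 4 GB that PV needed
vcpus   = 2                               # assumed
disk    = [ 'phy:/dev/vg0/debian11-disk,xvda,w' ]   # assumed backing volume
vif     = [ 'bridge=xenbr0' ]             # assumed bridge name
```

With direct kernel boot the kernel and initrd have to be copied out of the
guest to the Dom0 and kept in sync after guest kernel upgrades.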

The full bug in my case:
[    3.088164] BUG: unable to handle page fault for address: ffffc9004049b818
[    3.088175] #PF: supervisor read access in kernel mode
[    3.088179] #PF: error_code(0x0000) - not-present page
[    3.088183] PGD 7fbd9067 P4D 7fbd9067 PUD 5186067 PMD 5303067 PTE 0
[    3.088191] Oops: 0000 [#1] SMP NOPTI
[    3.088195] CPU: 0 PID: 201 Comm: systemd-udevd Not tainted 5.10.0-9-amd64
#1 Debian 5.10.70-1
[    3.088204] RIP: e030:pmc_core_probe+0x136/0x410 [intel_pmc_core]
[    3.088209] Code: c0 48 c7 c7 48 a6 3c c0 e8 c7 25 d2 c0 48 8b 05 b0 7a 00
00 48 c7 83 88 00 00 00 20 a6 3c c0 48 63 40 50 48 03 05 92 7a 00 00 <8b> 00 48
8b 15 91 7a 00 00 48 c7 c7 e0 54 3c c0 8b 4a 54 ba 01 00
[    3.088222] RSP: e02b:ffffc9004026fc30 EFLAGS: 00010286
[    3.088226] RAX: ffffc9004049b818 RBX: ffff88800b028400 RCX:
00000000fe002000
[    3.088232] RDX: ffffffffc03ca600 RSI: ffffffffc03c41f6 RDI:
ffffffffc03ca648
[    3.088238] RBP: ffff88800b028410 R08: 0000000000000000 R09:
00000000fe001fff
[    3.088244] R10: 0000000000007ff0 R11: ffff888008e01740 R12:
0000000000000000
[    3.088249] R13: 0000000000000000 R14: 0000000000000006 R15:
0000000000000000
[    3.088260] FS:  00007f94ad4928c0(0000) GS:ffff88807d400000(0000)
knlGS:0000000000000000
[    3.088267] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.088271] CR2: ffffc9004049b818 CR3: 00000000075d4000 CR4:
0000000000050660
[    3.088281] Call Trace:
[    3.088288]  platform_drv_probe+0x35/0x80
[    3.088294]  really_probe+0x37b/0x480
[    3.088299]  driver_probe_device+0xe1/0x150
[    3.088303]  ? driver_allows_async_probing+0x50/0x50
[    3.088308]  bus_for_each_drv+0x7e/0xc0
[    3.088313]  __device_attach+0xd8/0x1d0
[    3.088317]  bus_probe_device+0x8e/0xa0
[    3.088321]  device_add+0x399/0x840
[    3.088325]  platform_device_add+0x105/0x230
[    3.088331]  ? 0xffffffffc0327000
[    3.088351]  pmc_core_platform_init+0x78/0x1000 [intel_pmc_core_pltdrv]
[    3.088358]  do_one_initcall+0x44/0x1d0
[    3.088363]  ? do_init_module+0x23/0x260
[    3.088381]  ? kmem_cache_alloc_trace+0xf5/0x200
[    3.088386]  do_init_module+0x5c/0x260
[    3.088391]  __do_sys_finit_module+0xb1/0x110
[    3.088397]  do_syscall_64+0x33/0x80
[    3.088402]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    3.088407] RIP: 0033:0x7f94ad94b9b9
[    3.088411] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
[    3.088423] RSP: 002b:00007ffde7aa3158 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[    3.088430] RAX: ffffffffffffffda RBX: 0000563dc68d1530 RCX:
00007f94ad94b9b9
[    3.088435] RDX: 0000000000000000 RSI: 00007f94adad6e2d RDI:
0000000000000018
[    3.088441] RBP: 0000000000020000 R08: 0000000000000000 R09:
0000563dc689c9d0
[    3.088447] R10: 0000000000000018 R11: 0000000000000246 R12:
00007f94adad6e2d
[    3.088453] R13: 0000000000000000 R14: 0000563dc68ce450 R15:
0000563dc68d1530
[    3.088459] Modules linked in: intel_pmc_core_pltdrv(+) intel_pmc_core
ghash_clmulni_intel evdev aesni_intel libaes crypto_simd cryptd glue_helper
pcspkr drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2
crc32c_generic crct10dif_pclmul crct10dif_common crc32_pclmul xen_netfront
xen_blkfront crc32c_intel
[    3.088487] CR2: ffffc9004049b818
[    3.088491] ---[ end trace dd3aec620db68a1d ]---
[    3.088496] RIP: e030:pmc_core_probe+0x136/0x410 [intel_pmc_core]
[    3.088501] Code: c0 48 c7 c7 48 a6 3c c0 e8 c7 25 d2 c0 48 8b 05 b0 7a 00
00 48 c7 83 88 00 00 00 20 a6 3c c0 48 63 40 50 48 03 05 92 7a 00 00 <8b> 00 48
8b 15 91 7a 00 00 48 c7 c7 e0 54 3c c0 8b 4a 54 ba 01 00
[    3.088514] RSP: e02b:ffffc9004026fc30 EFLAGS: 00010286
[    3.088519] RAX: ffffc9004049b818 RBX: ffff88800b028400 RCX:
00000000fe002000
[    3.088524] RDX: ffffffffc03ca600 RSI: ffffffffc03c41f6 RDI:
ffffffffc03ca648
[    3.088530] RBP: ffff88800b028410 R08: 0000000000000000 R09:
00000000fe001fff
[    3.088536] R10: 0000000000007ff0 R11: ffff888008e01740 R12:
0000000000000000
[    3.088541] R13: 0000000000000000 R14: 0000000000000006 R15:
0000000000000000
[    3.088552] FS:  00007f94ad4928c0(0000) GS:ffff88807d400000(0000)
knlGS:0000000000000000
[    3.088558] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.088563] CR2: ffffc9004049b818 CR3: 00000000075d4000 CR4:
0000000000050660

Please ignore "Other system information" as I have to report this from a
different machine due to network separation.



-- System Information:
Debian Release: 10.11
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-18-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

--- End Message ---
--- Begin Message ---
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---
