Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel

To: submit@bugs.debian.org
Subject: Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel
From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
Date: Sun, 13 Aug 2023 18:21:22 +0100
Message-id: <2d930e9f3448493234b0d0f343a1b19a848badcc.camel@adam-barratt.org.uk>
Reply-to: "Adam D. Barratt" <adam@adam-barratt.org.uk>, 1044518@bugs.debian.org

Source: linux
Version: 5.10.179-5
User: debian-admin@lists.debian.org
Usertags: needed-by-DSA-Team
X-Debbugs-Cc: debian-admin@lists.debian.org, adsb@debian.org

Hi,

Since the kernels on both the host and guests were upgraded to
5.10.179-5 (from 5.10.179-3), the guests on one of our Ganeti clusters
have been reporting as tainted. Looking at dmesg shows the following
trace early in boot:

[    0.201347] RIP: 0010:get_xsave_addr+0x9b/0xb0
[    0.201351] Code: 48 83 c4 08 5b e9 15 80 bc 00 80 3d 8d 7c 80 01 00 75 a8 48 c7 c7 97 de 6b b2 89 74 24 04 c6 05 79 7c 80 01 01 e8 f5 96 88 00 <0f> 0b 8b 74 24 04 eb 89 31 c0 e9 e6 7f bc 00 66 0f 1f 44 00 00 89
[    0.201353] RSP: 0000:ffffffffb2c03ec8 EFLAGS: 00010282
[    0.201356] RAX: 0000000000000000 RBX: ffffffffb2e6a600 RCX: ffffffffb2cb3768
[    0.201358] RDX: c0000000ffffefff RSI: 00000000ffffefff RDI: 0000000000000247
[    0.201359] RBP: ffffffffb2e6a4a0 R08: 0000000000000000 R09: ffffffffb2c03ce8
[    0.201361] R10: ffffffffb2c03ce0 R11: ffffffffb2ccb7a8 R12: 0000000000000246
[    0.201362] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.201365] FS:  0000000000000000(0000) GS:ffff9588fbc00000(0000) knlGS:0000000000000000
[    0.201367] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.201368] CR2: ffff9588fffff000 CR3: 000000008260a001 CR4: 00000000007308b0
[    0.201373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.201374] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.201376] Call Trace:
[    0.201383]  identify_cpu+0x51f/0x540
[    0.201389]  identify_boot_cpu+0xc/0x94
[    0.201392]  arch_cpu_finalize_init+0x5/0x47
[    0.201395]  start_kernel+0x4ec/0x599
[    0.201401]  secondary_startup_64_no_verify+0xb0/0xbb
[    0.201406] ---[ end trace d7d9074a88473cb2 ]---
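For anyone reading the trace: each `symbol+0xOFFSET/0xSIZE` entry gives the offset into the function and the function's total size in bytes. Below is a minimal Python sketch for splitting such entries out of dmesg lines. The helper name `parse_frame` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches entries such as "get_xsave_addr+0x9b/0xb0" or "identify_cpu+0x51f/0x540".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return (symbol, offset, size) for the first frame in a dmesg line, or None.

    Hypothetical helper for illustration only.
    """
    match = FRAME_RE.search(line)
    if match is None:
        return None
    # Offsets and sizes are printed in hexadecimal by the kernel.
    return (match["symbol"], int(match["offset"], 16), int(match["size"], 16))
```

Applied to the RIP line above, this yields the symbol `get_xsave_addr` with offset 0x9b into a function of 0xb0 bytes. Lines without a frame, such as "Call Trace:", return None.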

The systems seem to be running OK, but the stacktrace presumably points
to an issue somewhere.
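The taint state can be read from /proc/sys/kernel/tainted, which is a bitmask. Below is a minimal Python sketch for decoding it, using the flag letters listed in the kernel's Documentation/admin-guide/tainted-kernels.rst. The function names are hypothetical. The actual taint value from the affected guests is not included in this report, so the expectation of 512 (bit 9, "W", kernel issued a warning) is an assumption that holds only if the warning above is the sole source of the taint.

```python
# Taint flag letters indexed by bit position (bit 0 = 'P', bit 9 = 'W', ...),
# following Documentation/admin-guide/tainted-kernels.rst.
TAINT_FLAGS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value):
    """Return the flag letters for every taint bit set in value."""
    return [flag for bit, flag in enumerate(TAINT_FLAGS) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read and decode the running kernel's taint bitmask (Linux only)."""
    with open(path) as handle:
        return decode_taint(int(handle.read().strip()))
```

Calling `read_taint()` on a guest returns an empty list for an untainted kernel. A list containing only "W" would be consistent with the taint coming from the trace above.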

A sample kvm invocation for an affected guest, as shown by ps, is:

ganeti04   18354 30.1  0.5 6015620 1114084 ?     Sl   Aug11 832:22 /usr/bin/kvm -name geo1.debian.org -m 1024 -smp 2 -pidfile /var/run/ganeti/kvm-hypervisor/pid/geo1.debian.org -device virtio-balloon -daemonize -D /var/log/ganeti/kvm/geo1.debian.org.log -machine pc-i440fx-5.2 -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.serial,server,nowait -usb -display none -cpu host -uuid 36cf5fbc-1414-4b27-874e-ea3153150aa9 -device virtio-rng-pci,bus=pci.0,addr=0x1e,max-bytes=1024,period=1000 -global isa-fdc.fdtypeA=none -netdev type=tap,id=nic-6e9afdf8-ccaf-42e8,fd=10 -device virtio-net-pci,id=nic-6e9afdf8-ccaf-42e8,bus=pci.0,addr=0xd,netdev=nic-6e9afdf8-ccaf-42e8,mac=aa:00:00:46:8f:08 -incoming tcp:172.29.182.13:8102 -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.qmp,server,nowait -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.kvmd,server,nowait -boot c -device virtio-blk-pci,id=disk-8a45befd-be45-4b75,bus=pci.0,addr=0xc,drive=disk-8a45befd-be45-4b75 -drive file=/var/run/ganeti/instance-disks/geo1.debian.org:0,format=raw,if=none,aio=threads,cache=none,discard=unmap,id=disk-8a45befd-be45-4b75,auto-read-only=off -runas ganeti04

It seems that buster guests on the same host are unaffected, with
similar-looking command lines.

The host's CPUs are Intel Xeon Silver 4110. Our other x86-64 clusters
either use AMD CPUs (also with "-cpu host") or Xeon E5-2699 v3 CPUs,
with "-cpu Haswell-noTSX".

Regards,

Adam

Follow-Ups:
- Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel
  - From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
- Processed: Re: Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel
  - From: "Debian Bug Tracking System" <owner@bugs.debian.org>
- Processed: Re: Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel
  - From: "Debian Bug Tracking System" <owner@bugs.debian.org>
- Bug#1044518: marked as done (linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel)
  - From: "Debian Bug Tracking System" <owner@bugs.debian.org>
- Bug#1044518: marked as done (linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel)
  - From: "Debian Bug Tracking System" <owner@bugs.debian.org>
