
Bug#903800: 4.9.110-1 Xen PV boot workaround



Hi,

I tested this workaround: I can confirm that it works on the Xen host, but not on a Xen guest.
If you try to start a VM with the latest kernel, i.e. with these parameters in its cfg file:

#
#  Kernel + memory size
#
kernel      = '/boot/vmlinuz-4.9.0-7-amd64'
extra       = 'elevator=noop'
ramdisk     = '/boot/initrd.img-4.9.0-7-amd64'

The VM crashes in a loop with this kernel error:

[    0.000000] Linux version 4.9.0-7-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-1 (2018-07-05)
[    0.000000] Command line: root=/dev/xvda2 ro elevator=noop
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] Released 0 page(s)
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: Xen
[    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000000] MTRR: Disabled
[    0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC 
[    0.000000] RAMDISK: [mem 0x02000000-0x05996fff]
[    0.000000] NUMA turned off
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x7fc16000-0x7fc1afff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007fffffff]
[    0.000000]   Normal   empty
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fffffff]
[    0.000000] p2m virtual area at ffffc90000000000, size is 40000000
[    0.000000] Remapped 0 page(s)
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.8.4-pre (preserve-AD)
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] percpu: Embedded 35 pages/cpu @ffff88007f600000 s105304 r8192 d29864 u2097152
[    0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 515978
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: root=/dev/xvda2 ro elevator=noop
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 1980804K/2096764K available (6250K kernel code, 1159K rwdata, 2868K rodata, 1420K init, 688K bss, 115960K reserved, 0K cma-reserved)
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] Hierarchical RCU implementation.
[    0.000000]     Build-time adjustment of leaf fanout to 64.
[    0.000000]     RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
[    0.000000] Using NULL legacy PIC
[    0.000000] NR_IRQS:33024 nr_irqs:32 0
[    0.000000] xen:events: Using FIFO-based ABI
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] installing Xen timer for CPU 0
[    0.000000] tsc: Unable to calibrate against PIT
[    0.000000] tsc: No reference (HPET/PMTIMER) available
[    0.000000] tsc: Detected 2597.018 MHz processor
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 5194.03 BogoMIPS (lpj=10388072)
[    0.004000] pid_max: default: 32768 minimum: 301
[    0.004000] Security Framework initialized
[    0.004000] Yama: disabled by default; enable with sysctl kernel.yama.*
[    0.004000] AppArmor: AppArmor disabled by boot time parameter
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.004000] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.004000] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.004000] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.004000] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] mce: CPU supports 2 MCE banks
[    0.004000] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[    0.004000] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.004000] Spectre V2 : Mitigation: Full generic retpoline
[    0.004000] Spectre V2 : Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier
[    0.004000] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    0.004000] Speculative Store Bypass: Vulnerable
[    0.051616] Freeing SMP alternatives memory: 24K
[    0.057710] ftrace: allocating 25269 entries in 99 pages
[    0.072061] cpu 0 spinlock event irq 1
[    0.072071] smpboot: Max logical packages: 1
[    0.072078] VPMU disabled by hypervisor.
[    0.072093] Performance Events: unsupported p6 CPU model 63 no PMU driver, software events only.
[    0.072602] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.072610] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.072624] x86: Booted up 1 node, 1 CPUs
[    0.072761] devtmpfs: initialized
[    0.072813] x86/mm: Memory block size: 128MB
[    0.074028] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.074045] futex hash table entries: 256 (order: 2, 16384 bytes)
[    0.074075] pinctrl core: initialized pinctrl subsystem
[    0.074165] NET: Registered protocol family 16
[    0.074176] xen:grant_table: Grant tables using version 1 layout
[    0.074195] Grant table initialized
[    0.074377] PCI: setting up Xen PCI frontend stub
[    0.074377] ACPI: Interpreter disabled.
[    0.074377] xen:balloon: Initialising balloon driver
[    0.076045] xen_balloon: Initialising balloon driver
[    0.076053] vgaarb: loaded
[    0.076068] dmi: Firmware registration failed.
[    0.076106] PCI: System does not support PCI
[    0.076111] PCI: System does not support PCI
[    0.076237] clocksource: Switched to clocksource xen
[    0.081278] VFS: Disk quotas dquot_6.6.0
[    0.081294] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.081315] hugetlbfs: disabling because there are no supported hugepage sizes
[    0.081343] pnp: PnP ACPI: disabled
[    0.082398] NET: Registered protocol family 2
[    0.082534] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[    0.082606] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.082654] TCP: Hash tables configured (established 16384 bind 16384)
[    0.082689] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[    0.082708] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[    0.082750] NET: Registered protocol family 1
[    0.082788] Unpacking initramfs...
[    0.123386] Freeing initrd memory: 58972K
[    0.123786] general protection fault: 0000 [#1] SMP
[    0.123792] Modules linked in:
[    0.123799] CPU: 0 PID: 30 Comm: modprobe Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
[    0.123807] task: ffff880078ad7000 task.stack: ffffc90040498000
[    0.123812] RIP: e030:[<ffffffff81614d4d>]  [<ffffffff81614d4d>] ret_from_fork+0x2d/0x70
[    0.123824] RSP: e02b:ffffc9004049bf50  EFLAGS: 00010006
[    0.123829] RAX: 0000000493ef5000 RBX: ffffffff8108e9d0 RCX: ffffea0001ec61df
[    0.123835] RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffffc9004049bf58
[    0.123841] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880078adc000
[    0.124009] R10: 8080808080808080 R11: fefefefefefefeff R12: ffff88007ceb7a00
[    0.124009] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.124009] FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
[    0.124009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.124009] CR2: 00007ffd13e9e9b9 CR3: 0000000078af4000 CR4: 0000000000042660
[    0.124009] Stack:
[    0.124009]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.124009]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.124009]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.124009] Call Trace:
[    0.124009] Code: c7 e8 b8 fe a8 ff 48 85 db 75 2f 48 89 e7 e8 5b ed 9e ff 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 <0f> 22 d8 58 66 0f 1f 44 00 00 e9 c1 07 00 00 4c 89 e7 eb 11 e8
[    0.124009] RIP  [<ffffffff81614d4d>] ret_from_fork+0x2d/0x70
[    0.124009]  RSP <ffffc9004049bf50>
[    0.124009] ---[ end trace e2ff95a7e079b5b5 ]---

Did I miss something?

Thanks for your help.

Best regards.

Benoît

On Mon, 16 Jul 2018 at 19:28, Hans van Kranenburg <hans@knorrie.org> wrote:
Reportedly, adding pti=off to the kernel boot parameters will work
around the issue for now.

For PV guests, the guest kernel turns pti off in any case. The issue in
4.9.107 through 4.9.111 affects the detection and turning off of pti;
that's why forcing it off helps.

In 4.9.112 it's fixed in commit 1adc34adc3447c34926994b87db5d929f5ab45b5
"x86/cpu: Re-apply forced caps every time CPU caps are re-read"

Hans
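As an illustration of the pti=off suggestion, here is a sketch of the guest configuration from the top of this message with the parameter added. It assumes the guest is started by the xl toolstack, whose extra option is appended to the guest kernel command line. Note that the "Kernel command line" in the log above shows only root=/dev/xvda2 ro elevator=noop, so the parameter would need to reach the guest kernel itself, not only the host's.

```
#
#  Kernel + memory size
#
kernel      = '/boot/vmlinuz-4.9.0-7-amd64'
# pti=off is appended to the guest kernel command line
# (workaround for the 4.9.107 to 4.9.111 PV boot issue)
extra       = 'elevator=noop pti=off'
ramdisk     = '/boot/initrd.img-4.9.0-7-amd64'
```

After restarting the guest, its own boot log should list pti=off on the "Kernel command line" line. Once the guest runs 4.9.112 or later, which contains the fix mentioned above, the parameter should no longer be needed.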
