[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: VM crashes with linux-image-4.5.0-0.bpo.2-amd64:amd64/jessie-backports on KVM-Host



Hi Rolf,

"On 30 May 2016 at 03:32, Rolf Kutz <rk@vzsze.de> wrote:
>
> After installing linux-image-4.5.0-0.bpo.2-amd64 on my KVM Host, I
> experience
> problems with a virtual machine. The virtual machine starts up ok moѕt of
> the
> time, but crashes after rebooting the VM. Switching the VM off and on again
> makes
> it start up ok, but again crash on reboot. I don't see any errors in the
> KVM-host logs. I've seen the bug with Kernel linux-image-4.5.0-0.bpo.2-amd64
> and
> linux-image-4.4.0-0.bpo.1-amd64 on the VM.
>
> Going back to linux-image-4.4.0-0.bpo.1-amd64 on the KVM-Server makes the
> problem
> disappear. Other VMs on that server run fine with both kernels. The VM in
> question is an NFS4-Server running Debian
> Jessie. The KVM-Images are on a BTRFS-Partition.

Just to clarify: 1. What kernel version is the VM host system?  2.
Have you installed mcelog and edac-utils on the host system?  3. Are
there any kernel, mce, or edac errors on the *host* system when you
encounter this crash?

> 10:08:47 rk@freckle:~$ virsh console storage
> Connected to domain storage
> Escape character is ^]
>
> Debian GNU/Linux 8 storage ttyS0
>
> storage login: rk
> Passwort: Letzte Anmeldung: Samstag, den 28. Mai 2016, 09:57:46 CEST von
> rkutz-nb.lan.zsze.de auf pts/0
> Linux storage 4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.3-2~bpo8+1 (2016-05-13)
> x86_64
>
...
> 10:09:02 rk@storage:~$ dmesg [    0.000000] Linux version
> 4.5.0-0.bpo.2-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2
> (Debian 4.9.2-10) ) #1 SMP Debian 4.5.3-2~bpo8+1 (2016-05-13)
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.5.0-0.bpo.2-amd64
> root=UUID=4c43ff54-ff7c-470f-bf9c-391e4357a3b5 ro text console=tty0
> console=ttyS0,115200n8
...
> [    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.7.5-20140531_083030-gandalf 04/01/2014
> [    0.000000] Hypervisor detected: KVM
...
> [    0.000000] ACPI: RSDP 0x00000000000F0C30 000014 (v00 BOCHS )
> [    0.000000] ACPI: RSDT 0x00000000BFFE1927 000034 (v01 BOCHS  BXPCRSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACP 0x00000000BFFE0B37 000074 (v01 BOCHS  BXPCFACP
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: DSDT 0x00000000BFFE0040 000AF7 (v01 BOCHS  BXPCDSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACS 0x00000000BFFE0000 000040
> [    0.000000] ACPI: SSDT 0x00000000BFFE0BAB 000CBC (v01 BOCHS  BXPCSSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: APIC 0x00000000BFFE1867 000088 (v01 BOCHS  BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: HPET 0x00000000BFFE18EF 000038 (v01 BOCHS  BXPCHPET
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: Local APIC address 0xfee00000
> [    0.000000] No NUMA configuration found
> [    0.000000] Faking a node at [mem 0x0000000000000000-0x000000013fffffff]
> [    0.000000] NODE_DATA(0) allocated [mem 0x13fff8000-0x13fffcfff]
> [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    0.000000] kvm-clock: cpu 0, msr 1:3fff0001, primary cpu clock
> [    0.000000] kvm-clock: using sched offset of 7729218211 cycles
> [    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles:
> 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> [    0.000000]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
> [    0.000000]   Device   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> [    0.000000]   node   0: [mem 0x0000000000100000-0x00000000bffdffff]
> [    0.000000]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
> [    0.000000] Initmem setup node 0 [mem
> 0x0000000000001000-0x000000013fffffff]
> [    0.000000] On node 0 totalpages: 1048446
> [    0.000000]   DMA zone: 64 pages used for memmap
> [    0.000000]   DMA zone: 21 pages reserved
> [    0.000000]   DMA zone: 3998 pages, LIFO batch:0
> [    0.000000]   DMA32 zone: 12224 pages used for memmap
> [    0.000000]   DMA32 zone: 782304 pages, LIFO batch:31
> [    0.000000]   Normal zone: 4096 pages used for memmap
> [    0.000000]   Normal zone: 262144 pages, LIFO batch:31
...
> [    0.000000] Booting paravirtualized kernel on KVM
...
> [    0.444591] Disabling memory control group subsystem

Ok, this might be why cgroups are failing later on.

...
> [    0.711237] PCI host bridge to bus 0000:00
> [    0.712905] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.715365] pci_bus 0000:00: root bus resource [io  0x0d00-0xadff window]
> [    0.717879] pci_bus 0000:00: root bus resource [io  0xae0f-0xaeff window]
> [    0.749677] pci_bus 0000:00: root bus resource [io  0xaf20-0xafdf window]
> [    0.752139] pci_bus 0000:00: root bus resource [io  0xafe4-0xffff window]
> [    0.756149] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff
> window]
> [    0.759078] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff
> window]
> [    0.762016] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.764145] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> [    0.764896] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> [    0.765861] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> [    0.774695] pci 0000:00:01.1: reg 0x20: [io  0xc120-0xc12f]
> [    0.778250] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io
> 0x01f0-0x01f7]
> [    0.780906] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> [    0.783367] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io
> 0x0170-0x0177]
> [    0.787765] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> [    0.790540] pci 0000:00:01.2: [8086:7020] type 00 class 0x0c0300
> [    0.799439] pci 0000:00:01.2: reg 0x20: [io  0xc080-0xc09f]
> [    0.803387] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
> [    0.804072] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4
> ACPI
> [    0.807087] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4
> SMB
> [    0.810123] pci 0000:00:02.0: [1b36:0100] type 00 class 0x030000
> [    0.817679] pci 0000:00:02.0: reg 0x10: [mem 0xf4000000-0xf7ffffff]
> [    0.827102] pci 0000:00:02.0: reg 0x14: [mem 0xf8000000-0xfbffffff]
> [    0.834400] pci 0000:00:02.0: reg 0x18: [mem 0xfc050000-0xfc051fff]
> [    0.841704] pci 0000:00:02.0: reg 0x1c: [io  0xc0a0-0xc0bf]
> [    0.864118] pci 0000:00:02.0: reg 0x30: [mem 0xfc040000-0xfc04ffff pref]
> [    0.864601] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
> [    0.868059] pci 0000:00:03.0: reg 0x10: [io  0xc0c0-0xc0df]
> [    0.871391] pci 0000:00:03.0: reg 0x14: [mem 0xfc052000-0xfc052fff]
> [    0.888005] pci 0000:00:03.0: reg 0x30: [mem 0xfc000000-0xfc03ffff pref]
> [    0.888731] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000
> [    0.892193] pci 0000:00:05.0: reg 0x10: [io  0xc0e0-0xc0ff]
> [    0.895546] pci 0000:00:05.0: reg 0x14: [mem 0xfc053000-0xfc053fff]
> [    0.912719] pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
> [    0.916189] pci 0000:00:06.0: reg 0x10: [io  0xc000-0xc03f]
> [    0.919520] pci 0000:00:06.0: reg 0x14: [mem 0xfc054000-0xfc054fff]
> [    0.936539] pci 0000:00:07.0: [1af4:1002] type 00 class 0x00ff00
> [    0.938385] pci 0000:00:07.0: reg 0x10: [io  0xc100-0xc11f]
> [    0.949626] pci 0000:00:08.0: [1af4:1001] type 00 class 0x010000
> [    0.953128] pci 0000:00:08.0: reg 0x10: [io  0xc040-0xc07f]
> [    0.956405] pci 0000:00:08.0: reg 0x14: [mem 0xfc055000-0xfc055fff]
...
> [    1.002030] PCI: Using ACPI for IRQ routing
> [    1.005952] PCI: pci_cache_line_size set to 64 bytes
> [    1.006284] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
> [    1.006288] e820: reserve RAM buffer [mem 0xbffe0000-0xbfffffff]
...
> [    1.074615] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
> [    1.076868] pci 0000:00:01.0: PIIX3: Enabling Passive Release
> [    1.079052] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [    1.138382] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
> [    1.198305] pci 0000:00:02.0: Video device with shadowed ROM
> [    1.198406] PCI: CLS 0 bytes, default 64
> [    1.198516] Unpacking initramfs...
> [    1.577416] Freeing initrd memory: 15280K (ffff880036218000 -
> ffff880037104000)
> [    1.580520] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [    1.582902] software IO TLB [mem 0xbbfe0000-0xbffe0000] (64MB) mapped at
> [ffff8800bbfe0000-ffff8800bffdffff]
...
> [    1.726966] Freeing unused kernel memory: 1328K (ffffffff81b26000 -
> ffffffff81c72000)
> [    1.730137] Write protecting the kernel read-only data: 10240k
> [    1.732748] Freeing unused kernel memory: 248K (ffff8800015c2000 -
> ffff880001600000)
> [    1.740328] Freeing unused kernel memory: 1156K (ffff8800018df000 -
> ffff880001a00000)
> [    1.757185] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> [    1.779460] systemd-udevd[65]: starting version 215
> [    1.782103] random: systemd-udevd urandom read with 2 bits of entropy
> available
> [    1.794611] virtio: module verification failed: signature and/or required
> key missing - tainting kernel

Tainted kernel.  How did this key verification failure of virtio module happen?

> [    1.907527] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy
> driver
...
> [    2.058430] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy
> driver
> [    2.077462] ata2.01: NODEV after polling detection
> [    2.077932] ata2.00: ATAPI: QEMU DVD-ROM, 2.1.2, max UDMA/100
> [    2.080945] ata2.00: configured for MWDMA2
> [    2.083672] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM
> 2.1. PQ: 0 ANSI: 5
> [    2.117473] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
> [    2.119807] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy
> driver
> [    2.173758] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy
> driver
> [    2.229423] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy
> driver
> [    2.238680]  vda: vda1 vda2 < vda5 >
> [    2.246073]  vdb: vdb1
...
> [    3.283240] EXT4-fs (vda1): mounted filesystem with ordered data mode.
> Opts: (null)
> [    6.307634] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT
> +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
> [    6.312583] systemd[1]: Detected virtualization 'kvm'.
> [    6.316890] systemd[1]: Detected architecture 'x86-64'.
> [    7.256189] systemd[1]: Inserted module 'autofs4'
> [    7.290052] systemd[1]: Set hostname to <storage>.
...
> [   16.879230] Adding 392188k swap on /dev/vda5.  Priority:-1 extents:1
> across:392188k FS
> [   16.930297] ppdev: user-space parallel port driver
> [   16.975515] [drm] Device Version 0.0
> [   16.979894] [drm] Compression level 0 log level 0
> [   16.985329] [drm] Currently using mode #0, list at 0x488
> [   16.991131] [drm] 12286 io pages at offset 0x1000000
> [   16.997187] [drm] 16777216 byte draw area at offset 0x0
> [   17.003456] [drm] RAM header offset: 0x3ffe000
> [   17.011343] [drm] rom modes offset 0x488 for 128 modes
> [   17.019792] [TTM] Zone  kernel: Available graphics memory: 1699632 kiB
> [   17.023421] [TTM] Initializing pool allocator
> [   17.025212] [TTM] Initializing DMA pool allocator
> [   17.027234] [drm] qxl: 16M of VRAM memory size
> [   17.033883] [drm] qxl: 63M of IO pages memory ready (VRAM domain)
> [   17.044373] [drm] qxl: 64M of Surface memory size
> [   17.061450] [drm] main mem slot 1 [f4000000,3ffe000]
> [   17.067808] [drm] surface mem slot 2 [f8000000,4000000]
> [   17.073988] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   17.081851] [drm] No driver support for vblank timestamp query.
> [   17.093640] [drm] fb mappable at 0xF4000000, size 3145728
> [   17.102272] [drm] fb: depth 24, pitch 4096, width 1024, height 768
> [   17.111160] fbcon: qxldrmfb (fb0) is primary device
> [   17.216919] Console: switching to colour frame buffer device 128x48
> [   17.237108] qxl 0000:00:02.0: fb0: qxldrmfb frame buffer device
> [   17.256511] [drm] Initialized qxl 0.1.0 20120117 for 0000:00:02.0 on
> minor 0
> [   17.449130] SSE version of gcm_enc/dec engaged.
> [   17.560200] alg: No test for fips(ansi_cprng) (fips_ansi_cprng)
> [   18.527152] EXT4-fs (vda1): re-mounted. Opts: errors=remount-ro
> [   19.117665] EXT4-fs (vdb1): mounted filesystem with ordered data mode.
> Opts: (null)
> [   19.169568] systemd-journald[170]: Received request to flush runtime
> journal from PID 1
> [   27.503633] RPC: Registered named UNIX socket transport module.
> [   27.505537] RPC: Registered udp transport module.
> [   27.506947] RPC: Registered tcp transport module.
> [   27.508306] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [   27.763407] FS-Cache: Loaded
> [   28.023532] FS-Cache: Netfs 'nfs' registered for caching
> [   28.126902] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> [   30.202869] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
> recovery directory
> [   30.205586] NFSD: starting 90-second grace period (net ffffffff81ae7440)

I wonder if something could be going wrong with the caching?

> [   43.847366] random: nonblocking pool is initialized
> 10:09:04 rk@storage:~$ sudo reboot
> [sudo] password for rk:          Starting Synchronise Hardware Clock to
> System Clock...

Reboot occurs without incidence.

> [    0.000000] Linux version 4.5.0-0.bpo.2-amd64
> (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1
> SMP Debian 4.5.3-2~bpo8+1 (2016-05-13)
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.5.0-0.bpo.2-amd64
> root=UUID=4c43ff54-ff7c-470f-bf9c-391e4357a3b5 ro text console=tty0
> console=ttyS0,115200n8
> [    0.000000] x86/fpu: Legacy x87 FPU detected.
> [    0.000000] x86/fpu: Using 'lazy' FPU context switches.
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff]
> reserved
> [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff]
> reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff]
> reserved
> [    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff]
> reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff]
> reserved
> [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] SMBIOS 2.8 present.
> [    0.000000] Hypervisor detected: KVM
> [    0.000000] e820: last_pfn = 0x140000 max_arch_pfn = 0x400000000
> [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> [    0.000000] e820: last_pfn = 0xbffe0 max_arch_pfn = 0x400000000
> [    0.000000] found SMP MP-table at [mem 0x000f0e70-0x000f0e7f] mapped at
> [ffff8800000f0e70]
> [    0.000000] RAMDISK: [mem 0x36218000-0x37103fff]
> [    0.000000] ACPI: Early table checksum verification disabled
> [    0.000000] ACPI: RSDP 0x00000000000F0C30 000014 (v00 BOCHS )
> [    0.000000] ACPI: RSDT 0x00000000BFFE1927 000034 (v01 BOCHS  BXPCRSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACP 0x00000000BFFE0B37 000074 (v01 BOCHS  BXPCFACP
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: DSDT 0x00000000BFFE0040 000AF7 (v01 BOCHS  BXPCDSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACS 0x00000000BFFE0000 000040
> [    0.000000] ACPI: SSDT 0x00000000BFFE0BAB 000CBC (v01 BOCHS  BXPCSSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: APIC 0x00000000BFFE1867 000088 (v01 BOCHS  BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: HPET 0x00000000BFFE18EF 000038 (v01 BOCHS  BXPCHPET
> 00000001 BXPC 00000001)
> [    0.000000] No NUMA configuration found
> [    0.000000] Faking a node at [mem 0x0000000000000000-0x000000013fffffff]
> [    0.000000] NODE_DATA(0) allocated [mem 0x13fff8000-0x13fffcfff]
> [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    0.000000] kvm-clock: cpu 0, msr 1:3fff0001, primary cpu clock
> [    0.000000] kvm-clock: using sched offset of 146729586697 cycles
> [    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles:
> 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> [    0.000000]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
> [    0.000000]   Device   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> [    0.000000]   node   0: [mem 0x0000000000100000-0x00000000bffdffff]
> [    0.000000]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
> [    0.000000] Initmem setup node 0 [mem
> 0x0000000000001000-0x000000013fffffff]
> [    0.000000] ACPI: PM-Timer IO Port: 0x608
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> [    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI
> 0-23
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [    0.000000] Using ACPI (MADT) for SMP configuration information
> [    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
> [    0.000000] smpboot: Allowing 3 CPUs, 1 hotplug CPUs
> [    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
> [    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
> [    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xbffe0000-0xbfffffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xc0000000-0xfeffbfff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xff000000-0xfffbffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
> [    0.000000] e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> [    0.000000] Booting paravirtualized kernel on KVM
...
> [    0.394597] Disabling memory control group subsystem
> [    0.396967] mce: CPU supports 10 MCE banks
> [    0.399140] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> [    0.401494] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> [    0.406375] Freeing SMP alternatives memory: 24K (ffffffff81c72000 -
> ffffffff81c78000)
> [    0.418321] ftrace: allocating 24125 entries in 95 pages
> [    0.457345] x2apic enabled
> [    0.459100] Switched APIC routing to physical x2apic.
> [    0.462668] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.465464] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx (Nehalem-C)
> (family: 0x6, model: 0x2c, stepping: 0x1)
> [    0.470914] Performance Events: unsupported p6 CPU model 44 no PMU
> driver, software events only.
> [    0.476347] x86: Booting SMP configuration:
> [    0.478497] .... node  #0, CPUs:      #1
> [    0.480639] kvm-clock: cpu 1, msr 1:3fff0041, secondary cpu clock
> [    0.502666] x86: Booted up 1 node, 2 CPUs
> [    0.502671] KVM setup async PF for cpu 1
> [    0.502677] kvm-stealtime: cpu 1, msr 13fc8dac0
> [    0.507806] smpboot: Total of 2 processors activated (9599.99 BogoMIPS)
> [    0.510868] devtmpfs: initialized
> [    0.518372] clocksource: jiffies: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 7645041785100000 ns
> [    0.524142] pinctrl core: initialized pinctrl subsystem
> [    0.526703] NET: Registered protocol family 16
> [    0.540700] cpuidle: using governor ladder
> [    0.552812] cpuidle: using governor menu
> [    0.556962] ACPI: bus type PCI registered
> [    0.558677] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [    0.561517] PCI: Using configuration type 1 for base access
> [    0.573698] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [    0.576879] ACPI: Added _OSI(Module Device)
> [    0.578688] ACPI: Added _OSI(Processor Device)
> [    0.580477] ACPI: Added _OSI(3.0 _SCP Extensions)
> [    0.582375] ACPI: Added _OSI(Processor Aggregator Device)
> [    0.586767] ACPI: Interpreter enabled
> [    0.588803] ACPI: (supports S0 S3 S4 S5)
> [    0.590465] ACPI: Using IOAPIC for interrupt routing
> [    0.592460] PCI: Using host bridge windows from ACPI; if necessary, use
> "pci=nocrs" and report a bug
> [    0.602440] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [    0.604858] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
> MSI]
> [    0.607424] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
> [    0.609869] acpi PNP0A03:00: fail to add MMCONFIG information, can't
> access extended PCI configuration space under this bridge.
...
> [    0.671142] PCI host bridge to bus 0000:00
> [    0.672788] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.675268] pci_bus 0000:00: root bus resource [io  0x0d00-0xadff window]
> [    0.677716] pci_bus 0000:00: root bus resource [io  0xae0f-0xaeff window]
> [    0.710977] pci_bus 0000:00: root bus resource [io  0xaf20-0xafdf window]
> [    0.714923] pci_bus 0000:00: root bus resource [io  0xafe4-0xffff window]
> [    0.717386] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff
> window]
> [    0.720559] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff
> window]
> [    0.724019] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.741297] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io
> 0x01f0-0x01f7]
> [    0.745610] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> [    0.748001] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io
> 0x0170-0x0177]
> [    0.750605] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> [    0.768689] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4
> ACPI
> [    0.771568] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4
> SMB
...
> [    1.545736] audit: initializing netlink subsys (disabled)
> [    1.547861] audit: type=2000 audit(1464422965.231:1): initialized
> [    1.551111] Initialise system trusted keyring
> [    1.553534] zbud: loaded
> [    1.556386] Key type asymmetric registered
> [    1.558168] Asymmetric key parser 'x509' registered
...
> [    1.576271] GHES: HEST is not enabled!
...
> [    1.664633] registered taskstats version 1
> [    1.666395] Loading compiled-in X.509 certificates
> [    1.668874] Loaded X.509 cert 'Debian Project: Ben Hutchings:
> 008a018dca80932630'

Ok, given the key loaded this time, I expect that this time the kernel
won't be tainted.

> [    1.747969] scsi_mod: module verification failed: signature and/or
> required key missing - tainting kernel

Nope :-(

> [    1.834626] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy
> driver
...
> [    1.984186] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy
...
> [    2.046832] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy
> driver
> [    2.100847] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy
> driver
> [    2.156786] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy
> driver
> [    2.167493]  vda: vda1 vda2 < vda5 >
> [    2.172983] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
> [    2.175548] cdrom: Uniform CD-ROM driver Revision: 3.20
> [    2.183142] sr 1:0:0:0: Attached scsi generic sg0 type 5
...
> Begin: Will now check root file system ... fsck from util-linux 2.25.2
> [/sbin/fsck.ext4 (1) -- /dev/vda1] fsck.ext4 -a -C0 /dev/vda1 /dev/vda1:
> clean, 52725/499712 files, 643179/1998336 blocks
> done.
> [    2.410152] EXT4-fs (vda1): mounted filesystem with ordered data mode.
> Opts: (null)
> done.

So far so good (except the that possible scsi_mod issue).

...
> [    2.513511] systemd[1]: Inserted module 'autofs4'
> [    2.518369] systemd[1]: Set hostname to <storage>.
...
> [    2.865064] systemd[1]: Starting Create list of required static device
> nodes for the current kernel...
> [    2.872307] systemd-cgroups[151]: segfault at 15 ip 000056202b46e157 sp
> 00007ffeca612668 error 6          in
> systemd-cgroups-agent[56202b466000+36000]
> [    2.878828] systemd-cgroups[153]: segfault at 15 ip 0000560f54ea1157 sp
> 00007ffee458dc18 error 6Starting Create list of required static device
> nodes...rrent kernel...
> [    2.879974] systemd-cgroups[154]: segfault at 15 ip 0000559debe67157 sp
> 00007ffca4f34658 error 6 in systemd-cgroups-agent[559debe5f000+36000]
> [    2.888278] systemd[1]: Mounting Huge Pages File System...

/\ Ok, this is where the real badness starts to happen /\

> [    2.891561]  in systemd-cgroups-agent[560f54e99000+36000]
>          Mounting Huge Pages File System...
> [    2.897734] systemd[1]: Mounting Debug File System...
>          Mounting Debug File System...[    2.903794] systemd-cgroups[157]:
> segfault at 15 ip 000055b0f134a157 sp 00007ffd6d0a99c8 error 6
>
> [    2.907862] systemd[1]: Starting Slices.
> [    2.908269] systemd-cgroups[158]: segfault at 15 ip 0000557f3c3dd157 sp
> 00007ffe98f97808 error 6 in systemd-cgroups-agent[557f3c3d5000+36000]
> [    2.908367]  in systemd-cgroups-agent[55b0f1342000+36000]
> [    2.912432] systemd-cgroups[161]: segfault at 15 ip 000055f1d8b02157 sp
> 00007fffbb01b0b8 error 6 in systemd-cgroups-agent[55f1d8afa000+36000]
> [    2.913231] systemd-cgroups[159]: segfault at 15 ip 0000561c7b9d3157 sp
> 00007ffcade23e58 error 6 in systemd-cgroups-agent[561c7b9cb000+36000]
> [  OK  ] Reached target Slices.

And continues, but "OK".

>          Starting Journal Service...
> [  OK  ] Started Journal Service.
> [    3.026109] systemd[1]: Started Journal Service.
> [  OK  ] Started udev Coldplug all Devices.
> [  OK  ] Started Create Static Device Nodes in /dev.
> [  OK  ] Started Apply Kernel Variables.
> [    3.042805] systemd[1]: systemd-journald.service has no holdoff time,
> scheduling restart.
...
> [    3.161542] systemd[1]: Stopping Journal Service...

Is this normal?

> [    3.162513] drm: Unknown symbol  (err 0)

Is this normal?

> [    3.162525] BUG: unable to handle kernel paging request at
> ffffc900934f6a88
> [    3.162531] IP: [<ffffffff813121e4>] strcmp+0x4/0x20
> [    3.162533] PGD 13b091067 PUD 0 [    3.162535] Oops: 0000 [#1] SMP [
> 3.162553] Modules linked in: button(E) autofs4(E) ext4(E) crc16(E)
> mbcache(E) jbd2(E) sg(E) sr_mod(E) cdrom(E) virtio_blk(E) virtio_net(E)
> ata_generic(E) uhci_hcd(E) ehci_hcd(E) crc32c_intel(E) ata_piix(E)
> psmouse(E) libata(E) floppy(E) virtio_pci(E) usbcore(E) virtio_ring(E)
> virtio(E) usb_common(E) scsi_mod(E)
> [    3.162557] CPU: 1 PID: 182 Comm: systemd-udevd Tainted: G            E
> 4.5.0-0.bpo.2-amd64 #1 Debian 4.5.3-2~bpo8+1
> [    3.162558] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.7.5-20140531_083030-gandalf 04/01/2014
> [    3.162559] task: ffff880036cbae40 ti: ffff8800bb048000 task.ti:
> ffff8800bb048000
> [    3.162562] RIP: 0010:[<ffffffff813121e4>]  [<ffffffff813121e4>]
> strcmp+0x4/0x20
> [    3.162564] RSP: 0018:ffff8800bb04bbc0  EFLAGS: 00010282
> [    3.162564] RAX: 0000000000000fb3 RBX: 00000000000007d9 RCX:
> 0000000000000010
> [    3.162565] RDX: 0000000000000fb3 RSI: ffffffff818cdbbd RDI:
> ffffc900934f6a89
> [    3.162566] RBP: ffffffff810f5b30 R08: ffffffff810f5b30 R09:
> 000000000000001c
> [    3.162567] R10: 0000000000000163 R11: 0000000000016ee8 R12:
> 0000000000000010
> [    3.162568] R13: 0000000000000fb3 R14: 0000000000000000 R15:
> ffffffff8189abe0
> [    3.162570] FS:  00007fc4cd1ac880(0000) GS:ffff88013fc80000(0000)
> knlGS:0000000000000000
> [    3.162571] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    3.162572] CR2: ffffc900934f6a88 CR3: 0000000036c0d000 CR4:
> 00000000001006e0
> [    3.162576] Stack:
> [    3.162578]  ffffffff8131e106 ffffc900934f6a88 ffffffff81892e50
> ffff8800bb04bcd0
> [    3.162580]  0000000000000000 ffffffff816102c0 ffff8800bb04bcd0
> ffff8800bb04bf1c
> [    3.162582]  0000000000000000 ffffffff810f5c99 ffffffff816102c0
> ffff8800bb04bd28

\//\ And the bad bad badness /\\/

> [    3.162583] Call Trace:
> [    3.162588]  [<ffffffff8131e106>] ? bsearch+0x56/0x90
> [    3.162593]  [<ffffffff810f5c99>] ? find_symbol_in_section+0x39/0xc0
> [    3.162595]  [<ffffffff810f5c60>] ? mod_find_symname+0x70/0x70
> [    3.162597]  [<ffffffff810f6579>] ? each_symbol_section+0x39/0x170
> [    3.162599]  [<ffffffff810c9da8>] ? log_store+0x1a8/0x280
> [    3.162602]  [<ffffffff810bdd15>] ? down_trylock+0x25/0x30
> [    3.162604]  [<ffffffff810ca5c5>] ? console_trylock+0x15/0x70
> [    3.162606]  [<ffffffff810f6782>] ? find_symbol+0x42/0xb0
> [    3.162609]  [<ffffffff810f7c72>] ? resolve_symbol.isra.51+0x62/0x100
> [    3.162611]  [<ffffffff810f98f8>] ? load_module+0x12e8/0x2660
> [    3.162613]  [<ffffffff810f6ee2>] ?

This makes me wonder if your ram is bad, of if your VM image is corrupt.

> copy_module_from_fd.isra.54+0x102/0x150
> [    3.162616]  [<ffffffff810fae5e>] ? SYSC_finit_module+0x8e/0xc0
> [    3.162620]  [<ffffffff815ba2b6>] ? system_call_fast_compare_end+0xc/0x6b
> [    3.162650] Code: 74 09 48 83 c2 01 80 3a 00 75 f7 48 83 c6 01 0f b6 4e
> ff 48 83 c2 01 84 c9 88 4a ff 75 ed f3 c3 0f 1f 80 00 00 00 00 48 83 c7 01
> <0f> b6 47 ff 48 83 c6 01 3a 46 ff 75 07 84 c0 75 eb 31 c0 c3 19 [
> 3.162653] RIP  [<ffffffff813121e4>] strcmp+0x4/0x20
> [    3.162653]  RSP <ffff8800bb04bbc0>
> [    3.162654] CR2: ffffc900934f6a88
> [    3.162656] ---[ end trace e198ffde7e9d057e ]---
...
> [    3.458670] systemd[1]: systemd-journald.service start request repeated
> too quickly, refusing to start.
> [FAILED] Failed to start Journal Service.
> See 'systemctl status systemd-journald.service' for details.

This could be informative. "systemctl status systemd-journald.service"

> [    3.466849] systemd[1]: Failed to start Journal Service.
> [DEPEND] Dependency failed for Trigger Flushing of Journal to...istent
> Storage.
> [    3.474414] systemd[1]: Dependency failed for Trigger Flushing of Journal
> to Persistent Storage.

Wasn't the journal disabled earlier?

> [    3.477932] systemd[1]: Unit systemd-journald.service entered failed
> state.
> [    3.480700] systemd[1]: Starting Journal Service...
>          Starting Journal Service...
> [    3.484531] systemd[1]: systemd-journald.service start request repeated
> too quickly, refusing to start.
> [FAILED] Failed to start Journal Service.
> See 'systemctl status systemd-journald.service' for details.
> [    3.492809] systemd[1]: Failed to start Journal Service.
> [    3.494986] systemd[1]: Unit systemd-journald.socket entered failed
> state.

We're in a loop.

> [    3.495017] systemd-udevd[173]: worker [177] terminated by signal 11
> (Segmentation fault)
> [    3.495025] systemd-udevd[173]: worker [177] failed while handling
> '/devices/pci0000:00/0000:00:00.0'
> [    3.495071] systemd-udevd[173]: worker [178] terminated by signal 11
> (Segmentation fault)
> [    3.495075] systemd-udevd[173]: worker [178] failed while handling
> '/devices/pci0000:00/0000:00:01.0'
> [    3.495099] systemd-udevd[173]: worker [179] terminated by signal 11
> (Segmentation fault)
> [    3.495104] systemd-udevd[173]: worker [179] failed while handling
> '/devices/pci0000:00/0000:00:01.1'
> [    3.495128] systemd-udevd[173]: worker [180] terminated by signal 11
> (Segmentation fault)
> [    3.495132] systemd-udevd[173]: worker [180] failed while handling
> '/devices/pci0000:00/0000:00:01.2'
> [    3.495155] systemd-udevd[173]: worker [181] terminated by signal 11
> (Segmentation fault)
> [    3.495159] systemd-udevd[173]: worker [181] failed while handling
> '/devices/pci0000:00/0000:00:01.3'
> [    3.495182] systemd-udevd[173]: worker [182] terminated by signal 9
> (Killed)
> [    3.495186] systemd-udevd[173]: worker [182] failed while handling
> '/devices/pci0000:00/0000:00:02.0'

A bunch of pci-related errors.  Are you using virtio and/or pcie
passthrough?  I wonder if there are complementary errors in the dmesg
of the host kernel?

> [    3.580773] systemd-udevd[227]: ctx=0x55ec92d517d0
> path=/lib/modules/4.5.0-0.bpo.2-amd64/kernel/drivers/virtio/virtio_ring.ko
> error=No such file or directory
> [    3.589032] Module len 50008 truncated

This makes me think your VM image is corrupted.

> [    3.594201] systemd-udevd[184]: 'cdrom_id --lock-media /dev/sr0' [229]
> terminated by signal 4 (Illegal instruction)
> [    3.596203] systemd-udevd[186]: ctx=0x55ec92d517d0
> path=/lib/modules/4.5.0-0.bpo.2-amd64/kernel/drivers/virtio/virtio_ring.ko
> error=No such file or directory

Well that's odd...again, I suspect VM image corruption.

> [    3.612753] EXT4-fs error (device vda1): ext4_lookup:1602: inode #275136:
> comm systemd-udevd: deleted inode referenced: 275177
> [    3.616385] systemd[1]: Starting Remount Root and Kernel File Systems...
>          Starting Remount Root and Kernel File Systems...
> [    3.648381] EXT4-fs error (device vda1): ext4_lookup:1602: inode #275136:
> comm systemd-udevd: deleted inode referenced: 275177
> [    3.657519] EXT4-fs (vda1): warning: mounting fs with errors, running
> e2fsck is recommended
> [    3.673001] EXT4-fs error (device vda1): ext4_lookup:1602: inode #275136:
> comm systemd-udevd: deleted inode referenced: 275195
> [  OK  ] Found device /dev/ttyS0.
> [    3.686088] systemd[1]: Found device /dev/ttyS0.
> [    3.715352] EXT4-fs (vda1): re-mounted. Opts: errors=remount-ro
> [    3.717080] Aborting journal on device vda1-8.
...
> /lib/systemd/systemd-random-seed: Exec format error

My gut feeling is your VM images are corrupted, and/or that you have
bad ram.  I also wonder if it could be a virtio and/or pcie
passthrough issue if either are enabled...

Could you please tell me a bit about your btrfs topology and which
features you've enabled?  Also, with what version of btrfs-progs did
you create the volume?  Have you ever run a btrfs-scrub, btrfs-check,
or btrfs-balance on the host? (running which kernel, with which
version of btrfs-progs...)  If you haven't yet, please don't run
either until we discuss this some more.  Was your btrfs volume created
with btrfs-convert?  Are you running the host's btrfs volume on top of
LVM, MD, and/or in combination with a caching layer?

In addition to answering all of these questions, could you please
trigger this again and send the host dmesg?

Finally, because you're testing btrfs, you have recent backups, and
backups from before this error manifested, right?

Cheers!
Nicholas


Reply to: