Bug#807352: Subject: linux-image-3.16.0-4-amd64: kernel BUG at smpboot.c:134



Package: src:linux
Version: 3.16.7-ckt11-1+deb8u5
Severity: important
Tags: upstream

Dear Maintainer,

This kernel version occasionally fails to boot on Google Compute Engine. We
have traced the failure to an upstream bug that has since been fixed. Please
integrate the following upstream commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dd9d3843755da95f63dd3a376f62b3e45c011210

sched: Fix cpu_active_mask/cpu_online_mask race
There is a race condition in SMP bootup code, which may result
in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
or
    kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running
on busy hosts.

 CPU0                          CPUn
 ====                          ====

 _cpu_up()
   __cpu_up()
                               start_secondary()
                                 set_cpu_online()
                                   cpumask_set_cpu(cpu,
                                       to_cpumask(cpu_online_bits));
   cpu_notify(CPU_ONLINE)
     <do stuff, see below>
                                   cpumask_set_cpu(cpu,
                                       to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.

Variant 1:

  cpu_notify(CPU_ONLINE)
    workqueue_cpu_up_callback()
      rebind_workers()
        set_cpus_allowed_ptr()

  This call fails because it requires an active CPU; rebind_workers()
  ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

Variant 2:

  cpu_notify(CPU_ONLINE)
    smpboot_thread_call()
      smpboot_unpark_threads()
       ..
        __kthread_unpark()
          __kthread_bind()
          wake_up_state()
           ..
            select_task_rq()
              select_fallback_rq()

  The ->wake_cpu of the unparked thread is not allowed, making a call
  to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
  find an allowed, active CPU and promptly resets the allowed CPUs, so
  that the task in question ends up on CPU0.

  When those unparked tasks are eventually executed, they run
  immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set
(and adding some memory barriers), would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
set_cpus_allowed_ptr()"), thus, reintroducing that particular
problem.

Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs
cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:

  - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

  - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

  - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
active & online") introduces an arch-specific solution to this
arch-independent problem.

Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.




-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc
version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u5
(2015-10-09)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64
root=UUID=f5d73494-1cf2-4811-8e2d-67884d4bd6e7 ro
console=ttyS0,38400n8 elevator=noop console=hvc0

** Not tainted

** Kernel log:
[    0.562395] sd 0:0:1:0: [sda] 20971520 512-byte logical blocks:
(10.7 GB/10.0 GiB)
[    0.563436] sd 0:0:1:0: [sda] 4096-byte physical blocks
[    0.564355] sd 0:0:1:0: [sda] Write Protect is off
[    0.565007] sd 0:0:1:0: [sda] Mode Sense: 1f 00 00 08
[    0.565054] sd 0:0:1:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[    0.567526]  sda: sda1
[    0.568469] sd 0:0:1:0: [sda] Attached SCSI disk
[    0.569459] sd 0:0:1:0: Attached scsi generic sg0 type 0
[    0.680208] input: AT Translated Set 2 keyboard as
/devices/platform/i8042/serio0/input/input0
[    0.699765] EXT4-fs (sda1): mounted filesystem with ordered data
mode. Opts: (null)
[    0.797732] systemd[1]: systemd 215 running in system mode. (+PAM
+AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
-SECCOMP -APPARMOR)
[    0.799627] systemd[1]: Detected virtualization 'kvm'.
[    0.800379] systemd[1]: Detected architecture 'x86-64'.
[    0.851257] systemd[1]: Inserted module 'autofs4'
[    0.852140] systemd[1]: No hostname configured.
[    0.852805] systemd[1]: Set hostname to <localhost>.
[    0.970228] systemd[1]: Cannot add dependency job for unit
dbus.socket, ignoring: Unit dbus.socket failed to load: No such file
or directory.
[    0.972036] systemd[1]: Cannot add dependency job for unit
display-manager.service, ignoring: Unit display-manager.service failed
to load: No such file or directory.
[    0.974247] systemd[1]: Expecting device dev-ttyS0.device...
[    0.980127] systemd[1]: Starting Forward Password Requests to Wall
Directory Watch.
[    0.981489] systemd[1]: Started Forward Password Requests to Wall
Directory Watch.
[    0.982613] systemd[1]: Starting Remote File Systems (Pre).
[    0.992114] systemd[1]: Reached target Remote File Systems (Pre).
[    0.993050] systemd[1]: Starting Encrypted Volumes.
[    1.000145] systemd[1]: Reached target Encrypted Volumes.
[    1.001026] systemd[1]: Starting Dispatch Password Requests to
Console Directory Watch.
[    1.002331] systemd[1]: Started Dispatch Password Requests to
Console Directory Watch.
[    1.003665] systemd[1]: Starting Arbitrary Executable File Formats
File System Automount Point.
[    1.012116] systemd[1]: Set up automount Arbitrary Executable File
Formats File System Automount Point.
[    1.013825] systemd[1]: Starting Swap.
[    1.020134] systemd[1]: Reached target Swap.
[    1.020845] systemd[1]: Starting Root Slice.
[    1.028123] systemd[1]: Created slice Root Slice.
[    1.028915] systemd[1]: Starting User and Session Slice.
[    1.036148] systemd[1]: Created slice User and Session Slice.
[    1.037231] systemd[1]: Starting /dev/initctl Compatibility Named Pipe.
[    1.044192] systemd[1]: Listening on /dev/initctl Compatibility Named Pipe.
[    1.045272] systemd[1]: Starting Delayed Shutdown Socket.
[    1.052106] systemd[1]: Listening on Delayed Shutdown Socket.
[    1.053042] systemd[1]: Starting Journal Socket (/dev/log).
[    1.060127] systemd[1]: Listening on Journal Socket (/dev/log).
[    1.060981] systemd[1]: Starting udev Kernel Socket.
[    1.068187] systemd[1]: Listening on udev Kernel Socket.
[    1.069127] systemd[1]: Starting udev Control Socket.
[    1.076125] systemd[1]: Listening on udev Control Socket.
[    1.077170] systemd[1]: Starting Journal Socket.
[    1.084126] systemd[1]: Listening on Journal Socket.
[    1.084928] systemd[1]: Starting System Slice.
[    1.092156] systemd[1]: Created slice System Slice.
[    1.093251] systemd[1]: Started File System Check on Root Device.
[    1.094295] systemd[1]: Starting system-getty.slice.
[    1.104163] systemd[1]: Created slice system-getty.slice.
[    1.105061] systemd[1]: Starting system-serial\x2dgetty.slice.
[    1.112120] systemd[1]: Created slice system-serial\x2dgetty.slice.
[    1.113196] systemd[1]: Starting Increase datagram queue length...
[    1.121761] systemd[1]: Started Set Up Additional Binary Formats.
[    1.123056] systemd[1]: Mounting Debug File System...
[    1.132379] systemd[1]: Starting udev Coldplug all Devices...
[    1.140365] systemd[1]: Starting Create list of required static
device nodes for the current kernel...
[    1.148339] systemd[1]: Mounting POSIX Message Queue File System...
[    1.154357] systemd[1]: Starting Load Kernel Modules...
[    1.164514] systemd[1]: Mounting Huge Pages File System...
[    1.172345] systemd[1]: Starting Slices.
[    1.180137] systemd[1]: Reached target Slices.
[    1.180781] systemd[1]: Starting Remount Root and Kernel File Systems...
[    1.192873] EXT4-fs (sda1): re-mounted. Opts: (null)
[    1.196122] systemd[1]: Mounted Huge Pages File System.
[    1.204210] systemd[1]: Mounted POSIX Message Queue File System.
[    1.212153] systemd[1]: Mounted Debug File System.
[    1.220194] systemd[1]: Started Increase datagram queue length.
[    1.228268] systemd[1]: Started Create list of required static
device nodes for the current kernel.
[    1.236178] systemd[1]: Started Load Kernel Modules.
[    1.244109] systemd[1]: Started Remount Root and Kernel File Systems.
[    1.252125] systemd[1]: Started udev Coldplug all Devices.
[    1.260988] systemd[1]: Starting Various fixups to make systemd
work better on Debian...
[    1.268466] systemd[1]: Starting Load/Save Random Seed...
[    1.276347] systemd[1]: Starting Apply Kernel Variables...
[    1.284367] systemd[1]: Mounted FUSE Control File System.
[    1.285266] systemd[1]: Mounted Configuration File System.
[    1.286066] systemd[1]: Starting Create Static Device Nodes in /dev...
[    1.292330] systemd[1]: Starting Syslog Socket.
[    1.300152] systemd[1]: Listening on Syslog Socket.
[    1.301174] systemd[1]: Starting Journal Service...
[    1.316160] systemd[1]: Started Journal Service.
[    1.365177] systemd-udevd[175]: starting version 215
[    1.368087] tsc: Refined TSC clocksource calibration: 2300.003 MHz
[    1.431978] systemd-journald[173]: Received request to flush
runtime journal from PID 1
[    1.443420] input: Power Button as
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[    1.444571] ACPI: Power Button [PWRF]
[    1.445370] input: Sleep Button as
/devices/LNXSYSTM:00/LNXSLPBN:00/input/input3
[    1.447470] ACPI: Sleep Button [SLPF]
[    1.452538] piix4_smbus 0000:00:01.3: SMBus base address
uninitialized - upgrade BIOS or use force_addr=0xaddr
[    1.490217] AVX2 version of gcm_enc/dec engaged.
[    1.503138] ppdev: user-space parallel port driver
[    1.505023] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[    1.509085] alg: No test for crc32 (crc32-pclmul)
[    1.511110] intel_rapl: no valid rapl domains found in package 0
[    1.664655] psmouse serio1: alps: Unknown ALPS touchpad: E7=10 00
64, EC=10 00 64
[    2.098149] input: ImPS/2 Generic Wheel Mouse as
/devices/platform/i8042/serio1/input/input4
[    2.324606] random: nonblocking pool is initialized

** Model information
sys_vendor: Google
product_name: Google
product_version:
chassis_vendor: Google
chassis_version:
bios_vendor: Google
bios_version: Google

** Loaded modules:
crc32_pclmul
ghash_clmulni_intel
ppdev
aesni_intel
aes_x86_64
lrw
gf128mul
glue_helper
ablk_helper
cryptd
evdev
processor
thermal_sys
parport_pc
psmouse
parport
i2c_piix4
serio_raw
i2c_core
button
autofs4
ext4
crc16
mbcache
jbd2
sg
sd_mod
crc_t10dif
crct10dif_generic
virtio_net
virtio_scsi
scsi_mod
crct10dif_pclmul
crct10dif_common
crc32c_intel
virtio_pci
virtio_ring
virtio

** PCI devices:
not available

** USB devices:
not available


-- System Information:
Debian Release: 8.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.56
ii  initramfs-tools [linux-initramfs-tool]  0.120
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
pn  firmware-linux-free  <none>
pn  irqbalance           <none>

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.02~beta2-22
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
pn  firmware-bnx2           <none>
pn  firmware-bnx2x          <none>
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
  linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
  linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true

