Bug#719948: marked as done (Kernel BUG in cgroup freezer when repeatedly freezing/thawing a group)

To: jmm@debian.org
Subject: Bug#719948: marked as done (Kernel BUG in cgroup freezer when repeatedly freezing/thawing a group)
From: "Debian Bug Tracking System" <owner@bugs.debian.org>
Date: Wed, 28 Apr 2021 16:21:03 +0000
Message-id: <handler.719948.D719948.161962668425826.ackdone@bugs.debian.org>
Reply-to: 719948@bugs.debian.org
References: <E1lbmsh-001LM3-9I@hullmann.westfalen.local> <520F0391.1050703@tigertech.com>

Your message dated Wed, 28 Apr 2021 18:18:02 +0200
with message-id <E1lbmsh-001LM3-9I@hullmann.westfalen.local>
and subject line Closing this bug
has caused the Debian Bug report #719948,
regarding Kernel BUG in cgroup freezer when repeatedly freezing/thawing a group
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
719948: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719948
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: submit@bugs.debian.org
Subject: Kernel BUG in cgroup freezer when repeatedly freezing/thawing a group
From: Robert L Mathews <rob@tigertech.com>
Date: Fri, 16 Aug 2013 22:01:05 -0700
Message-id: <520F0391.1050703@tigertech.com>

Package: src:linux
Version: 3.2.46-1
Severity: important

Dear Debian Linux Kernel Maintainers,

If I create a cgroup freezer container on an SMP machine and repeatedly
freeze/thaw it in a loop, the kernel hits a BUG and the machine locks up.

To reproduce, create a cgroups freezer container with a single process
in it on an SMP machine with wheezy standard kernel 3.2.46-1:

 mkdir /dev/cgroups-freezer
 mount -t cgroup -o freezer freezer /dev/cgroups-freezer
 mkdir /dev/cgroups-freezer/crashtest
 cd /dev/cgroups-freezer/crashtest
 sleep 3600 &
 echo $! > tasks

Then run this ugly perl one-liner from within the same "crashtest"
directory:

 perl -e 'while (1) { open FILE, ">freezer.state" or die; print FILE
"FROZEN" or die; close FILE or die; open FILE, ">freezer.state" or die;
print FILE "THAWED" or die; close FILE or die; };'

On my test machines, the following BUG reproducibly happens in less than
a second, and the machine locks up:

[ 2703.254372] ------------[ cut here ]------------
[ 2703.254530] kernel BUG at
/build/linux-dJLVDt/linux-3.2.46/kernel/cgroup_freezer.c:241!
[ 2703.254769] invalid opcode: 0000 [#1] SMP
[ 2703.254917] Modules linked in: netconsole nfnetlink_log nfnetlink
configfs nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop
snd_intel8x0 snd_ac97_codec snd_pcm snd_page_alloc snd_timer snd
soundcore ac97_bus ac battery processor parport_pc parport power_supply
thermal_sys button psmouse serio_raw pcspkr joydev evdev i2c_piix4
i2c_core vboxguest(O) ext4 crc16 jbd2 mbcache usbhid hid sg sr_mod
sd_mod cdrom crc_t10dif ata_generic ata_piix ohci_hcd ehci_hcd ahci
libahci usbcore e1000 libata scsi_mod usb_common [last unloaded: netconsole]
[ 2703.256018]
[ 2703.256018] Pid: 2835, comm: perl Tainted: G           O
3.2.0-4-686-pae #1 Debian 3.2.46-1 innotek GmbH VirtualBox/VirtualBox
[ 2703.256018] EIP: 0060:[<c106dc6f>] EFLAGS: 00010002 CPU: 0
[ 2703.256018] EIP is at update_if_frozen.isra.1+0x47/0x73
[ 2703.256018] EAX: 00000000 EBX: 00000001 ECX: df2ef4c0 EDX: dd265ee4
[ 2703.256018] ESI: 00000001 EDI: dd6a6350 EBP: 00000000 ESP: dd265edc
[ 2703.256018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2703.256018] Process perl (pid: 2835, ti=dd264000 task=df248ee0
task.ti=dd264000)
[ 2703.256018] Stack:
[ 2703.256018]  dd265ee4 df2ef4c0 00000000 de2b1284 df2ef4c0 dd6a6340
dd265f28 00000002
[ 2703.256018]  c106dd5a c12c271a c1165b6c c106dd01 c13e892c dd265f28
0916b860 c106b49d
[ 2703.256018]  00000006 df2ef4c0 00001000 5a4f5246 00004e45 520eb4b9
2fb866f6 520eb4bf
[ 2703.256018] Call Trace:
[ 2703.256018]  [<c106dd5a>] ? freezer_write+0x59/0x13c
[ 2703.256018]  [<c12c271a>] ? _cond_resched+0x5/0x18
[ 2703.256018]  [<c1165b6c>] ? _copy_from_user+0x28/0x47
[ 2703.256018]  [<c106dd01>] ? freezer_read+0x66/0x66
[ 2703.256018]  [<c106b49d>] ? cgroup_file_write+0x18f/0x1e1
[ 2703.256018]  [<c10ccddf>] ? rw_verify_area+0xc6/0xe7
[ 2703.256018]  [<c106b30e>] ? cgroup_file_open+0x87/0x87
[ 2703.256018]  [<c10cd07f>] ? vfs_write+0x83/0xd4
[ 2703.256018]  [<c10cd23f>] ? sys_write+0x3d/0x61
[ 2703.256018]  [<c12c7f5f>] ? sysenter_do_call+0x12/0x28
[ 2703.256018] Code: e8 2b f6 ff ff eb 0b e8 2d ff ff ff 46 3c 01 83 db
ff 8b 44 24 04 8d 54 24 08 e8 fe f6 ff ff 85 c0 75 e4 85 ed 75 06 85 db
74 17 <0f> 0b 4d 75 0c 39 f3 75 0e c7 07 02 00 00 00 eb 06 39 f3 74 02
[ 2703.256018] EIP: [<c106dc6f>] update_if_frozen.isra.1+0x47/0x73
SS:ESP 0068:dd265edc
[ 2703.256018] ---[ end trace 29c9f3fc0f436abe ]---

I have duplicated this on wheezy with this kernel:

 Linux [hostname] 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

And on squeeze with the same kernel backported, but on different amd64
(non-virtual) hardware:

 Linux [hostname] 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.46-1~bpo60+1
x86_64 GNU/Linux

In my testing, the BUG only happens on SMP machines, and not on single
CPU machines.

Also, if I add a slight delay before each freeze, the problem no longer
reproduces reliably, at least for me:

 perl -e 'while (1) { select (undef, undef, undef, 0.01); open FILE,
">freezer.state" or die; print FILE "FROZEN" or die; close FILE or die;
open FILE, ">freezer.state" or die; print FILE "THAWED" or die; close
FILE or die; };'  # does not BUG due to the select() delay
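
For testing on a throwaway machine, the two loops above can also be
written as one bounded shell function, so the run stops by itself
instead of looping forever. This is only a sketch, not part of the
original report: the name toggle_freezer and its arguments are
hypothetical, and because a shell loop has different timing from the
perl loop, it may not trigger the BUG as quickly, or at all.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report):
#   toggle_freezer STATE_FILE COUNT [DELAY]
# Writes FROZEN and then THAWED to STATE_FILE, COUNT times.
# DELAY is optional and is passed to sleep(1) before each freeze;
# fractional values such as 0.01 are an extension (e.g. GNU sleep),
# not something POSIX guarantees.
toggle_freezer() {
    state_file=$1
    count=$2
    delay=${3:-}
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ -n "$delay" ]; then
            sleep "$delay"
        fi
        # No trailing newline, matching the perl print calls.
        printf 'FROZEN' > "$state_file" || return 1
        printf 'THAWED' > "$state_file" || return 1
        i=$((i + 1))
    done
}
```

Run from within the "crashtest" directory, "toggle_freezer freezer.state
100000" would correspond to the first loop, and "toggle_freezer
freezer.state 100000 0.01" to the variant with the delay.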

Looking at line 241 of kernel/cgroup_freezer.c in version 3.2.46,
something is clearly wrong: the code believes the state of the group is
CGROUP_THAWED, and yet the group contains a frozen task. The fact that
the BUG is both timing- and SMP-dependent suggests a race condition of
some kind.

-- System Information:
Debian Release: 7.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
Robert L Mathews, Tiger Technologies

--- End Message ---

--- Begin Message ---

To: 719948-done@bugs.debian.org
Subject: Closing this bug
From: jmm@debian.org
Date: Wed, 28 Apr 2021 18:18:02 +0200
Message-id: <E1lbmsh-001LM3-9I@hullmann.westfalen.local>

This bug was filed for a very old kernel. If you can reproduce it with
- the current version in unstable/testing
- the latest kernel from buster-backports
please reopen the bug; see https://www.debian.org/Bugs/server-control
--- End Message ---
