--- Begin Message ---
- To: submit@bugs.debian.org
- Subject: Kernel BUG in cgroup freezer when repeatedly freezing/thawing a group
- From: Robert L Mathews <rob@tigertech.com>
- Date: Fri, 16 Aug 2013 22:01:05 -0700
- Message-id: <520F0391.1050703@tigertech.com>
Package: src:linux
Version: 3.2.46-1
Severity: important
Dear Debian Linux Kernel Maintainers,
If I create a cgroup freezer container on an SMP machine and repeatedly
freeze/thaw it in a loop, the kernel freezes with a BUG.
To reproduce, create a cgroup freezer container with a single process
in it on an SMP machine running the standard wheezy kernel 3.2.46-1:
mkdir /dev/cgroups-freezer
mount -t cgroup -o freezer freezer /dev/cgroups-freezer
mkdir /dev/cgroups-freezer/crashtest
cd /dev/cgroups-freezer/crashtest
sleep 3600 &
echo $! > tasks
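Before starting the loop, it may help to confirm that the group looks as
expected: one PID in "tasks" and a state of THAWED. The following is only
a sketch; show_freezer is a hypothetical helper name, not part of any
cgroup tooling, and it does nothing more than read the two files used in
the commands above.

```shell
#!/bin/sh
# show_freezer DIR
# Print the freezer state and the member PIDs of the cgroup directory DIR.
# (Hypothetical helper for illustration; it only reads freezer.state and tasks.)
show_freezer() {
    dir=$1
    if [ ! -r "$dir/freezer.state" ] || [ ! -r "$dir/tasks" ]; then
        echo "not a freezer cgroup: $dir" >&2
        return 1
    fi
    # The kernel reports the state as THAWED, FREEZING, or FROZEN.
    printf 'state: %s\n' "$(cat "$dir/freezer.state")"
    # tasks holds one PID per line; join them onto a single line.
    printf 'tasks: %s\n' "$(tr '\n' ' ' < "$dir/tasks" | sed 's/ *$//')"
}
```

For the setup above, this would be called as
show_freezer /dev/cgroups-freezer/crashtest.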
Then run this ugly perl one-liner from within the same "crashtest"
directory:
perl -e 'while (1) {
open FILE, ">freezer.state" or die; print FILE "FROZEN" or die; close FILE or die;
open FILE, ">freezer.state" or die; print FILE "THAWED" or die; close FILE or die;
};'
On my test machines, the following BUG reproducibly happens in less than
a second, and the machine locks up:
[ 2703.254372] ------------[ cut here ]------------
[ 2703.254530] kernel BUG at /build/linux-dJLVDt/linux-3.2.46/kernel/cgroup_freezer.c:241!
[ 2703.254769] invalid opcode: 0000 [#1] SMP
[ 2703.254917] Modules linked in: netconsole nfnetlink_log nfnetlink configfs nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcm snd_page_alloc snd_timer snd soundcore ac97_bus ac battery processor parport_pc parport power_supply thermal_sys button psmouse serio_raw pcspkr joydev evdev i2c_piix4 i2c_core vboxguest(O) ext4 crc16 jbd2 mbcache usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic ata_piix ohci_hcd ehci_hcd ahci libahci usbcore e1000 libata scsi_mod usb_common [last unloaded: netconsole]
[ 2703.256018]
[ 2703.256018] Pid: 2835, comm: perl Tainted: G O 3.2.0-4-686-pae #1 Debian 3.2.46-1 innotek GmbH VirtualBox/VirtualBox
[ 2703.256018] EIP: 0060:[<c106dc6f>] EFLAGS: 00010002 CPU: 0
[ 2703.256018] EIP is at update_if_frozen.isra.1+0x47/0x73
[ 2703.256018] EAX: 00000000 EBX: 00000001 ECX: df2ef4c0 EDX: dd265ee4
[ 2703.256018] ESI: 00000001 EDI: dd6a6350 EBP: 00000000 ESP: dd265edc
[ 2703.256018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2703.256018] Process perl (pid: 2835, ti=dd264000 task=df248ee0 task.ti=dd264000)
[ 2703.256018] Stack:
[ 2703.256018] dd265ee4 df2ef4c0 00000000 de2b1284 df2ef4c0 dd6a6340 dd265f28 00000002
[ 2703.256018] c106dd5a c12c271a c1165b6c c106dd01 c13e892c dd265f28 0916b860 c106b49d
[ 2703.256018] 00000006 df2ef4c0 00001000 5a4f5246 00004e45 520eb4b9 2fb866f6 520eb4bf
[ 2703.256018] Call Trace:
[ 2703.256018] [<c106dd5a>] ? freezer_write+0x59/0x13c
[ 2703.256018] [<c12c271a>] ? _cond_resched+0x5/0x18
[ 2703.256018] [<c1165b6c>] ? _copy_from_user+0x28/0x47
[ 2703.256018] [<c106dd01>] ? freezer_read+0x66/0x66
[ 2703.256018] [<c106b49d>] ? cgroup_file_write+0x18f/0x1e1
[ 2703.256018] [<c10ccddf>] ? rw_verify_area+0xc6/0xe7
[ 2703.256018] [<c106b30e>] ? cgroup_file_open+0x87/0x87
[ 2703.256018] [<c10cd07f>] ? vfs_write+0x83/0xd4
[ 2703.256018] [<c10cd23f>] ? sys_write+0x3d/0x61
[ 2703.256018] [<c12c7f5f>] ? sysenter_do_call+0x12/0x28
[ 2703.256018] Code: e8 2b f6 ff ff eb 0b e8 2d ff ff ff 46 3c 01 83 db ff 8b 44 24 04 8d 54 24 08 e8 fe f6 ff ff 85 c0 75 e4 85 ed 75 06 85 db 74 17 <0f> 0b 4d 75 0c 39 f3 75 0e c7 07 02 00 00 00 eb 06 39 f3 74 02
[ 2703.256018] EIP: [<c106dc6f>] update_if_frozen.isra.1+0x47/0x73 SS:ESP 0068:dd265edc
[ 2703.256018] ---[ end trace 29c9f3fc0f436abe ]---
I have duplicated this on wheezy with this kernel:
Linux [hostname] 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux
And on squeeze with the same kernel backported, but on different amd64
(non-virtual) hardware:
Linux [hostname] 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.46-1~bpo60+1 x86_64 GNU/Linux
In my testing, the BUG happens only on SMP machines, and not on
single-CPU machines.
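Since the CPU count seems to matter, here is a small sketch for checking
whether a test machine has more than one CPU online. online_cpus is a
hypothetical helper name; it assumes getconf (or, failing that, nproc
from coreutils) is installed.

```shell
#!/bin/sh
# online_cpus
# Print the number of CPUs currently online.
# (Hypothetical helper; falls back to nproc if getconf is unavailable.)
online_cpus() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

# More than one online CPU means the machine is running SMP.
if [ "$(online_cpus)" -gt 1 ]; then
    echo "SMP: $(online_cpus) CPUs online"
else
    echo "single CPU online"
fi
```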
Also, if you include a slight delay (10 ms here) before each freeze, I
can no longer reproduce the problem:
perl -e 'while (1) { select (undef, undef, undef, 0.01);
open FILE, ">freezer.state" or die; print FILE "FROZEN" or die; close FILE or die;
open FILE, ">freezer.state" or die; print FILE "THAWED" or die; close FILE or die;
};' # does not BUG due to the select() delay
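For anyone who prefers not to leave an endless loop running, both
variants can be sketched as one bounded shell function with an optional
delay. toggle_freezer is a hypothetical name, and I have not run this
form against the affected kernel; a shell loop may well be slower than
the perl one and take longer to hit the BUG, if it hits it at all.

```shell
#!/bin/sh
# toggle_freezer DIR COUNT [DELAY]
# Write FROZEN and then THAWED to DIR/freezer.state COUNT times,
# sleeping DELAY seconds (default 0, meaning no delay) before each freeze.
# (Hypothetical helper; a fractional DELAY such as 0.01 assumes a sleep
# that accepts fractions, as GNU coreutils sleep does.)
toggle_freezer() {
    dir=$1
    count=$2
    delay=${3:-0}
    i=0
    while [ "$i" -lt "$count" ]; do
        [ "$delay" = 0 ] || sleep "$delay"
        # printf '%s' writes the bare word with no trailing newline,
        # matching what the perl print statements send.
        printf '%s' FROZEN > "$dir/freezer.state" || return 1
        printf '%s' THAWED > "$dir/freezer.state" || return 1
        i=$((i + 1))
    done
}
```

From within the "crashtest" directory, "toggle_freezer . 100000" would
correspond to the first one-liner and "toggle_freezer . 100000 0.01" to
the delayed one, except that both stop after 100000 iterations.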
Looking at line 241 of kernel/cgroup_freezer.c in version 3.2.46, the
BUG fires when the code believes the state of the group is
CGROUP_THAWED and yet the group contains a frozen task. Because the
failure is both timing- and SMP-dependent, a race condition of some
kind seems likely.
-- System Information:
Debian Release: 7.1
APT prefers stable
APT policy: (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
--
Robert L Mathews, Tiger Technologies
--- End Message ---