--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: noflushd: Noflushd causes flush- processes to eat all CPU
- From: Xavier Roche <xavier@debian.org>
- Date: Sun, 29 Aug 2010 19:53:24 +0200
- Message-id: <20100829175324.5559.98564.reportbug@localnet>
Package: noflushd
Version: 2.8-1
Severity: important
I think the problem may still be present when some monitored disks go idle automatically (or are set to do so through "hdparm -S242").
Note that the disks in question apparently do not need to have pending writes for the problem to be reproducible.
I managed to reproduce the issue after a clean reboot (and after
removing some potentially new options from the grsecurity kernel - to be
sure that this was not a possible cause) on a fresh 2.6.34.4 kernel.
I started noflushd, waited for some time, and the problem appeared again. All monitored disks are configured to go idle after a while (using "hdparm -S242 /dev/.." at startup).
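For reference, here is a rough sketch of how an "hdparm -S" value translates into a standby timeout. It assumes the encoding described in the hdparm manual page; the function name is only illustrative and is not part of hdparm or noflushd.

```python
def hdparm_standby_seconds(value):
    """Decode an `hdparm -S` argument into a timeout in seconds.

    Returns None when the timeout is disabled (0) or when the value
    is vendor-defined or reserved (253, 254).
    Assumption: encoding as documented in the hdparm manual page.
    """
    if value == 0:
        return None  # standby timeout disabled
    if 1 <= value <= 240:
        return value * 5  # multiples of 5 seconds
    if 241 <= value <= 251:
        return (value - 240) * 30 * 60  # 1 to 11 units of 30 minutes
    if value == 252:
        return 21 * 60  # 21 minutes
    if value == 255:
        return 21 * 60 + 15  # 21 minutes plus 15 seconds
    return None  # 253 is vendor-defined, 254 is reserved


# Example: the setting used in this report.
timeout = hdparm_standby_seconds(242)
```

Under that encoding, -S242 corresponds to a standby timeout of 60 minutes.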
In this state, the noflushd daemon is still running (and not consuming
CPU), but the flush-* processes do:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13604 root 20 0 0 0 0 R 48.1 0.0 8:19.70 flush-34:0
13605 root 20 0 0 0 0 R 48.1 0.0 5:42.76 flush-8:0
After a while, more flush- processes appear, and the load increases.
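The flush- thread names carry the MAJOR:MINOR number of the block device they serve. A minimal sketch for mapping such a name back to a device, assuming the usual four-column layout of /proc/partitions (the helper name is hypothetical):

```python
def device_for_flush_thread(thread_name, partitions_text):
    """Return the block device name for a 'flush-MAJOR:MINOR' thread.

    partitions_text is the content of /proc/partitions. Returns None
    if the name is not a flush thread or the device is not listed.
    """
    prefix = "flush-"
    if not thread_name.startswith(prefix):
        return None
    major, _, minor = thread_name[len(prefix):].partition(":")
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[0] == major and fields[1] == minor:
            return fields[3]
    return None


# Usage on a live system:
#   with open("/proc/partitions") as f:
#       print(device_for_flush_thread("flush-34:0", f.read()))
```

This only identifies which disk a busy flush- thread belongs to; it does not explain why the thread is spinning.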
The noflushd daemon appears to be still running (it is NOT stuck, even if
flush-* kernel jobs are stuck), and every 5 seconds it attempts to do fsync() calls:
nanosleep({5, 0}, {5, 0}) = 0
time(NULL) = 1283100653
_llseek(5, 0, [0], SEEK_SET) = 0
read(5, " 3 64 hdb 98217 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024) = 0
time(NULL) = 1283100653
_llseek(3, 0, [0], SEEK_SET) = 0
read(3, "major minor #blocks name\n\n 3 "..., 1024) = 354
fsync(6) = 0
fsync(7) = 0
fsync(10) = 0
fsync(11) = 0
fsync(12) = 0
fsync(13) = 0
fsync(14) = 0
fsync(15) = 0
read(3, ""..., 1024) = 0
fsync(16) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, ..
(repeated endlessly, i.e. it does not wait 60 seconds as it used to do
before)
(/proc/<pid-of-flush-process>/wchan gives 0)
No I/O activity on disk, but the load keeps increasing as flush- processes appear.
After touching the mounted directory corresponding to the idle disk to force a disk spin-up (an "ls" takes several seconds until the disk is back to normal), the load goes back to zero, and the processes that were stuck in sync return.
The noflushd process then goes back to a 60 second loop:
time(NULL) = 1283100976
_llseek(5, 0, [0], SEEK_SET) = 0
read(5, " 3 64 hdb 98222 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024) = 0
time(NULL) = 1283100976
_llseek(3, 0, [0], SEEK_SET) = 0
read(3, "major minor #blocks name\n\n 3 "..., 1024) = 354
fsync(6) = 0
fsync(7) = 0
fsync(10) = 0
fsync(11) = 0
fsync(12) = 0
fsync(13) = 0
fsync(14) = 0
fsync(15) = 0
read(3, ""..., 1024) = 0
fsync(16) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0}) = 0
time(NULL) = 1283100981
_llseek(5, 0, [0], SEEK_SET) = 0
read(5, " 3 64 hdb 98222 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024) = 0
time(NULL) = 1283100981
time(NULL) = 1283100981
_llseek(3, 0, [0], SEEK_SET) = 0
read(3, "major minor #blocks name\n\n 3 "..., 1024) = 354
fsync(8) = 0
fsync(9) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
write(4, "500\n"..., 4) = 4
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({60, 0},
At this time, I think that the suspend mode might be the root of all
evil; I don't know how it can impact noflushd, though. Setting up disks
to automatically enter standby mode (hdparm -S242 /dev/hd${dev}) appears to be the cause.
Using noflushd 2.8-1; Linux kernel 2.6.34.4.
I'm available to do more tests if necessary.
-- System Information:
Debian Release: 5.0.5
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.34.4-grsec (SMP w/1 CPU core)
Locale: LANG=fr_FR.UTF-8@euro, LC_CTYPE=fr_FR.UTF-8@euro (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages noflushd depends on:
ii debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii ed 0.7-3 The classic unix line editor
ii libc6 2.11.2-2 Embedded GNU C Library: Shared lib
noflushd recommends no packages.
noflushd suggests no packages.
-- debconf information:
noflushd/expert: false
* noflushd/disks: /dev/hdb /dev/hde /dev/hdg
noflushd/params:
* noflushd/timeout: 60
--- End Message ---