Bug#657916: linux-source-2.6.32: ps time doubled then constant: missing lock for task_utime?
Package: linux-source-2.6.32
Version: 2.6.32-41
Severity: normal
On rare occasions, for some long-running processes, ps shows a CPU time
that is too large and then stays constant. I have only observed this for
multi-threaded processes compiled with -fopenmp.
On one occasion I saw:
$ ps u -p 14804
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
psz 14804 1599 0.0 61528 1356 ? RNl Jan13 71587:15 a.out
$ grep . /proc/14804/stat /proc/14804/task/*/stat
/proc/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 624 0 0 0 427308277 2215294 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3214614264 134522975 0 0 1 0 4294967295 0 0 17 2 0 0 0 0 0
/proc/14804/task/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 588 0 0 0 26478404 333660 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3214613280 3077747522 0 0 1 0 0 0 0 17 2 0 0 0 0 0
/proc/14804/task/14807/stat:14807 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 26703589 138033 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3075048080 3077747646 0 0 1 0 0 0 0 -1 5 0 0 0 0 0
/proc/14804/task/14808/stat:14808 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26802997 48697 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3066655216 3077747646 0 0 1 0 0 0 0 -1 1 0 0 0 0 0
/proc/14804/task/14809/stat:14809 (a.out) R 1 14804 14608 0 -1 4202560 5 0 0 0 26756492 95248 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3058262672 3077747646 0 0 1 0 0 0 0 -1 6 0 0 0 0 0
/proc/14804/task/14810/stat:14810 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26689860 161611 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3049869808 3077747646 0 0 1 0 0 0 0 -1 7 0 0 0 0 0
/proc/14804/task/14811/stat:14811 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 26705969 145689 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3041477104 3077747646 0 0 1 0 0 0 0 -1 0 0 0 0 0 0
/proc/14804/task/14812/stat:14812 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26729186 122435 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3033084560 3077747646 0 0 1 0 0 0 0 -1 3 0 0 0 0 0
/proc/14804/task/14813/stat:14813 (a.out) R 1 14804 14608 0 -1 4202560 7 0 0 0 26789545 62169 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3024691856 3077747646 0 0 1 0 0 0 0 -1 4 0 0 0 0 0
with TIME and %CPU as reported by ps having apparently doubled shortly
before; from then on, TIME remained constant and %CPU slowly decreased.
In that state, the command ps u -L -p 14804 (per-thread view) showed
sensible output. I did not wait long enough to see whether TIME ever
increased again.
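The doubling can be checked from the stat output above by comparing the
process-wide utime+stime (fields 14 and 15 of /proc/PID/stat) with the sum
over the per-thread stat files. The sketch below assumes Python 3; the
helper names cpu_ticks, read_ticks and compare are made up for
illustration, and the PROCESS and THREADS figures are copied from the
output above. Note that a process total may legitimately exceed the
live-thread sum when threads have already exited, so only a large gap is
suspicious.

```python
import glob


def cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/.../stat line."""
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]), int(fields[12])


def read_ticks(path):
    """Read one stat file and return its (utime, stime)."""
    with open(path) as f:
        return cpu_ticks(f.read())


def compare(pid):
    """Return (process_ticks, summed_thread_ticks) for a live process."""
    proc = read_ticks("/proc/%d/stat" % pid)
    threads = [read_ticks(p) for p in glob.glob("/proc/%d/task/*/stat" % pid)]
    return sum(proc), sum(u + s for u, s in threads)


# (utime, stime) copied from the stat output in this report.
PROCESS = (427308277, 2215294)
THREADS = [
    (26478404, 333660), (26703589, 138033), (26802997, 48697),
    (26756492, 95248), (26689860, 161611), (26705969, 145689),
    (26729186, 122435), (26789545, 62169),
]

if __name__ == "__main__":
    process_total = sum(PROCESS)
    thread_total = sum(u + s for u, s in THREADS)
    print("process ticks:", process_total)
    print("thread ticks: ", thread_total)
    print("ratio: %.3f" % (float(process_total) / thread_total))
```

With these figures the process total is 429523571 ticks against 214763584
summed over the threads, a ratio just under 2. At 100 ticks per second
(an assumption, though the usual value) the process total is 4295235
seconds, which is the 71587:15 shown by ps.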
I wonder if this issue is related to task_utime in kernel/sched.c
calculating and updating p->prev_utime without any locks, whereas
comments say that thread_group_times must be called with siglock held.
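To illustrate why a single bad update could give "doubled then constant",
here is a toy model in Python 3. It is not the kernel code. It assumes
only that the reported value is clamped so that it never decreases, i.e.
prev_utime = max(prev_utime, newly computed value), which is how I read
the prev_utime update. The class Task, the method report_utime and the
sample numbers are invented for the illustration.

```python
class Task:
    """Toy stand-in for the per-task state; only prev_utime is modelled."""

    def __init__(self):
        self.prev_utime = 0

    def report_utime(self, computed):
        # Assumed monotonic clamp: never report less than was reported before.
        self.prev_utime = max(self.prev_utime, computed)
        return self.prev_utime


def simulate(samples):
    """Feed a sequence of computed values through the clamp."""
    task = Task()
    return [task.report_utime(s) for s in samples]


if __name__ == "__main__":
    # Hypothetical computed values; the fourth one is a transient overshoot.
    samples = [100, 200, 300, 620, 400, 500, 600, 700]
    print(simulate(samples))
```

In this model one transient overshoot sticks: the reported value stays at
the inflated figure until the true value catches up, and only then starts
to increase again. That would match a TIME that jumps and then stays
constant while %CPU slowly decreases.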
Thanks,
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
-- System Information:
Debian Release: 6.0.4
APT prefers stable
APT policy: (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.32-pk05.09-svr (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-source-2.6.32 depends on:
ii binutils 2.20.1-16 The GNU assembler, linker and bina
ii bzip2 1.0.5-6+squeeze1 high-quality block-sorting file co
Versions of packages linux-source-2.6.32 recommends:
ii gcc 4:4.4.5-1 The GNU C compiler
ii libc6-dev [libc-dev] 2.11.3-2 Embedded GNU C Library: Developmen
ii make 3.81-8 An utility for Directing compilati
Versions of packages linux-source-2.6.32 suggests:
ii kernel-package 12.036+nmu1 A utility for building Linux kerne
ii libncurses5-dev [ncurses- 5.7+20100313-5 developer's libraries and docs for
pn libqt3-mt-dev <none> (no description available)