Bug#670398: linux-image-2.6.32-5-amd64: SSH logins hang while hpet interrupts multiply on Intel Nehalem CPUs
On Thu, Apr 26, 2012 at 04:49:56AM +0100, Ben Hutchings wrote:
> On Wed, 2012-04-25 at 10:36 +0200, Sven Hoexter wrote:
Hi,
> > Searching through munin graphs we could narrow down the starting point of this issue
> > to the point when the hpet interrupts for one CPU core multiplied. Sometimes they
> > multiplied by six. Looking further we've found the Kernel [events/$x] in state D
> > where $x is the number of the CPU core which has the high number of hpet interrupts.
> >
> > When we started strace -f on the sshd master process everything works until you logout.
> > Then you'll again see the forked sshd process hanging in state D.
>
> This is strange, because D state means uninterruptible sleep (not
> handling signals). But perhaps the sshd process was repeatedly changing
> between uninterruptible and interruptible state.
Is it possible to gather such data? I guess grep'ing through ps output
is not the right tool here.
From a system currently suffering from this issue:
ps aux|grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        15  0.0  0.0      0     0 ?        D    Apr25   0:53 [events/0]
root      4162  0.0  0.0      0     0 ?        Ds   08:33   0:00 [bash]
480       7875  0.0  0.0      0     0 ?        Ds   09:28   0:00 [bash]
root      9407  0.0  0.0  76644  3392 ?        Ds   09:49   0:00 sshd: root@pts/79
480      11310  0.0  0.0   8940   884 ?        S    09:59   0:00 grep D
480      11765  0.0  0.0      0     0 ?        Ds   Apr25   0:00 [bash]
root     12803  0.0  0.0  76644  3392 ?        Ds   Apr25   0:00 sshd: root@pts/12
root     13762  0.0  0.0  76644  3392 ?        Ds   Apr25   0:00 sshd: root@pts/73
root     15111  0.0  0.0      0     0 ?        Ds   Apr25   0:00 [bash]
root     19361  0.0  0.0      0     0 ?        Ds   Apr25   0:00 [bash]
root     20966  0.0  0.0      0     0 ?        Ds   Apr25   0:00 [bash]
root     29323  0.0  0.0      0     0 ?        Ds   Apr25   0:00 [bash]
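
Note that a plain grep D matches any line containing a capital D, which is why the header line (COMMAND) and the grep process itself show up above. A small sketch that filters on the STAT column instead, assuming the eleven-column `ps aux` layout shown above (the function name is made up for illustration):

```python
def d_state_processes(ps_output):
    """Return (pid, stat, command) for every process whose STAT starts with D.

    Expects text in the `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    result = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        # Split into at most 11 fields so that COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a complete process line
        pid, stat, command = fields[1], fields[7], fields[10]
        if stat.startswith("D"):
            result.append((int(pid), stat, command))
    return result
```

It could be fed with subprocess.check_output(["ps", "aux"], universal_newlines=True); on the output above it would drop the "grep D" line, since that process is in state S.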
Sven