Re: rcu_sched self-detected stall on CPU

To: debian-kernel@lists.debian.org
Subject: Re: rcu_sched self-detected stall on CPU
From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 06 Oct 2015 21:11:21 +0100
Message-id: <1444162281.2956.128.camel@decadent.org.uk>
In-reply-to: <CAENR+_XG=bKqgL32pjhC=VkKC9TTQGX9WfgNpe7rJeot8Vn-Aw@mail.gmail.com>
References: <CAENR+_UB+ibEk4T01R1xWRD8OTaf9psNugCDbL01cja9HdEz3Q@mail.gmail.com> <CAENR+_XG=bKqgL32pjhC=VkKC9TTQGX9WfgNpe7rJeot8Vn-Aw@mail.gmail.com>

On Tue, 2015-10-06 at 11:41 -0700, Rumen Telbizov wrote:
[...]
> > Setup:
> > A cluster of 5 machines. First machine exports a drive over NFSv4
> > to the rest acting as clients. Processing takes place on every
> > machine (including the server) and output data is written back on
> > the NFS shared drive. Running kernel 3.16.7-ckt11-1+deb8u4, also
> > tried the 4.1.6 backport - the same problem occurs there too. 

Is the first machine NFS-mounting from itself?

> > Hardware:
> > X10DRT-PT, 256GB RAM, 12 x E5-2620, 2xS3710s SSDs mdraid1. Latest
> > BIOS firmware.
> > 
> > I was wondering if _raw_spin_lock in the stack trace and the fact
> > that the CPUs hit 100% might be related?

_raw_spin_lock is a common function for synchronisation.  It doesn't
sleep (except in RT-kernels), so in case of a deadlock you will see
100% CPU usage rather than tasks in D state.
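
If the machine is still responsive enough to run anything, one way to
tell the two cases apart is the state field in /proc/<pid>/stat: a
process spinning on a lock shows up as R (running), one blocked in
uninterruptible sleep as D.  A minimal Python sketch follows; the helper
names task_state and tasks_by_state are made up for illustration and are
not part of any existing tool:

```python
import os


def task_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')' rather than on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def tasks_by_state(wanted, proc="/proc"):
    """List (pid, comm) for every process currently in one of the wanted states."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if task_state(line) in wanted:
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    # R = running or runnable (a process spinning on a lock shows up here),
    # D = uninterruptible sleep (blocked on a sleeping lock or on I/O).
    for state in ("R", "D"):
        print(state, tasks_by_state({state}))
```

This only walks the top-level /proc entries, so individual threads under
/proc/<pid>/task are not covered.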

Ben.

-- 
Ben Hutchings
All the simple programs have been written, and all the good names taken.
