Re: rcu_sched self-detected stall on CPU

Hi Ben,

Yes, the NFS server is mounting itself as well. I do see this problem on all servers, though.

Should it not? Any further clues?

Thank you for looking into this,

Rumen Telbizov

On Tue, Oct 6, 2015 at 1:11 PM, Ben Hutchings <ben@decadent.org.uk> wrote:

On Tue, 2015-10-06 at 11:41 -0700, Rumen Telbizov wrote:
[...]
> > Setup:
> > A cluster of 5 machines. First machine exports a drive over NFSv4
> > to the rest acting as clients. Processing takes place on every
> > machine (including the server) and output data is written back on
> > the NFS shared drive. Running kernel 3.16.7-ckt11-1+deb8u4, also
> > tried the 4.1.6 backport - the same problem occurs there too.

Is the first machine NFS-mounting from itself?

> > Hardware:
> > X10DRT-PT, 256GB RAM, 12 x E5-2620, 2xS3710s SSDs mdraid1. Latest
> > BIOS firmware.
> >
> > I was wondering if _raw_spin_lock in the stack trace and the fact
> > that the CPUs hit 100% might be related?

_raw_spin_lock is a common function for synchronisation. It doesn't
sleep (except in RT-kernels), so in case of a deadlock you will see
100% CPU usage rather than tasks in D state.
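
One rough way to check which of the two symptoms a machine is showing is to
count tasks by scheduler state. The sketch below is only an illustration and
assumes a Linux /proc filesystem; the helper names parse_state and
count_states are made up for this example. It reads the state letter from
each /proc/<pid>/stat file, so it counts processes, not individual threads.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Return the one-letter task state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes per state letter (R = running, D = uninterruptible sleep, ...)."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # no /proc here (assumption: non-Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited while we were scanning
    return counts
```

Calling count_states() during a stall and seeing many tasks in R with few or
none in D would be consistent with spinning rather than sleeping, though it
does not by itself identify which lock is involved.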

Ben.

--
Ben Hutchings
All the simple programs have been written, and all the good names taken.

Rumen Telbizov

Unix Systems Administrator
