
Bug#804857: linux: New feature: enable CONFIG_NO_HZ_FULL and CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU_NONE



On Thu, Nov 04, 2021 at 10:05:02PM +0100, Henning Schild wrote:
> Am Sat, 30 Oct 2021 16:04:35 +0200
> schrieb Salvatore Bonaccorso <carnil@debian.org>:
> 
> > Control: tags -1 + moreinfo
> > 
> > On Wed, Oct 27, 2021 at 10:16:56AM +0200, Georg Müller wrote:
> > > > But for other configurations it is worse:
> > > > 
> > > > config NO_HZ_FULL
> > > >         bool "Full dynticks system (tickless)"
> > > > ...
> > > >          This is implemented at the expense of some overhead in
> > > >          user <-> kernel transitions: syscalls, exceptions and
> > > >          interrupts. Even when it's dynamically off.
> > > > 
> > > >          Say N.
> > > >   
> > > 
> > > 
> > > Upstream commit 176b8906 changed the description regarding
> > > NO_HZ_FULL:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=176b8906c399a170886ea4bad5b24763c6713d61
> > > 
> > >   
> > > > By default, without passing the nohz_full parameter, this behaves
> > > > just like NO_HZ_IDLE.
> > > >
> > > > If you're a distro say Y.  
> > 
> > While this has changed, and distros are encouraged to select it,
> > selecting this would also enable both CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > and CONFIG_RCU_NOCB_CPU.
> > 
> > For CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > 
> >           Select this option to enable task and CPU time accounting
> >           on full dynticks systems. This accounting is implemented by
> >           watching every kernel-user boundaries using the context
> >           tracking subsystem. The accounting is thus performed at the
> >           expense of some significant overhead.
> > 
> >           For now this is only useful if you are working on the full
> >           dynticks subsystem development.
> > 
> >           If unsure, say N.
> > 
> > which indicates some significant overhead.
> 
> I cannot answer that off the top of my head; I would have to dig as
> well. I might get back in about two weeks if nobody else finds an answer.
> 
> But I took the liberty of adding Frederic, the author of the
> "distro reassure" patch, to Cc.
> 
> I am not sure such a change would be allowed for bullseye (5.10), or
> whether the answer for 5.10 would differ from the one for, e.g., 5.15.
> 
> Reading what it is, maybe it can in fact be decoupled from NO_HZ_FULL,
> which would mean an upstream patch plus backporting (if preempt would
> do that; it could be considered a "performance bug", I guess).
> 
> > And for CONFIG_RCU_NOCB_CPU
> > 
> >           Use this option to reduce OS jitter for aggressive HPC or
> >           real-time workloads.  It can also be used to offload RCU
> >           callback invocation to energy-efficient CPUs in
> >           battery-powered asymmetric multiprocessors.  The price of
> >           this reduced jitter is that the overhead of call_rcu()
> >           increases and that some workloads will incur significant
> >           increases in context-switch rates.
> > 
> >           This option offloads callback invocation from the set of
> >           CPUs specified at boot time by the rcu_nocbs parameter.
> >           For each such CPU, a kthread ("rcuox/N") will be created to
> >           invoke callbacks, where the "N" is the CPU being offloaded,
> >           and where the "x" is "p" for RCU-preempt (PREEMPTION kernels)
> >           and "s" for RCU-sched (!PREEMPTION kernels).  Nothing
> >           prevents this kthread from running on the specified CPUs,
> >           but (1) the kthreads may be preempted between each callback,
> >           and (2) affinity or cgroups can be used to force the kthreads
> >           to run on whatever set of CPUs is desired.
> > 
> >           Say Y here if you need reduced OS jitter, despite added
> >           overhead. Say N here if you are unsure.
> > 
> > This adds overhead as well.
> > 
> > Is this still to be considered true?
> 
> Probably, but only for people who actively choose to use it, and only
> for the CPUs they choose via the "rcu_nocbs" command-line parameter. If
> it is not set, everything will be as it was.
> I already indicated that in the commit message of my MR:
> https://salsa.debian.org/kernel-team/linux/-/merge_requests/385

OK, so the traditional combination for a distro has been:

   CONFIG_NO_HZ_IDLE=y
   # CONFIG_RCU_NOCB_CPU is not set
   CONFIG_TICK_CPU_ACCOUNTING=y

Then nohz_full support was introduced, which allows userspace tasks to
run without being interrupted by the tick. But this feature is meant for
extreme workloads, so we arranged for it not to add overhead when it is
not used.

This means that

   CONFIG_NO_HZ_FULL=y

and the options it automatically selects:

   CONFIG_RCU_NOCB_CPU=y
   CONFIG_VIRT_CPU_ACCOUNTING_GEN=y

are not expected to bring more overhead than their traditional
counterparts, unless kernel boot parameters such as "nohz_full="
or "rcu_nocbs=" are passed.
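For illustration only: on a hypothetical 8-CPU machine that keeps CPU 0
for housekeeping and isolates CPUs 1-7, opting in would mean appending
something like the following to the kernel command line. The CPU list is
an assumption; without such parameters these options stay dormant.

```
nohz_full=1-7 rcu_nocbs=1-7
```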

So you can safely enable CONFIG_NO_HZ_FULL=y. I guess the only corner
case is when you optimize your kernel for size and are sure nobody will
use nohz_full with it, but I suspect some Debian users, like me for
example, might be interested in that feature.
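To check whether a given kernel build already has this combination, a
small shell helper along these lines could be used. It is a sketch: the
function name check_nohz_config is hypothetical, and it only looks for
"=y" lines in a kernel .config-style file.

```shell
#!/bin/sh
# Hypothetical helper: report how the options discussed in this thread
# are set in a kernel config file given as the first argument.
check_nohz_config() {
    config_file=$1
    for opt in CONFIG_NO_HZ_FULL CONFIG_RCU_NOCB_CPU CONFIG_VIRT_CPU_ACCOUNTING_GEN; do
        # Match only an exact "CONFIG_FOO=y" line, not CONFIG_FOO_BAR=y.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "${opt}=y"
        else
            echo "${opt} is not set"
        fi
    done
}
```

On Debian, where the running kernel's config is installed under /boot,
usage would be: check_nohz_config "/boot/config-$(uname -r)"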

I should update the help text for CONFIG_VIRT_CPU_ACCOUNTING_GEN, which
is definitely out of date.

Thanks.

