
Bug#804857: linux: New feature: enable CONFIG_NO_HZ_FULL and CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU_NONE



Hi Frederic,

On Sat, Nov 13, 2021 at 12:27:12AM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 04, 2021 at 10:05:02PM +0100, Henning Schild wrote:
> > On Sat, 30 Oct 2021 16:04:35 +0200,
> > Salvatore Bonaccorso <carnil@debian.org> wrote:
> > 
> > > Control: tags -1 + moreinfo
> > > 
> > > On Wed, Oct 27, 2021 at 10:16:56AM +0200, Georg Müller wrote:
> > > > > But for other configurations it is worse:
> > > > > 
> > > > > config NO_HZ_FULL
> > > > >         bool "Full dynticks system (tickless)"
> > > > > ...
> > > > > >          This is implemented at the expense of some overhead in
> > > > > >          user <-> kernel transitions: syscalls, exceptions and
> > > > > >          interrupts. Even when it's dynamically off.
> > > > > 
> > > > >          Say N.
> > > > >   
> > > > 
> > > > 
> > > > Upstream commit 176b8906 changed the description regarding
> > > > NO_HZ_FULL:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=176b8906c399a170886ea4bad5b24763c6713d61
> > > > 
> > > >   
> > > > > By default, without passing the nohz_full parameter, this behaves
> > > > > just like NO_HZ_IDLE.
> > > > >
> > > > > If you're a distro say Y.  
> > > 
> > > While this has changed and distros are now encouraged to select it,
> > > selecting it would also enable both CONFIG_VIRT_CPU_ACCOUNTING_GEN and
> > > CONFIG_RCU_NOCB_CPU.
> > > 
> > > For CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > > 
> > >           Select this option to enable task and CPU time accounting
> > >           on full dynticks systems. This accounting is implemented by
> > >           watching every kernel-user boundaries using the context
> > >           tracking subsystem. The accounting is thus performed at the
> > >           expense of some significant overhead.
> > > 
> > >           For now this is only useful if you are working on the full
> > >           dynticks subsystem development.
> > > 
> > >           If unsure, say N.
> > > 
> > > which indicates some significant overhead.
> > 
> > I cannot answer that off the top of my head; I would have to dig as
> > well. I might get back in about two weeks if nobody else finds an answer.
> > 
> > But I took the liberty of adding Frederic, the author of the
> > "distro reassure" patch, to Cc.
> > 
> > I am not sure such a change would be allowed for bullseye (5.10), or
> > whether the answer for 5.10 would differ from the one for, e.g., 5.15.
> > 
> > Reading what it is, maybe it can in fact be decoupled from NO_HZ_FULL.
> > That would mean an upstream patch and backporting (if preempt would do
> > that, but it could be considered a "performance bug", I guess).
> > 
> > > And for CONFIG_RCU_NOCB_CPU
> > > 
> > >           Use this option to reduce OS jitter for aggressive HPC or
> > >           real-time workloads.  It can also be used to offload RCU
> > >           callback invocation to energy-efficient CPUs in
> > >           battery-powered asymmetric multiprocessors.  The price of
> > >           this reduced jitter is that the overhead of call_rcu()
> > >           increases and that some workloads will incur significant
> > >           increases in context-switch rates.
> > > 
> > >           This option offloads callback invocation from the set of
> > >           CPUs specified at boot time by the rcu_nocbs parameter.
> > >           For each such CPU, a kthread ("rcuox/N") will be created to
> > >           invoke callbacks, where the "N" is the CPU being offloaded,
> > >           and where the "x" is "p" for RCU-preempt (PREEMPTION
> > >           kernels) and "s" for RCU-sched (!PREEMPTION kernels).
> > >           Nothing prevents this kthread from running on the specified
> > >           CPUs, but (1) the kthreads may be preempted between each
> > >           callback, and (2) affinity or cgroups can be used to force
> > >           the kthreads to run on whatever set of CPUs is desired.
> > > 
> > >           Say Y here if you need reduced OS jitter, despite added
> > >           overhead. Say N here if you are unsure.
> > > 
> > > This adds overhead as well.
> > > 
> > > Is this still to be considered true?
> > 
> > Probably, but only for people who actively choose to use it, and only
> > for the CPUs they choose. If the "rcu_nocbs" cmdline parameter is not
> > set, everything will be as it was.
> > I already indicated that in the commit message of my MR:
> > https://salsa.debian.org/kernel-team/linux/-/merge_requests/385
> 
> Ok so the past traditional combo for a distro is:
> 
>    CONFIG_NO_HZ_IDLE=y
>    # CONFIG_RCU_NOCB_CPU is not set
>    CONFIG_TICK_CPU_ACCOUNTING=y
> 
> Then nohz_full support has been introduced which allows userspace
> tasks to run without being annoyed by tick interrupts. But this feature
> is for extreme workloads. So we arranged for this support not to add
> additional overhead when it is not used.
> 
> This means that
> 
>    CONFIG_NO_HZ_FULL=y
>    
> which also automatically selects:
> 
>    CONFIG_RCU_NOCB_CPU=y
>    CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> 
> are not expected to bring more overhead than their traditional
> counterparts, unless kernel boot parameters such as "nohz_full="
> or "rcu_nocbs=" are passed.
> 
> So you can safely enable CONFIG_NO_HZ_FULL=y. I guess the only corner
> case is when you optimize your kernel for size and you are sure you
> won't have any user of nohz_full for your kernel, but I suspect some
> Debian users, like me for example, might be interested in that feature.

Thank you. I pushed the change for the next upload:

https://salsa.debian.org/kernel-team/linux/-/commit/f6aad27f05c007d6f30b34ff77bc7ea47844f117
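
For anyone who wants to confirm on a running system that neither boot
parameter is in effect, here is a minimal sketch (it is not part of the
packaging change). It assumes /proc/cmdline is readable and only
understands plain CPU lists such as "1-3,5"; other list formats are
printed as-is. The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Report whether nohz_full= / rcu_nocbs= appear on the kernel command line."""

from pathlib import Path


def isolation_params(cmdline: str) -> dict:
    """Return the nohz_full= and rcu_nocbs= values found in a command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("nohz_full", "rcu_nocbs"):
            found[key] = value
    return found


def expand_cpu_list(spec: str) -> list:
    """Expand a plain CPU list such as "1-3,5" into [1, 2, 3, 5].

    Assumption: only "N" and "N-M" entries are handled; anything else
    raises ValueError.
    """
    cpus = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    params = isolation_params(proc.read_text() if proc.exists() else "")
    if not params:
        print("neither nohz_full= nor rcu_nocbs= is set")
    for key, value in params.items():
        try:
            print(f"{key}: CPUs {expand_cpu_list(value)}")
        except ValueError:
            # Unhandled list format: show the raw value instead of guessing.
            print(f"{key}: {value}")
```

If the script reports that neither parameter is set, the kernel is in the
case Frederic describes above, where no additional overhead is expected.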

> I should clarify the help text for CONFIG_VIRT_CPU_ACCOUNTING_GEN, which
> is definitely out of date.

Ack, thank you.

Regards,
Salvatore

