
Re: kernel configs in Debian



On Mon, Apr 26, 2021 at 9:43 AM Ryutaroh Matsumoto
<ryutaroh@ict.e.titech.ac.jp> wrote:
>
> For (ARM) SBCs with limited computational power, stripping out
> unused features from the kernel sometimes improves the performance,
> depending on usage.
>
> For my use case of packet filtering by RPi4B,
>
> CONFIG_PARAVIRT=n
> CONFIG_DEBUG_KERNEL=n
>
> each of the above increases the throughput of the packet filtering router
> by about 100Mbps, from the baseline 600Mbps by linux-image-rt-arm64 5.10.
> The above options cannot be disabled in Debian kernel package
> for its wider use cases. Rebuild of linux-image-rt-arm64 was done by
> https://github.com/emojifreak/debian-rpi-image-script/blob/main/build-debian-raspi-kernel.sh

Interesting analysis. I would have expected neither of those two options to
have a measurable effect on network throughput, so it is possible that
these are hitting a bug somewhere that leads to bad performance.

The only effect that CONFIG_PARAVIRT is supposed to have is the steal
time accounting. Incidentally that has just changed to a static_call
in linux-5.13 with commit a0e2bf7cb700 ("x86/paravirt: Switch time pvops
functions to use static_call()") on all architectures, so maybe that
also addresses the problem.

CONFIG_DEBUG_KERNEL by itself does not do anything, but instead it
controls a number of other configuration options. You should be able to
see which options changed by comparing the config file before and after
turning this off.
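
A minimal sketch of that comparison, assuming both config files are
available as text. The function names are made up for illustration, and
treating an option that is absent from one file as "n" is a
simplification that is only really accurate for bool/tristate options:

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(before, after):
    """Return {option: (old, new)} for every option whose value differs.

    Assumption: an option missing from one file is treated as 'n'.
    """
    old, new = parse_config(before), parse_config(after)
    changed = {}
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, "n"), new.get(name, "n")
        if a != b:
            changed[name] = (a, b)
    return changed
```

Calling diff_configs() on the contents of the two files lists every
option that flipped together with CONFIG_DEBUG_KERNEL. The kernel tree
also ships scripts/diffconfig for the same purpose.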

Generally I think at least CONFIG_DEBUG_INFO should be enabled in
a distro kernel in order to analyse bug reports better, but this is not
supposed to change executable code. What other options are disabled
when you turn this off?

Also, do you see the same performance difference with the non-rt kernel?
Most people would not run the -rt kernel because of the inherent
performance overhead, and it's not clear whether the slowdown you
see is the result of a combination of CONFIG_PREEMPT_RT with some
other option, or if this is something that hurts normal users as well.

> On the other hand, I am wondering why the following options are currently
> disabled by Debian arm64 kernel 5.10 package:
>
> CONFIG_CLEANCACHE:
> Cleancache can be thought of as a page-granularity victim cache for
> clean pages that the kernel's pageframe replacement algorithm (PFRA)
> would like to keep around, but can't since there isn't enough
> memory. So when the PFRA "evicts" a page, it first attempts to use
> cleancache code to put the data contained in that page into
> "transcendent memory", memory that is not directly accessible or
> addressable by the kernel and is of unknown and possibly time-varying
> size. And when a cleancache-enabled filesystem wishes to access a page
> in a file on disk, it first checks cleancache to see if it already
> contains it; if it does, the page is copied into the kernel and a disk
> access is avoided. When a transcendent memory driver is available
> (such as zcache or Xen transcendent memory), a significant I/O
> reduction may be achieved. When none is available, all cleancache
> calls are reduced to a single pointer-compare-against-NULL resulting
> in a negligible performance hit.
>
> If unsure, say Y to enable cleancache
>
> This is enabled by other distros.:
> https://hlandau.github.io/kconfigreport/option/CONFIG_CLEANCACHE.xhtml

This seems like a useful thing to enable.

> CONFIG_ZONE_DEVICE:
> Device memory hotplug support allows for establishing pmem, or other
> device driver discovered memory regions, in the memmap. This allows
> pfn_to_page() lookups of otherwise "device-physical" addresses which
> is needed for using a DAX mapping in an O_DIRECT operation, among
> other things.
>
> If FS_DAX is enabled, then say Y.
>
> (FS_DAX is enabled in Debian arm64 kernel 5.10 package)

This should probably be an architecture-independent setting.
It does sound useful to enable either both ZONE_DEVICE and FS_DAX
or neither of them. I'm not aware of any arm64 hardware supporting
nvdimm or similar technology that needs these, but there is probably
someone who has it, if only in a lab.
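
As a rough illustration of the "both or neither" rule, a config file
could be checked along these lines. This is only a sketch, the helper
names are hypothetical, and it does not look at any other Kconfig
dependencies of the two options:

```python
def is_enabled(config_text, option):
    """True if the option is set to y or m in the given config text."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False


def dax_settings_consistent(config_text):
    """True if FS_DAX and ZONE_DEVICE are both enabled or both disabled."""
    fs_dax = is_enabled(config_text, "CONFIG_FS_DAX")
    zone_device = is_enabled(config_text, "CONFIG_ZONE_DEVICE")
    return fs_dax == zone_device
```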

> CONFIG_IRQ_TIME_ACCOUNTING:
> Select this option to enable fine granularity task irq time
> accounting. This is done by reading a timestamp on each transitions
> between softirq and hardirq state, so there can be a small performance
> impact.
>
> (My observation suggests CONFIG_PARAVIRT=y having much higher overhead.)
>
> If in doubt, say N here.
>
> The above CONFIG_IRQ_TIME_ACCOUNTING enables %hi in "top".
> See also "Is Your Linux Version Hiding Interrupt CPU Usage From You?"
> https://tanelpoder.com/posts/linux-hiding-interrupt-cpu-usage/

Indeed, reading the hardware clock on arm64 is usually cheap compared
to other architectures, so enabling this seems reasonable.
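
For reference, the %hi, %si and %st columns in "top" come from the irq,
softirq and steal fields of the "cpu" lines in /proc/stat. A minimal
sketch of how those percentages can be derived from two samples of that
line follows. The function names are made up, and guest time is ignored
for simplicity:

```python
# Order of the first eight counters on a "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse one 'cpu ...' line from /proc/stat into {field: ticks}."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels report fewer columns; treat missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def percentages(earlier, later):
    """Share of each field in the interval between two samples, in %."""
    delta = {k: later[k] - earlier[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}
```

Sampling the first line of /proc/stat twice, a second or so apart, and
passing both parsed samples to percentages() gives values comparable to
the summary line in "top". The "irq" entry corresponds to %hi and the
"steal" entry to the steal time that CONFIG_PARAVIRT accounts for.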

       Arnd

