Re: kernel configs in Debian



Sorry for the somewhat late response.

> I would not expect any change in performance from omitting unused drivers.
> If turning off the other platforms has a performance impact, this could still
> mean that there is a serious performance regression where we do not
> expect it.

I am not sure whether you meant CONFIG_ARCH_* by "drivers".
Removing all CONFIG_ARCH_* options other than CONFIG_ARCH_BCM2835 disables
the following, which were previously set:
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS=y

Setting CONFIG_NUMA=n and CONFIG_HOTPLUG_CPU=n disables
CONFIG_HAVE_SETUP_PER_CPU_AREA
and CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK,
and enables CONFIG_ARCH_FLATMEM_ENABLE.

Those changes could have some impact...
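
To double-check which options actually differ between two builds, the two
.config files can be compared mechanically. The following is only a minimal
sketch: the function names parse_config and diff_configs are made up for
illustration, and an option missing from one file is simply treated as "n".
(The kernel tree also ships scripts/diffconfig, which serves the same purpose.)

```python
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" are the two line forms
# that carry option state in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old, new):
    """Return sorted (option, old_value, new_value) tuples for differing options.

    Assumption of this sketch: an option absent from one side counts as 'n'.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, "n"), new.get(name, "n")
        if a != b:
            changes.append((name, a, b))
    return changes
```

Usage would be along the lines of
diff_configs(parse_config(open("config.old").read()),
parse_config(open("config.new").read())), where the two file names are
placeholders.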

> The impact of CONFIG_DEBUG_PREEMPT is also higher than I expected
> here, it may be worth asking on the linux-rt-users list about what the
> expected cost on arm64 hardware is.

I believe they are very well aware of this, see
https://wiki.linuxfoundation.org/realtime/documentation/howto/applications/preemptrt_setup

Their recommendation seems(?) to be CONFIG_DEBUG_PREEMPT=n
for better performance.

> Can you check whether there are any other differences in the .config
> file besides CONFIG_PARAVIRT that may cause the difference, and
> that you didn't mix up the results?

I believe not.
The difference may instead come from the following:
* The number of measurements is too small (only 2 runs).
* The measured speed depends on the ISP's IPv6 network, which I cannot keep
  constant.
The RPi4B processes real network traffic, and my family complains
if it is down for too long...
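
To illustrate why two runs per kernel are too few, here is a rough sketch that
compares two sets of measurements using the standard error of the mean. The
function names and the threshold k=2 are my own assumptions, and this is a
crude rule of thumb, not a proper statistical test (with so few samples a real
t-test would be even stricter).

```python
import math
import statistics


def summarize(samples):
    """Return (mean, standard error of the mean) for a list of measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def distinguishable(a, b, k=2.0):
    """Rough check: do the two means differ by more than k combined standard errors?"""
    mean_a, sem_a = summarize(a)
    mean_b, sem_b = summarize(b)
    return abs(mean_a - mean_b) > k * math.hypot(sem_a, sem_b)


# Hypothetical throughput numbers in Mbit/s, NOT the actual measurements.
kernel_a = [410.0, 450.0]
kernel_b = [440.0, 470.0]
print(distinguishable(kernel_a, kernel_b))
```

With these made-up numbers the means differ by 25 while the combined standard
error is also 25, so the two kernels cannot be told apart from two runs each.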

> I see you do a couple of things in this fragment. One of them is the
> CONFIG_BPF_JIT_ALWAYS_ON=y option that might result in
> a significant difference if you actually use BPF (otherwise it makes
> no difference).

I believe the measured speed depends on nftables, the ipv4-ipv6 tunnel,
the macvlan driver, the Ethernet driver and the general network stack,
and not on BPF.

My network interface configuration is:
ip6tnl1 (tunnel) binds to myve1 (macvlan), and
myve1 binds to eth0, and eth0 has absolutely no IPv4 or IPv6 address.
I use macvlan so that multiple macvlan and macvtap interfaces can be
bound to eth0.

"ip l" shows the following:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether dc:a6:32:bb:99:d9 brd ff:ff:ff:ff:ff:ff
3: myve1@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 96:8a:a9:8d:f6:64 brd ff:ff:ff:ff:ff:ff
4: myvtap1@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
    link/ether 8e:7e:4b:95:3b:59 brd ff:ff:ff:ff:ff:ff
5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: permaddr 616:be05:411::
6: ip6tnl1@myve1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/tunnel6 2400:4050:2ba1:ac00:99:f0ae:8600:2c00 peer 2001:380:a120::9 permaddr 9648:2668:3d4f::
7: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether dc:a6:32:bb:99:da brd ff:ff:ff:ff:ff:ff
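
The stacking described above can be read off the "name@lower" part of each
header line in that output. The following is a small sketch of how to extract
it; the function names are made up for illustration, and it only relies on the
header line layout visible in the listing above.

```python
import re

# Header lines look like "3: myve1@eth0: <FLAGS> mtu ..."; the "@lower" part
# is optional. Indented continuation lines (link/ether ...) do not match.
_HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@([^:\s]+))?:\s")


def parse_lower_devices(ip_link_output):
    """Map each interface name to the interface it is bound to (or None)."""
    parents = {}
    for line in ip_link_output.splitlines():
        m = _HEADER.match(line)
        if m:
            name, lower = m.group(1), m.group(2)
            parents[name] = None if lower in (None, "NONE") else lower
    return parents


def stack_of(name, parents):
    """Follow the bindings from `name` down to the bottom-most device."""
    chain = [name]
    while parents.get(chain[-1]) is not None:
        lower = parents[chain[-1]]
        if lower in chain:  # guard against a malformed, cyclic listing
            break
        chain.append(lower)
    return chain
```

Applied to the listing above, stack_of("ip6tnl1", ...) would give
ip6tnl1, myve1, eth0, which is the path every tunneled packet takes.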

> I also see that you enable a number debugging options, including
> CONFIG_UBSAN_SANITIZE_ALL=y, which I would expect to make
> the kernel significantly slower when turned on. Is this one enabled
> in the other kernels as well, or did you find that it has a positive
> effect here?

As far as I can see, CONFIG_UBSAN=y and CONFIG_UBSAN_SANITIZE_ALL=y
have not decreased performance noticeably (for my personal use cases),
so I turn them on whenever I have a chance to build a kernel.
As far as I can recall, the CONFIG_UBSAN related options did not
degrade YouTube playback in firefox-esr.
When building user-space applications, I have not seen a "subjectively
noticeable" performance difference from UBSAN either, so I routinely use
-fsanitize=undefined.
ASAN and MSAN are terribly slow, as we know well.

> As mentioned above, turning off the unused platforms /should/ not
> make a difference other than code size. Do you get different
> results if you drop all the CONFIG_ARCH_*=n lines from the
> fragment? If you do, I would consider that a problem in the
> upstream kernel that needs to be investigated further.

Having looked at arch/arm64/Kconfig.platforms, I see some options
depending on CONFIG_ARCH_*. Besides the ones
mentioned at the beginning, they include
IRQ_DOMAIN_HIERARCHY
ARM_GIC

The *IRQ* and ARM_GIC config options can have some impact on performance
if a use case involves lots of HW interrupts, as mine does.
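
One way to check how interrupt-heavy a workload really is would be to take two
snapshots of /proc/interrupts, before and after a transfer, and look at which
lines grew the most. The sketch below assumes the usual layout (a header line
naming the CPUs, then one "label: count count ... description" line per
interrupt); the function names are hypothetical.

```python
def parse_interrupts(text):
    """Return {irq_label: total count summed over all CPUs}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line: "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        # Only the first ncpu fields are per-CPU counters; the rest is
        # the controller name, hardware IRQ number and device description.
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def busiest(before, after, top=5):
    """Return the `top` interrupt labels with the largest increase."""
    deltas = {k: after[k] - before.get(k, 0) for k in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

If the network-related lines dominate the result during a speed test, the IRQ
handling path is at least a plausible place to look.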

I am ready to re-build a Debian kernel with only CONFIG_ARCH_*
(except CONFIG_ARCH_BCM2835) disabled.

Best regards, Ryutaroh

