Bug#972709: Wishlist/RFC: Change to CONFIG_PREEMPT_NONE in linux-image-cloud-*

To: Ben Hutchings <ben@decadent.org.uk>, 972709@bugs.debian.org
Subject: Bug#972709: Wishlist/RFC: Change to CONFIG_PREEMPT_NONE in linux-image-cloud-*
From: Flavio Veloso Soares <flaviovs@magnux.com>
Date: Sun, 22 Nov 2020 13:45:37 -0800
Message-id: <e95d04dd-8513-cf52-c1d6-b608d053b3ac@magnux.com>
Reply-to: Flavio Veloso Soares <flaviovs@magnux.com>, 972709@bugs.debian.org
In-reply-to: <303ebe0e4f9a10001c183f6eb6c789aa930c82f7.camel@decadent.org.uk>
References: <b7aad91f-31c1-89cf-075a-56402f4b81ce@magnux.com> <303ebe0e4f9a10001c183f6eb6c789aa930c82f7.camel@decadent.org.uk> <b7aad91f-31c1-89cf-075a-56402f4b81ce@magnux.com>

[Resending: just noticed that the reply I sent on Oct 23 didn't include b.d.o]

I don't think the article is about the same thing we're talking about here. CONFIG_PREEMPT* options control the trade-off between latency and throughput of *system calls* and *scheduling of CPU cycles spent in kernel mode*, not network traffic. Granted, networking is affected by the setting too, but intuition tells me that a non-preemptible system call -- meaning one that runs to completion or until it blocks on I/O -- could even *decrease* network latency, not increase it.

Unfortunately, I couldn't find many comprehensive benchmarks of the kernel CONFIG_PREEMPT* options. The one at https://www.codeblueprint.co.uk/2019/12/23/linux-preemption-latency-throughput.html seems to be very thorough, and shows that the difference in latency between CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_NONE is actually nonexistent, while no-preemption provides noticeably more throughput.

This unsurprising conclusion alone suggests that CONFIG_PREEMPT_NONE is a better choice for servers.
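
For anyone who wants to confirm which preemption model a given kernel was built with, here is a minimal Python sketch. It assumes the build configuration is available as text in the usual `/boot/config-<release>` file, as shipped by Debian kernel packages. The function name `preemption_model` is only illustrative, and the option list covers the three models discussed in this thread, not every variant found in other kernel versions.

```python
import platform

# Preemption model options discussed in this thread. Related symbols such as
# CONFIG_PREEMPT_NOTIFIERS are deliberately not matched.
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
)


def preemption_model(config_text):
    """Return the preemption option set to 'y' in config_text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in PREEMPT_OPTIONS and value == "y":
            return name
    return None


if __name__ == "__main__":
    # Assumed location of the running kernel's build configuration.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as config_file:
            print(path, "->", preemption_model(config_file.read()))
    except OSError as exc:
        print("could not read", path, ":", exc)
```

On a system without a readable config file the script only reports that it could not read it.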

However, there's more. No benchmark touches on the overhead of context switches combined with the burstable CPU "credit" system used in many (most?) cloud environments, which happen to be the target of the *-cloud kernels. With voluntary preemption, all the cycles spent on context-switch overhead are not only wasted, they also count against the instance's CPU "credits", which reduces the overall computing power available to the instance. This is like double-paying for something you don't need.
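
The credit argument can be put as simple arithmetic. The sketch below is a deliberately simplified model, not any provider's actual accounting: it assumes every CPU second is charged against the credit balance at the same rate, and that a fixed fraction of those seconds goes to context-switch overhead. Both `useful_cpu_seconds` and the `overhead_fraction` parameter are hypothetical, and the fraction would have to be measured on a real workload.

```python
def useful_cpu_seconds(credited_cpu_seconds, overhead_fraction):
    """CPU seconds left for real work after context-switch overhead.

    credited_cpu_seconds: CPU time charged against the instance's credits.
    overhead_fraction: hypothetical share of that time (0.0 to 1.0) spent on
    context-switch overhead; this is an assumption, not a measured value.
    """
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0.0 and 1.0")
    return credited_cpu_seconds * (1.0 - overhead_fraction)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    charged = 3600.0
    for fraction in (0.0, 0.01, 0.05):
        print(fraction, useful_cpu_seconds(charged, fraction))
```

In this model the instance is charged for the full amount in every case; only the share that does useful work shrinks as the overhead grows.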

On 2020-10-23 6:04 p.m., Ben Hutchings wrote:

> On Thu, 2020-10-22 at 13:43 -0700, Flavio Veloso wrote:
>
>> Package: linux-image-cloud-amd64
>> Version: 4.19+105+deb10u7
>> Severity: wishlist
>>
>> Since cloud images are mostly run for server workloads in headless
>> environments accessed via network only, it would be better if
>> "linux-image-cloud-*" kernels were compiled with CONFIG_PREEMPT_NONE=y
>> ("No Forced Preemption (Server)").
>>
>> Currently those packages use CONFIG_PREEMPT_VOLUNTARY=y ("Voluntary
>> Kernel Preemption (Desktop)")
>>
>> CONFIG_PREEMPT_NONE description from kernel help:
>>
>> [...]
>
> I know what it says, but I think the notion that latency is less
> important on servers is outdated.
>
> It's well known that people give up quickly on web pages that are slow
> to load:
> <https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/>.
> And a web page can depend on (indirectly) very many servers, which
> means that e.g. high latency that only occurs 1% of the time on any
> single server actually affects a large fraction of requests.
>
> Ben.

-- 
FVS
