
Bug#1037198: locales: please parallelise locale-gen



Hi!

On Thu, Jun 15, 2023 at 09:26:43PM +0200, Aurelien Jarno wrote:
> On 2023-06-07 16:04, наб wrote:
> > Posting as a bug per comment from Andrej; originally posted 2022-05-06 as
> >   https://salsa.debian.org/glibc-team/glibc/-/merge_requests/7
> > 
> > Patch based on current Salsa HEAD attached, incl. analysis.
> 
> Thanks for the patch. It looks good, I have a comment though.
> > MemFree: in /proc/meminfo is available on all supported Debian kernels,
> > and, indeed, exactly what procps free(1) uses
> What is the reason to use MemFree instead of MemAvailable?
That's what procps free(1) used at the time, and all Debian kernels
(kFreeBSD, Hurd, Linux) support it.

> The Linux
> kernel tends to maintain MemFree close to 0 by using the free RAM as
> cache. MemAvailable also includes reclaimable memory blocks like cache
> or inactive pages and therefore sounds better suited.
Since I first posted this, procps free(1) started using MemAvailable to
evaluate free/used, so sure. I don't feel strongly either way.

Neither the Hurd image from 2021 I have (bullseye branding) nor the 2023
release (bookworm branding) has MemAvailable, and neither does kFreeBSD 10
(from the 2017 installer ISO, which appears to be the latest from
 https://wiki.debian.org/Debian_GNU/kFreeBSD).

I've updated the Salsa revision and am including an updated patch here,
which overrides MemFree with MemAvailable if available.
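
Roughly, the selection and the 200M-per-job cap behave like the sketch
below. The function names pick_mem_kb and job_cap are made up for
illustration and don't appear in the patch; the input is assumed to be
in /proc/meminfo format, with values in kB:

```shell
# Illustrative helpers only; the patch does this inline in locale-gen.

# pick_mem_kb: read /proc/meminfo-style lines on stdin and print the
# MemAvailable value if there is one, else MemFree, else 0.
pick_mem_kb() {
	mem=0
	while read -r k v _; do
		case "$k" in
			MemFree:)      mem="$v" ;;
			MemAvailable:) mem="$v"; break ;;  # preferred where it exists
		esac
	done
	echo "$mem"
}

# job_cap NPROC MEM_KB: allow one job per 200 MB, at least 1,
# and never more than NPROC.
job_cap() {
	by_mem=$(( $2 / 1024 / 200 ))
	if [ "$by_mem" -lt 1 ]; then by_mem=1; fi
	if [ "$by_mem" -lt "$1" ]; then echo "$by_mem"; else echo "$1"; fi
}

# Possible use on a live system:
#   job_cap "$(nproc)" "$(pick_mem_kb < /proc/meminfo)"
```

On a kernel without MemAvailable the loop simply falls through with the
MemFree value, which is the behaviour the updated patch is going for.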

Best,
наб
From d64e6b551948726dbe5cc6800e93a2d7b25d3f89 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=BD=D0=B0=D0=B1?= <nabijaczleweli@nabijaczleweli.xyz>
Date: Fri, 6 May 2022 01:22:10 +0200
Subject: [PATCH] Parallelise locale-gen if possible
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mutt-PGP: OS

Assuming a very generous 200M free/localedef (because I saw a max RSS
of 147M w/time(1)), this will attempt to keep all jobs saturated,
and usually succeed. There's little starvation, since the vast majority
of time is spent in gzip(1) ‒ 1:14 user vs 27:55 sys

At 2.2ish seconds per locale, even on a low-end system of today with
4 CPUs (and 800 free MB), we can generate up to 4 locales at once
for 6.6s' speed-up. Assuming no super-pathological cases, this globally
scales in roughly ceil(locales/ncpus)*2.2s chunks, which is a massive
win

The only user-visible change is that, with nproc>1, the output is
  en_GB.UTF-8...
  <cursor here>
instead of
  en_GB.UTF-8... <cursor here, will print "done\n" when it's done>

MemFree: in /proc/meminfo is available on all supported Debian kernels,
MemAvailable: only on Linux; procps free(1) uses MemAvailable to
estimate "used" space where available.
---
 debian/local/usr_sbin/locale-gen | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/debian/local/usr_sbin/locale-gen b/debian/local/usr_sbin/locale-gen
index 7fa3d772..30f70f5e 100755
--- a/debian/local/usr_sbin/locale-gen
+++ b/debian/local/usr_sbin/locale-gen
@@ -23,6 +23,19 @@ is_entry_ok() {
 	fi
 }
 
+nproc="$(nproc 2>/dev/null)" || nproc=1
+if [ "$nproc" -gt 1 ]; then
+	mem_free=0
+	while read -r k v _; do
+		[ "$k" = "MemFree:"      ] && mem_free="$v"          || :
+		[ "$k" = "MemAvailable:" ] && mem_free="$v" && break || :  # Prefer using MemAvailable on Linux; other Debian kernels only have MemFree
+	done < /proc/meminfo || :
+	mem_free=$(( mem_free / 1024 / 200 ))
+	[ "$mem_free" -lt 1 ] && mem_free=1 || :
+	[ "$mem_free" -lt "$nproc" ] && nproc="$mem_free" || :
+	jobs=0; pids=
+fi 2>/dev/null
+
 echo "Generating locales (this might take a while)..."
 while read -r locale charset; do
 	if [ -z "$locale" ] || [ "${locale#\#}" != "$locale" ]; then continue; fi
@@ -35,6 +48,7 @@ while read -r locale charset; do
 	locale_at="${locale#*@}"
 	[ "$locale_at" = "$locale" ] && locale_at= || locale_at="@$locale_at"
 	printf "  %s.%s%s..." "$locale_base" "$charset" "$locale_at"
+	[ "$nproc" -gt 1 ] && echo || :
 
 	if [ -e "$USER_LOCALES/$locale" ]; then
 		input="$USER_LOCALES/$locale"
@@ -46,7 +60,20 @@ while read -r locale charset; do
 			input="$USER_LOCALES/$input"
 		fi
 	fi
-	localedef -i "$input" -c -f "$charset" -A /usr/share/locale/locale.alias "$locale" || :
-	echo " done"
+	localedef -i "$input" -c -f "$charset" -A /usr/share/locale/locale.alias "$locale" &
+	if [ "$nproc" -gt 1 ]; then
+		pids="$pids$! "
+		jobs=$(( jobs + 1 ))
+
+		if [ "$jobs" -ge "$nproc" ]; then
+			wait "${pids%% *}" || :
+			jobs=$(( jobs - 1 ))
+			pids="${pids#* }"
+		fi
+	else
+		wait
+		echo " done"
+	fi
 done < "$LOCALEGEN"
+wait
 echo "Generation complete."
-- 
2.39.2
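
For anyone skimming the last hunk: the job handling amounts to a FIFO
window of background PIDs. Each job is started in the background, and
once the window is full the oldest PID is waited on before the loop
continues. A standalone sketch of that pattern follows; run_bounded and
its arguments are hypothetical names, not something the patch defines:

```shell
# Illustration of the FIFO job window; not the patch itself.
# run_bounded MAX CMD ARG...: run "CMD ARG" once per ARG,
# with at most MAX of them running at a time.
run_bounded() {
	max_jobs="$1"; cmd="$2"; shift 2
	jobs=0; pids=
	for arg; do
		"$cmd" "$arg" &
		pids="$pids$! "                # append newest PID to the queue
		jobs=$(( jobs + 1 ))
		if [ "$jobs" -ge "$max_jobs" ]; then
			wait "${pids%% *}" || :    # reap the oldest job, ignore its status
			jobs=$(( jobs - 1 ))
			pids="${pids#* }"          # drop it from the front of the queue
		fi
	done
	wait                               # reap whatever is still running
}
```

Waiting on the oldest PID rather than on "any finished job" keeps this
in plain POSIX sh, at the cost of an occasional idle slot when a later
job finishes before an earlier one.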
