
Re: Architecture baseline for Forky



I note that I've mostly noped out of this discussion (because of https://mstdn.jp/@landley/115504860540842713 and https://mastodon.sdf.org/@washbear/115646255465589454), but as long as I'm catching up on back email anyway...

On 11/12/25 12:27, Adrian Bunk wrote:
> We are already providing a non-PIE version of the Python interpreter for
> users who need it for performance reasons, and it is for example
> possible that the benefits of providing packages without hardening (for
> situations where hardening is not necessary) might bring larger benefits
> than architecture-optimized versions.

Long ago, when I was doing https://landley.net/aboriginal/about.html (work which eventually allowed Alpine to be based on busybox), I measured that statically linking busybox let the autoconf stage of package builds complete about 20% faster under QEMU.

(My theory was that lazy binding patched out the PLT indirection on the first call, which dirtied the executable page and forced QEMU to discard its translated code cache and retranslate, often multiple times as more indirections got patched at runtime. I later found it hilarious that the dynamic linking people went on to do snap and flatpak and so on, using FAR more space for no obvious gain...)
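
(If anybody wants to poke at the lazy vs eager distinction themselves, here's a toy C program, not my old benchmark, just an illustration: it times dlopen() with RTLD_LAZY vs RTLD_NOW. The library name libm.so.6 assumes a glibc-style system, the numbers for a library that small will be tiny, and you may need -ldl on older glibc. Something like:

#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

static double seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec/1e9;
}

static void bench(const char *label, int flags)
{
	double before = seconds(), after;
	void *lib = dlopen("libm.so.6", flags); /* assumed glibc-style soname */

	after = seconds();
	if (!lib) {
		fprintf(stderr, "%s: %s\n", label, dlerror());
		return;
	}
	printf("%s: dlopen() took %.6f seconds\n", label, after-before);
	dlclose(lib);
}

int main(void)
{
	/* RTLD_NOW resolves all the library's undefined function symbols
	   up front, RTLD_LAZY defers each one to its first call through
	   the PLT. LD_BIND_NOW=1 flips the same switch for an ordinary
	   dynamically linked program at startup. */
	bench("RTLD_LAZY", RTLD_LAZY);
	bench("RTLD_NOW ", RTLD_NOW);

	return 0;
}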

Does that mean static linking is faster everywhere? Dunno, I haven't tried "everywhere". You can't "optimize" without saying what you're optimizing FOR, and the ground changes out from under you.

Loop unrolling was an optimization, then became a pessimization when CPU caches showed up, then an optimization again when L2 caches showed up. The pendulum went back and forth multiple times before I stopped trying to even track it, sometime around when branch prediction turned into a security hole and people started doing TLB invalidation mitigations for it. My takeaway lesson: outside of tight inner loops, do the simple thing and let the hardware and optimizers take care of themselves.
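
(For anyone who hasn't seen it spelled out, here's what manual loop unrolling looks like in C, just a sketch of the technique rather than anything from a real codebase: the unrolled version spends fewer loop-counter and branch instructions per element, at the cost of more code competing for the instruction cache, which is exactly the trade-off that kept flip-flopping above.)

/* The simple thing. */
long sum_simple(const int *a, long n)
{
	long i, sum = 0;

	for (i = 0; i < n; i++) sum += a[i];

	return sum;
}

/* Unrolled 4x by hand: less loop overhead per element, more code. */
long sum_unrolled(const int *a, long n)
{
	long i = 0, s0 = 0, s1 = 0, s2 = 0, s3 = 0;

	for (; i+4 <= n; i += 4) {
		s0 += a[i];
		s1 += a[i+1];
		s2 += a[i+2];
		s3 += a[i+3];
	}
	for (; i < n; i++) s0 += a[i]; /* leftover elements */

	return s0+s1+s2+s3;
}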

I do know I left the Red Hat world for the Debian world when the new Fedora CD wouldn't install on the Pentium Pro I had at the time (because they'd "moved on" to an architecture newer than the hardware I was still using).

I had to learn what x86-64-v1 vs v2 were when an Android NDK update made all the binaries it produced segfault on my netbook. I cared because I was maintaining their command line utilities, and it was nice to be able to actually test that environment. But I didn't discard my hardware to humor the change, I just ran my test binaries under QEMU until that netbook died...
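
(Side note: if you're curious which level a given machine actually supports, a sufficiently new GCC, 12 or later if memory serves, accepts the x86-64-vN names in __builtin_cpu_supports(), so a quick self-check looks something like this.)

#include <stdio.h>

int main(void)
{
	const char *level = "x86-64 (baseline)";

	__builtin_cpu_init();
	if (__builtin_cpu_supports("x86-64-v2")) level = "x86-64-v2";
	if (__builtin_cpu_supports("x86-64-v3")) level = "x86-64-v3";
	if (__builtin_cpu_supports("x86-64-v4")) level = "x86-64-v4";
	printf("This CPU supports up to %s\n", level);

	return 0;
}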

There was talk back then (what, 2018?) about teaching repositories to know about various architecture flags so they could pull optimized packages for your machine, but the discussion petered out because the gains were small and the overhead was huge.

> Would x32 optimized for v3 be the best option for many use cases?

It would prevent the x86-64-v2 laptop I'm typing this on from running those binaries, but I've already talked to the NetBSD guys, and to them, running on the systems people want to use their stuff on is a point of pride. Like it used to be on Linux, before everybody got old and tired and needed to lighten the load.

Decisions have costs. It's your call to cull your herd and chastise the outliers, but it usually means some subset will move on to things that are still fun.

It's an interesting move, giving ultimatums to people who never got forced onto Windows and never moved to GPLv3. Not "I am stepping down from this and going this way instead", but "XFree86 is now under this new license, you will all comply, hey where are you going"...

*shrug* You do you.

Rob

