
Re: Question on BIGGEST_ALIGNMENT in GCC on NetBSD/m68k



Hi,

On 11.6.2025 18.49, John Paul Adrian Glaubitz wrote:
On Wed, 2025-06-11 at 18:32 +0300, Eero Tamminen wrote:
It will decrease performance if increased alignment means that something
that fit earlier into i/d-cache, does not fit any more.

»Control whether GCC aligns int, long, long long, float, double, and long double
  variables on a 32-bit boundary (-malign-int) or a 16-bit boundary (-mno-align-int).
  Aligning variables on 32-bit boundaries produces code that runs somewhat faster on
  processors with 32-bit busses at the expense of more memory.«

The "more memory" part is the gotcha I was referring to.
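
To make the "more memory" cost concrete, here is a rough sketch of how padding grows when the alignment cap goes from 2 to 4 bytes. It is a simplified model, not GCC's actual layout code: it assumes each member is aligned to min(its size, cap) and the struct is padded to its strictest member alignment. The member names and the `layout` helper are hypothetical.

```python
def layout(members, max_align):
    """Return (offsets, total_size) for (name, size) members under an alignment cap.

    Simplified model: each member is aligned to min(size, max_align),
    and the struct is padded to its strictest member alignment.
    """
    offset = 0
    struct_align = 1
    offsets = {}
    for name, size in members:
        align = min(size, max_align)
        # Round the running offset up to the member's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    total = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, total


# Hypothetical struct: char flag; int count; short id; int total;
MEMBERS = [("flag", 1), ("count", 4), ("id", 2), ("total", 4)]

for cap in (2, 4):
    offsets, size = layout(MEMBERS, cap)
    print(f"{cap}-byte cap: offsets={offsets} sizeof={size}")
```

In this model the same four members take 12 bytes with a 2-byte cap and 16 bytes with a 4-byte cap. Whether that growth matters for cache use depends on how many such objects are hot at once, which is what a measurement would have to show.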


To get some numbers on this...

if you could provide vmlinuz & System.map files for both (otherwise
identical) 2-byte & 4-byte alignment kernel builds, using kernel config
here:
https://github.com/hatari/hatari/blob/main/tools/linux/kernel.config

I could measure the perf difference for the whole kernel boot, and if
there are differences, profile what causes those differences.
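
As a first, static comparison before any boot profiling, the two System.map files could be diffed for section sizes. The sketch below is only an illustration: the marker symbols (`_stext`, `_etext`, `_sdata`, `_edata`) are assumed to exist in both maps, and the two inline maps are made-up placeholders, not data from real builds.

```python
def parse_system_map(text):
    """Parse 'address type name' lines into a {name: address} dict."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols[name] = int(addr, 16)
    return symbols


def span(symbols, start, end):
    """Size in bytes between two marker symbols."""
    return symbols[end] - symbols[start]


# Made-up placeholder maps; real ones come from the two kernel builds.
MAP_2BYTE = """\
00001000 T _stext
00201000 T _etext
00202000 D _sdata
00242000 D _edata
"""

MAP_4BYTE = """\
00001000 T _stext
00201400 T _etext
00202000 D _sdata
00244000 D _edata
"""

old = parse_system_map(MAP_2BYTE)
new = parse_system_map(MAP_4BYTE)
for label, start, end in (("text", "_stext", "_etext"), ("data", "_sdata", "_edata")):
    delta = span(new, start, end) - span(old, start, end)
    print(f"{label}: {span(old, start, end)} -> {span(new, start, end)} bytes ({delta:+d})")
```

Size deltas alone say nothing about run-time cost; they only indicate where extra padding ended up, which helps decide what to profile.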

But anyway, as I have said before, I am not going to change my mind on this
and I'm already working on it. If you prefer maintaining a Linux port with
2 bytes alignment, you are free to do so. But please don't expect me to waste
my time on it.

Unsubstantiated performance claims are no good. I was offering help in substantiating them.


If perf improves, that's validation for the performance argument. If the performance impact is insignificant, that's proof against claims of 4-byte alignment decreasing performance.

(The Linux kernel has a general "no ABI changes, as long as the ABI has users" policy, so verified arguments like the above might help sway kernel maintainers to help with potential 4-byte alignment issues.)


Now, if perf actually decreases with 4-byte alignment setups, it's something to investigate, and hopefully / eventually to fix. Pinpointing causes for such things is something where I can specifically help.


Note: the above kernel config is a minimal, monolithic[1] one. It's a starting point, much faster to build & boot, and makes it easier to pinpoint issues.

After that is done, I could also measure + profile something closer to the latest Debian kernel config, if additional data points are needed, or if you're just interested in the alignment impact on (boot time) perf in specific additional drivers.

(Full m68k Debian is too heavy to boot in reasonable time on machines that Hatari emulates, due to missing crypto acceleration, but IMHO also unnecessary for kernel ABI change discussions.)


	- Eero

[1] This is due to limitations in the current Hatari profiler; it was intended for profiling ROM code and (CPU+DSP) programs on OSes which do not support shared libs / modules.

(It's been used e.g. to optimize upstream ScummVM, so that a subset of its game engines works OK even on a 32 MHz 030: https://scummvm.org/downloads/#release)

