On 22/9/25 4:33 am, John Paul Adrian Glaubitz wrote:
Modern compilers are already extremely good at optimizing code such that they use specific CPU extensions such that you often don't need handwritten assembly for optimal performance.
They are, but unfortunately languages like C aren't equipped to deal with data specifics such as endianness, despite this being a more prevalent concern five decades ago. I suppose they weren't designed to be that low level. I find things like intrinsics don't help, as they are not portable or part of the language. Plus, taking GCC, it doesn't support endian access where it matters, at the memory level. Instead it has a builtin to swap data, and by then it's too late. It's obvious GCC's endian support is designed more for x86 than for PPC, so it's effectively useless on PPC. In some ways x86/x64 has better endian support than PPC, being able to byte swap as well as do reversed loads/stores, with bswap and movbe. The storage attribute to mark byte order does work better, but it isn't designed for arbitrary scalars.
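For reference, the two GCC mechanisms mentioned above can be sketched roughly as follows. This is a minimal illustration, not a recommendation: it assumes GCC (version 6 or later for the attribute), it assumes the "storage attribute" meant here is `scalar_storage_order`, and the names `read_be32_swap`, `struct be_header` and `header_length` are hypothetical. Whether either form ends up as a single byte-reversed load (lwbrx on PPC, movbe on x86) depends on the compiler version and flags, so the generated assembly is worth checking.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a big-endian 32-bit value with the swap builtin.
 * The load is done in host order; the swap is a separate step afterwards. */
static inline uint32_t read_be32_swap(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);            /* plain host-order load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);           /* only needed on little-endian hosts */
#endif
    return v;
}

#if defined(__GNUC__) && !defined(__clang__)
/* Hypothetical header layout whose scalar fields are stored big-endian.
 * The attribute applies to struct/union types, not to a lone scalar. */
struct __attribute__((scalar_storage_order("big-endian"))) be_header {
    uint32_t length;
    uint16_t type;
    uint16_t flags;
};

static inline uint32_t header_length(const unsigned char *buf)
{
    struct be_header h;
    memcpy(&h, buf, sizeof h);          /* copy the raw bytes, no manual swap */
    return h.length;                    /* the compiler applies the byte order */
}
#endif
```

The attribute version keeps the byte order in the type rather than at each access, but it only covers fields of a struct or union, which matches the limitation noted above about arbitrary scalars.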
You've likely read an article about The Byte Order Fallacy. While I agree in principle, I disagree in practice. No one, in this day and age or any other I can imagine, would split a scalar load/store into discrete byte-sized parts just to be portable. It also doesn't produce efficient code. Even on x64 the latest GCC doesn't figure out what code that reads individual bytes and shifts them into place is doing, for either endianness. It doesn't see what is going on and reduce it to a direct access or a movbe. It lacks endian AI.
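For readers who haven't seen it, the byte-at-a-time style argued for in that article looks roughly like the sketch below. The function names `load_be32`, `load_le32` and `store_be32` are hypothetical. The code never depends on the host byte order, only on the order of the bytes in the buffer; whether a given compiler collapses each function into a single load or store is something to verify in the generated assembly rather than assume.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value one byte at a time (host-order independent). */
static inline uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit little-endian value one byte at a time. */
static inline uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as big-endian bytes. */
static inline void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

For the buffer {0x12, 0x34, 0x56, 0x78}, `load_be32` yields 0x12345678 and `load_le32` yields 0x78563412 on any host.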
Now, PPC Linux can deal with different storage formats, otherwise PCI and USB wouldn't work. But it can only go so far. The AMDGPU developers aren't going to put Linux CPU byte-order macros all over the place just to be portable. This is the point where languages fail: in this day and age we still don't have a transparent way of dealing with endianness. Not with C anyway.
-- My regards, Damien Stewart.