
Re: slow memory I/O on AMD EPYC 9334



Thanks a lot for the explanation, it's much appreciated.

We first noticed the performance decline in our numerical simulations, which run on proprietary software with customized "plug-ins" / "user subroutines". We tested different hardware and discovered only by chance that the difference appears when running on different OS versions. We tried some of the usual benchmarks, but it was very hard to reproduce the difference we see with our numerical simulations. I ran this "fio" test without properly understanding it and was hoping the significant performance gap might give a clue as to what is going wrong in our initial problem.

Now I understand that it's actually *not* very likely that this test directly points to a bottleneck in the system, that is, one that would also affect our numerical simulations.

I'm still chewing on that ;-)

It's not easy to do this kind of investigation on a heavily used production system (can't simply swap the OS :-) ... will see.

Gero.


On 20/10/2025 7:25 pm, Ben Hutchings wrote:
> On Fri, 2025-10-17 at 14:11 +0200, Gero wrote:
>> Dear experts,
>>
>> thanks a lot for your work and commitment on the Debian system. I have
>> been using Debian for years and I am generally very pleased with it.  :-)
>>
>> With my company we do numerical simulations and recently did some
>> benchmarking tests on new AMD EPYC 9334 processors that showed a
>> significant performance loss of a current Debian system compared to an
>> older Red Hat or Rocky Linux. We could narrow that down to the following
>> finding:
>>
>> Running these commands:
>>
>> cd /dev/shm
>> echo 3 > /proc/sys/vm/drop_caches
>> fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k
>> --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
> [...]
>> I would like to know if you have an explanation or an idea.
>
> I think for this specific test, the explanation is "this is a stupid I/O
> pattern that no-one optimises for".  Using AIO with a depth of 1 is
> effectively doing synchronous I/O in a less efficient way.
>
> Added to that, POSIX AIO was never that efficient on Linux, and the
> upstream developers seem to have more-or-less given up on it in favour
> of io_uring.
>
>> And I wonder
>> if you would be interested in investigating the issue any further. Or if
>> you have a suggestion who I might address preferably.
>
> If you can also see a regression for io_uring and a more sensible I/O
> depth then this would probably be interesting for the upstream
> developers.
>
> Ben.
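
For anyone wanting to follow up on the suggestion above, here is a minimal sketch of how the two cases could be set side by side. It only prints the fio command lines and does not run them. The helper name fio_cmd and the queue depth of 32 are assumptions chosen for illustration, not values from this thread. The io_uring engine also needs a kernel and fio build recent enough to support it.

```shell
#!/bin/sh
# Sketch only: print the original job and an io_uring variant for comparison.
# fio_cmd is a hypothetical helper; the depth of 32 is an assumed example value.
fio_cmd() {
    # $1 = ioengine, $2 = iodepth; all other options match the original test
    echo "fio --name=random-write --ioengine=$1 --rw=randwrite --bs=4k" \
         "--numjobs=1 --size=4g --iodepth=$2 --runtime=60 --time_based --end_fsync=1"
}

fio_cmd posixaio 1     # the job from the original report
fio_cmd io_uring 32    # same workload, io_uring with a deeper queue
```

Each printed line can be run from the test directory (/dev/shm in the original report) on both OS versions. A gap that persists in the io_uring case would be the result worth passing on to upstream.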


