filesystem slowdown with backports kernel
Hi,
we have a NAS system that stores our servers' backups (via rsync with
--link-dest). On that NAS we switched from the stable kernel (4.9) to
the one provided by backports (4.18) because of an unrelated problem.
Since the switch, our whole backup process has slowed down, from the
rsync backup itself to deleting old backup directories. The slowdown
seems to depend on the number of files/directories: backups of systems
with fewer files seem less affected than those with many files.
So we started benchmarking, and the following script seems to reproduce
our problem. It creates about 100k directories and 100k files: 10
top-level directories, each containing 10000 subdirectories with one
empty file each (split up this way for easier deletion between runs):
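#!/bin/bash
# Create 10 top-level directories (0..9), each holding 10000
# subdirectories (0000..9999) that contain one empty file named "1".
time (
for i in {0..9}; do
  for j in {0000..9999}; do
    mkdir -p "$i/$j"
    touch "$i/$j/1"
  done
done
)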
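One thing that may be worth separating out: the loop above starts two
external processes (mkdir and touch) per entry, i.e. roughly 200,000
process launches, and that overhead is part of what time reports as
user and sys. The sketch below (not one of the measurements in this
mail) performs the same filesystem operations with only 20 process
launches by letting bash brace expansion pass 10000 paths per call. It
assumes bash 4 or later for the zero-padded {0000..9999} range and
should be run in an empty directory on the filesystem under test.

```shell
#!/bin/bash
# Same tree as the original test (10 x 10000 directories, one empty
# file "1" in each), but created with one mkdir and one touch call per
# top-level directory: 20 processes instead of about 200,000.
# Brace expansion turns "$i"/{0000..9999} into 10000 path arguments.
time (
for i in {0..9}; do
  mkdir -p "$i"/{0000..9999}   # 10000 directories per call
  touch "$i"/{0000..9999}/1    # 10000 empty files per call
done
)
```

If 4.18 shows little or no slowdown with this variant while the
original script is still much slower, the regression is more likely in
process creation (fork/exec) than in ext4 or XFS themselves.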
We get the following results (run-to-run variance of a few seconds):
4.9 ext4:
real 2m13.303s
user 0m4.976s
sys 0m20.424s
4.9 xfs:
real 2m7.416s
user 0m5.076s
sys 0m20.960s
4.18 ext4:
real 4m3.276s
user 2m46.401s
sys 1m12.546s
4.18 xfs:
real 3m53.430s
user 2m46.841s
sys 1m12.716s
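Elapsed time goes up by roughly 80% (4.18 takes almost twice as long),
and user and sys times increase drastically: user time from about 5
seconds to almost 3 minutes, sys time from about 20 seconds to over a
minute.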
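To put a number on the elapsed-time difference, here is a small shell
sketch that converts bash's time format (XmY.YYYs) into seconds and
prints the relative increase. The helper names to_seconds and
pct_increase are made up for this example.

```shell
#!/bin/bash
# Convert a bash "time" value such as 4m3.276s into seconds (243.276).
to_seconds() {
  local t=$1
  local min=${t%%m*}   # part before the "m": minutes
  local sec=${t#*m}    # part after the "m": seconds with trailing "s"
  sec=${sec%s}
  awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Print how much longer the second duration is than the first, in percent.
pct_increase() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", (b / a - 1) * 100 }'
}

old=$(to_seconds 2m13.303s)   # 4.9 ext4, real
new=$(to_seconds 4m3.276s)    # 4.18 ext4, real
pct_increase "$old" "$new"    # prints 82%
```

The same calculation gives 83% for XFS (2m7.416s vs. 3m53.430s).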
To rule out something like the Spectre/Meltdown mitigations, we tried
the oldest kernel package newer than the one in stable that we could
find on http://snapshot.debian.org. It dates from July 2017, before
those mitigations existed.
4.11 ext4:
real 3m28.443s
user 2m29.551s
sys 1m0.924s
4.11 xfs:
real 3m32.438s
user 2m31.349s
sys 1m3.333s
It's a little faster than 4.18, but the problem persists.
The NAS uses software RAID 6 via MD. To rule out the RAID as the source
of the problem, we ran the same script on a desktop system and see the
same behavior:
4.9 ext4 desktop:
real 2m22.525s
user 0m6.176s
sys 0m20.872s
4.18 ext4 desktop:
real 4m16.412s
user 3m2.282s
sys 1m19.308s
So to us it looks like something is seriously wrong somewhere, but we
have no idea where else to look. Is the test flawed, did we miss news
about an expected slowdown, or is it really a bug? And if it is a bug,
where can we look to locate it more precisely?
Thanks in advance,
Jens Holzkämper