Bug#1032104: linux: ppc64el iouring corrupted read
Source: linux
Version: 5.10.0-21-powerpc64le
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: daniel@mariadb.org
Dear Maintainer,
From https://jira.mariadb.org/browse/MDEV-30728:
A number of MariaDB's mtr tests depend on correct kernel operation.
As observed in these tests, there is a ~1/5 chance that the
encryption.innodb_encryption test will read zeros in the latter part of
the 16k pages that InnoDB uses by default.
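To help tell a bad read apart from bad data on disk, the following is a minimal sketch that scans a data file with ordinary buffered reads and lists 16k pages whose tail is all zeros while the rest of the page is not. The function name and the `tail_len` default of 512 bytes are assumptions made for illustration; the report does not say how much of the page is zeroed. Fully zeroed pages are skipped because unused pages can legitimately be empty.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size, as stated in the report


def find_zero_tailed_pages(data, page_size=PAGE_SIZE, tail_len=512):
    """Return page numbers whose last tail_len bytes are all zero
    while the preceding part of the page contains non-zero bytes.

    Heuristic only: tail_len is an assumed threshold, and pages that
    are entirely zero are deliberately ignored.
    """
    suspects = []
    # Walk the buffer one whole page at a time; a trailing partial page is ignored.
    for offset in range(0, len(data) - page_size + 1, page_size):
        page = data[offset:offset + page_size]
        head = page[:page_size - tail_len]
        tail = page[page_size - tail_len:]
        if any(head) and not any(tail):
            suspects.append(offset // page_size)
    return suspects
```

If this reports nothing for a file such as `ibdata1` after a failed run, that would suggest the data on disk is intact and the zeros appear only on the native AIO read path.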
This affects MariaDB 10.6+ packages on distributions that ship
liburing.
This has been observed in the Debian CI
(https://ci.debian.net/packages/m/mariadb/testing/ppc64el/)
and in upstream's buildbot (https://buildbot.mariadb.org/#/builders/318).
The one ppc64le worker that has the Debian 5.10.0-21 kernel,
the same as the Debian CI, has the prefix ppc64le-db-bbw1-*.
Test faults occur on all MariaDB 10.6+ builds in containers on this kernel.
There are no faults on non-ppc64le or RHEL7/8 based ppc64le kernels.
To reproduce:
apt-get install mariadb-test
cd /usr/share/mysql/mysql-test
./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/var/lib/mysql --force encryption.innodb_encryption,innodb,undo0 --repeat=12
A test will frequently fail with an error like:
2023-02-28 1:41:01 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=282]. You may have to recover from a backup.
(the page number isn't predictable)
The complete mtr error log of mariadb server is $PWD/var/log/mysqld.1.err
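Since the failing page varies between runs, it can help to collect the affected pages from the error log. The sketch below extracts the file, space and page number from lines shaped like the error message quoted above. The function name is hypothetical, and the pattern is an assumption based only on that single sample line.

```python
import re

# Matches the InnoDB message quoted in this report, e.g.
# "... failed read of file './ibdata1' page [page id: space=0, page number=282]."
PATTERN = re.compile(
    r"failed read of file '(?P<file>[^']+)' "
    r"page \[page id: space=(?P<space>\d+), page number=(?P<page>\d+)\]"
)


def corrupted_pages(log_text):
    """Return a list of (file, space, page_number) tuples found in log_text."""
    return [
        (m["file"], int(m["space"]), int(m["page"]))
        for m in PATTERN.finditer(log_text)
    ]
```

Feeding it the contents of var/log/mysqld.1.err after each repeat gives a list of affected pages that can be compared across runs.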
I tested on tmpfs. This is a different fault from bug #1020831 as:
* there is no io_uring error, just a bunch of zeros where data was
expected.
* this is ppc64le only.
Note: more serious faults exist on overlayfs (MDEV-28751) and remote
filesystems, so sticking to local xfs, ext4 or btrfs is recommended.
-- System Information:
Debian Release: bullseye
APT prefers jammy-updates
APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500, 'jammy'), (100, 'jammy-backports')
Architecture: ppc64el (ppc64le)
Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect