
Bug#1032104: linux: ppc64el iouring corrupted read



Source: linux
Version: 5.10.0-21-powerpc64le
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: daniel@mariadb.org

Dear Maintainer,

From https://jira.mariadb.org/browse/MDEV-30728

Several of MariaDB's mtr tests depend on correct kernel operation.

In these tests there is a ~1/5 chance that the
encryption.innodb_encryption test reads zeros in the latter part of
the 16k pages that InnoDB uses by default.
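To make the symptom concrete, here is a minimal sketch of how a buffer returned by a read could be checked for it. `zero_tail_length`, `looks_like_zero_tail` and the 512-byte threshold are hypothetical illustrations, not part of MariaDB or InnoDB. The check is applied to the read buffer because the report describes zeros appearing in what was read.

```python
# Hypothetical helpers (not MariaDB/InnoDB code) for spotting the reported
# symptom: a page read that returns data followed by a run of zeros.
PAGE_SIZE = 16 * 1024  # InnoDB's default page size, 16 KiB


def zero_tail_length(page: bytes) -> int:
    """Count the consecutive zero bytes at the end of `page`."""
    return len(page) - len(page.rstrip(b"\x00"))


def looks_like_zero_tail(page: bytes, min_zero_tail: int = 512) -> bool:
    """Return True if a full page holds some data followed by at least
    `min_zero_tail` zero bytes (an assumed, arbitrary threshold)."""
    if len(page) != PAGE_SIZE:
        raise ValueError(f"expected {PAGE_SIZE} bytes, got {len(page)}")
    tail = zero_tail_length(page)
    # An all-zero page can be legitimate (e.g. never written), so require
    # at least one non-zero byte before the zero run.
    return min_zero_tail <= tail < len(page)


if __name__ == "__main__":
    good = b"\x01" * PAGE_SIZE
    bad = b"\x01" * (PAGE_SIZE // 2) + b"\x00" * (PAGE_SIZE // 2)
    print(looks_like_zero_tail(good), looks_like_zero_tail(bad))  # False True
```

This heuristic cannot tell a page that legitimately ends in zeros from a short read. A real verifier would rely on the page checksum instead, as InnoDB does.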

This affects MariaDB 10.6+ packages on distributions that ship
liburing.

This has been observed in Debian's CI
(https://ci.debian.net/packages/m/mariadb/testing/ppc64el/)
and in upstream's CI (https://buildbot.mariadb.org/#/builders/318).
The only upstream ppc64le worker running the Debian 5.10.0-21 kernel
(the same kernel as the Debian CI) has the prefix ppc64le-db-bbw1-*.

Test faults occur on all MariaDB 10.6+ builds in containers on this kernel.
There are no faults on non-ppc64le kernels or on RHEL 7/8-based ppc64le kernels.

To reproduce:

apt-get install mariadb-test
cd /usr/share/mysql/mysql-test
./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/var/lib/mysql  --force encryption.innodb_encryption,innodb,undo0 --repeat=12 

A test will frequently fail, with an error such as:

2023-02-28  1:41:01 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=282]. You may have to recover from a backup.

(the page number isn't predictable)

The complete MariaDB server error log from the mtr run is $PWD/var/log/mysqld.1.err
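For a rough sense of why --repeat=12 makes a failure likely, the arithmetic can be sketched as follows. This assumes that the ~1/5 rate quoted above applies independently to each repetition. The report does not say whether runs are correlated. Under that assumption, the chance of at least one failure is 1 - (4/5)^12, roughly 0.93. The function name is a hypothetical illustration.

```python
def p_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs fails,
    assuming each run fails with probability `p_fail`."""
    return 1.0 - (1.0 - p_fail) ** runs


if __name__ == "__main__":
    # ~1/5 per-run rate from the report, 12 repetitions as in the mtr command.
    print(f"{p_at_least_one_failure(0.2, 12):.3f}")  # 0.931
```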

I tested on tmpfs. This is a different fault from bug #1020831 because:
* there is no io_uring error, just a run of zeros where data was
  expected.
* it only occurs on ppc64le.

Note that more serious faults exist on overlayfs (MDEV-28751) and remote
filesystems, so sticking to local xfs, ext4 or btrfs is recommended.

-- System Information:
Debian Release: bullseye
  APT prefers jammy-updates
  APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500, 'jammy'), (100, 'jammy-backports')
Architecture: ppc64el (ppc64le)

Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

