Bug#1057251: librocfft0-tests: nondeterministic failures in random_real_3d/random_params.vs_fftw
Package: librocfft0-tests
Version: 5.5.0-6
Severity: normal
Dear Maintainer,
The rocfft tests passed on one run and failed on a later run on amd64+gfx1032
with an identical set of dependencies. The failing log contained:
55s Random seed: 190206186
<...>
14657s [ RUN ] random_real_3d/random_params.vs_fftw/0
14658s unknown file: Failure
14658s C++ exception with description "rocFFT plan execution failure" thrown in the test body.
14658s
14658s [ FAILED ] random_real_3d/random_params.vs_fftw/0, where GetParam() = (0, 3, 1, 1, 2) (877 ms)
14658s [ RUN ] random_real_3d/random_params.vs_fftw/1
14659s unknown file: Failure
14659s C++ exception with description "rocFFT plan execution failure" thrown in the test body.
14659s
14659s [ FAILED ] random_real_3d/random_params.vs_fftw/1, where GetParam() = (0, 3, 1, 1, 3) (960 ms)
https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocfft/1341/log.gz
The earlier passing log contained:
57s Random seed: 1459638283
<...>
14609s [ RUN ] random_real_3d/random_params.vs_fftw/0
14609s [ OK ] random_real_3d/random_params.vs_fftw/0 (42 ms)
14609s [ RUN ] random_real_3d/random_params.vs_fftw/1
14609s [ OK ] random_real_3d/random_params.vs_fftw/1 (45 ms)
14609s [ RUN ] random_real_3d/random_params.vs_fftw/2
https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocfft/894/log.gz
I've discussed this with the upstream rocFFT developers and they plan to change
rocfft-test to only run deterministic tests by default. That will ensure that
end users verifying their installation only run tests that have already been
run by the upstream developers. There will be an option to enable the
nondeterministic tests, which upstream will use during development.
In the meantime, I would suggest that `--seed N` be added to the arguments
passed to rocfft-test in the autopkgtests. Disabling the nondeterminism in the
test suite makes it easier to compare results when the autopkgtests are
triggered by dependency updates and to compare results between different GPU
architectures.
We should also investigate the underlying cause of the failure with
`--seed 190206186`.
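
As a starting point for that investigation, here is a minimal Python sketch
that pulls the seed and the failed test names out of an autopkgtest log and
builds a command line to re-run just those tests. The helper name
`repro_command` is hypothetical. It assumes that rocfft-test accepts the
standard GoogleTest `--gtest_filter` option alongside `--seed`, and that the
random test parameters depend only on the seed and not on which tests are
selected. If the second assumption does not hold, drop the filter and re-run
the whole suite with the seed.

```python
import re
import shlex

# "Random seed: 190206186" as printed near the top of the log.
SEED_RE = re.compile(r"Random seed:\s*(\d+)")
# "[ FAILED ] suite/fixture.test/N, where ..." lines. Requiring a '.' in the
# name skips GoogleTest summary lines such as "[ FAILED ] 2 tests, listed below:".
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+([\w/]+\.[\w/]+)")


def repro_command(log_text, binary="rocfft-test"):
    """Build a shell command that re-runs the failed tests with the logged seed.

    Hypothetical helper: assumes the binary takes --seed (as suggested in this
    report) and the usual GoogleTest --gtest_filter flag.
    """
    seed_match = SEED_RE.search(log_text)
    if seed_match is None:
        raise ValueError("no 'Random seed:' line found in log")

    # Failures may be listed again in the final summary; keep the first
    # occurrence of each name, in order.
    failed = list(dict.fromkeys(FAILED_RE.findall(log_text)))

    cmd = [binary, "--seed", seed_match.group(1)]
    if failed:
        cmd.append("--gtest_filter=" + ":".join(failed))
    return shlex.join(cmd)
```

The function takes the decompressed text of a log such as the failing one
linked above, and the command it returns can be pasted into a shell on a
gfx1032 machine.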
Regards,
Cory Bloor
-- System Information:
Debian Release: trixie/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.5.0-4-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8), LANGUAGE=en_CA:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages librocfft0-tests depends on:
ii libamdhip64-5 5.2.3-13
ii libboost-program-options1.74.0 1.74.0+ds1-23
ii libc6 2.37-12
ii libfftw3-double3 3.3.10-1
ii libfftw3-single3 3.3.10-1
ii libgcc-s1 13.2.0-7
ii librocfft0 5.5.0-6
ii librocrand1 5.5.1-2
ii libstdc++6 13.2.0-7
librocfft0-tests recommends no packages.
librocfft0-tests suggests no packages.
-- no debconf information