Bug#1057251: librocfft0-tests: nondeterministic failures in random_real_3d/random_params.vs_fftw
Package: librocfft0-tests
Version: 5.5.0-6
Severity: normal

Dear Maintainer,

The rocfft tests passed and then failed on amd64+gfx1032 with an identical set
of dependencies. The failing log contained:

     55s Random seed: 190206186
    <...>
    14657s [ RUN      ] random_real_3d/random_params.vs_fftw/0
    14658s unknown file: Failure
    14658s C++ exception with description "rocFFT plan execution failure" thrown in the test body.
    14658s
    14658s [  FAILED  ] random_real_3d/random_params.vs_fftw/0, where GetParam() = (0, 3, 1, 1, 2) (877 ms)
    14658s [ RUN      ] random_real_3d/random_params.vs_fftw/1
    14659s unknown file: Failure
    14659s C++ exception with description "rocFFT plan execution failure" thrown in the test body.
    14659s
    14659s [  FAILED  ] random_real_3d/random_params.vs_fftw/1, where GetParam() = (0, 3, 1, 1, 3) (960 ms)

https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocfft/1341/log.gz

The earlier passing log contained:

     57s Random seed: 1459638283
    <...>
    14609s [ RUN      ] random_real_3d/random_params.vs_fftw/0
    14609s [       OK ] random_real_3d/random_params.vs_fftw/0 (42 ms)
    14609s [ RUN      ] random_real_3d/random_params.vs_fftw/1
    14609s [       OK ] random_real_3d/random_params.vs_fftw/1 (45 ms)
    14609s [ RUN      ] random_real_3d/random_params.vs_fftw/2

https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocfft/894/log.gz

I've discussed this with the upstream rocFFT developers, and they plan to change
rocfft-test to run only deterministic tests by default. That will ensure that
end users verifying their installation run only the tests that the upstream
developers have already run. There will be an option to enable the
nondeterministic tests, which the developers will use during development.

In the meantime, I would suggest that `--seed N` be added to the arguments
passed to rocfft-test in the autopkgtests. Disabling the nondeterminism in the
test suite makes it easier to compare results when the autopkgtests are
triggered by dependency updates and to compare results between different GPU
architectures.
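As a minimal sketch of that suggestion, assuming the autopkgtest invokes a
`rocfft-test` binary found on `PATH`, a test driver could pin the seed as shown
below. Only the `--seed N` argument comes from this report; `FIXED_SEED`,
`rocfft_test_command`, and `run_rocfft_test` are hypothetical names chosen for
illustration.

```python
import shlex
import subprocess

# Hypothetical constant chosen for illustration; any fixed value makes the
# randomly generated test parameters the same on every run.
FIXED_SEED = 12345


def rocfft_test_command(seed=FIXED_SEED, extra_args=()):
    """Build the rocfft-test argument list with the random seed pinned."""
    return ["rocfft-test", "--seed", str(seed), *extra_args]


def run_rocfft_test(seed=FIXED_SEED):
    """Run rocfft-test and return its exit status for the autopkgtest harness."""
    cmd = rocfft_test_command(seed)
    # Log the exact command so the seed in use is visible in the test output.
    print("Running:", shlex.join(cmd), flush=True)
    return subprocess.run(cmd).returncode
```

A script under debian/tests could then end with `sys.exit(run_rocfft_test())`
so that a failing rocfft-test run also fails the autopkgtest.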

We should also investigate the underlying cause of the failure with
`--seed 190206186`.
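To replay a failing run, the seed can be read back from the "Random seed:" line
that rocfft-test prints near the top of the autopkgtest log. The sketch below is
illustrative only: the helper names are hypothetical, and the optional
`--gtest_filter` argument assumes that rocfft-test forwards standard GoogleTest
flags, which should be checked before relying on it.

```python
import gzip
import re

# Matches lines such as "     55s Random seed: 190206186" in the log excerpts above.
SEED_RE = re.compile(r"Random seed:\s*(\d+)")


def seed_from_log(text):
    """Return the first random seed printed by rocfft-test, or None if absent."""
    match = SEED_RE.search(text)
    return int(match.group(1)) if match else None


def seed_from_log_gz(path):
    """Read a downloaded log.gz file and extract the seed it reports."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return seed_from_log(f.read())


def repro_command(seed, gtest_filter=None):
    """Build a rocfft-test command that replays the given seed.

    Passing gtest_filter assumes rocfft-test accepts standard GoogleTest
    flags; omit it if that assumption does not hold.
    """
    cmd = ["rocfft-test", "--seed", str(seed)]
    if gtest_filter is not None:
        cmd.append(f"--gtest_filter={gtest_filter}")
    return cmd
```

For the failing log above, `repro_command(190206186,
"random_real_3d/random_params.vs_fftw/*")` would target the two failing test
instances under the same seed.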

Regards,
Cory Bloor


-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.5.0-4-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8), LANGUAGE=en_CA:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages librocfft0-tests depends on:
ii  libamdhip64-5                   5.2.3-13
ii  libboost-program-options1.74.0  1.74.0+ds1-23
ii  libc6                           2.37-12
ii  libfftw3-double3                3.3.10-1
ii  libfftw3-single3                3.3.10-1
ii  libgcc-s1                       13.2.0-7
ii  librocfft0                      5.5.0-6
ii  librocrand1                     5.5.1-2
ii  libstdc++6                      13.2.0-7

librocfft0-tests recommends no packages.

librocfft0-tests suggests no packages.

-- no debconf information