Hi Christian,
I appreciate your help. Those were good suggestions.
Thanks for the correction.

558s dmesg: read kernel buffer failed: Operation not permitted

This isn't from the test; it's our test runner, which tries to capture dmesg before and after each test [3] for debugging purposes. The captures are exported as artifacts and made available in our CI. This fails with rootless podman because reading dmesg is a privileged operation by default.
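For illustration only: a capture step can record why the kernel log is missing rather than just emitting the dmesg error. This is a hypothetical sketch, not the code behind [3]. The capture_dmesg function and the output path are made-up names.

```shell
#!/bin/sh
# Hypothetical helper, not the actual runner code referenced in [3]:
# save the kernel log to a file if this user may read it, otherwise
# save a one-line note explaining why it could not be captured.
capture_dmesg() {
    out=$1
    if dmesg > "$out" 2>/dev/null; then
        return 0
    fi
    # dmesg_restrict=1 means reading the log requires CAP_SYSLOG, which an
    # unprivileged user (including root in a rootless container) normally lacks.
    restrict=$(cat /proc/sys/kernel/dmesg_restrict 2>/dev/null || echo unknown)
    echo "dmesg not captured (kernel.dmesg_restrict=$restrict)" > "$out"
    return 0
}

# Example: capture before a test run.
capture_dmesg "${TMPDIR:-/tmp}/dmesg-before.txt"
```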
On the host, could you try

$ sudo sysctl kernel.dmesg_restrict=0

and then run the test again? This should let regular users capture dmesg, and if it really is the OOM killer, it should be logged there.
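The sysctl takes effect immediately but does not persist across a reboot. Below is a minimal sketch for checking the current value and what it means. The describe_dmesg_restrict name is hypothetical. The meanings follow the kernel's sysctl documentation: 0 means no restriction, and 1 means CAP_SYSLOG is required.

```shell
#!/bin/sh
# Hypothetical helper: explain a kernel.dmesg_restrict value.
describe_dmesg_restrict() {
    case "$1" in
        0) echo "unrestricted: regular users can read dmesg" ;;
        1) echo "restricted: reading dmesg requires CAP_SYSLOG" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

# Read the live value from procfs (equivalent to: sysctl -n kernel.dmesg_restrict).
describe_dmesg_restrict "$(cat /proc/sys/kernel/dmesg_restrict 2>/dev/null)"
```

To keep the setting across reboots, you could add a file under /etc/sysctl.d/ containing the line kernel.dmesg_restrict = 0. The file name is up to you.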
Possibly another factor: the kernel overcommits memory by default. If more memory is actually used than is physically available, the OOM killer will kill something, which would fit neatly with the "Killed" above. You can turn off overcommitment with:

$ sudo sysctl vm.overcommit_memory=2

Perhaps that also changes something.
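With vm.overcommit_memory=2, an allocation that would push total committed memory past CommitLimit fails immediately (malloc returns NULL); the process is not OOM-killed later. Per the kernel's overcommit accounting documentation, and assuming vm.overcommit_kbytes is left at 0, the limit is (RAM - hugetlb) * vm.overcommit_ratio / 100 + swap. The default ratio is 50. The sketch below works in KiB, whereas the kernel counts pages, so treat the result as approximate. The commit_limit_kb name is hypothetical. The example uses the CPU pool size from the 6.10 rocminfo output further down as a stand-in for MemTotal and assumes no swap.

```shell
#!/bin/sh
# Hypothetical helper: approximate CommitLimit (KiB) for overcommit mode 2.
# Args: RAM_KB RATIO_PERCENT SWAP_KB [HUGETLB_KB]
commit_limit_kb() {
    ram_kb=$1 ratio=$2 swap_kb=$3 huge_kb=${4:-0}
    echo $(( (ram_kb - huge_kb) * ratio / 100 + swap_kb ))
}

# ~63.5 GB RAM (rocminfo CPU pool on 6.10), default ratio 50, no swap assumed.
commit_limit_kb 63523720 50 0   # 31761860
```

On a running system, grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo shows the limit the kernel actually applies and how much memory is already committed.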
The log output after applying both changes:
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0 (953 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
command1 FAIL non-zero exit status 1
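Here is a rough size check on the two cases, reading the length, precision and batch count from the test names. It assumes interleaved complex data (8 bytes per single-precision element, 16 per double) and counts one buffer only. The test itself may allocate several (for example for the FFTW reference), so read it as a lower bound. The fft_case_bytes name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate bytes for one complex buffer of a rocfft
# accuracy test case, using only the len_/batch_ fields and the precision.
fft_case_bytes() {
    name=$1
    len=$(printf '%s\n' "$name" | sed -n 's/.*_len_\([0-9][0-9]*\)_.*/\1/p')
    batch=$(printf '%s\n' "$name" | sed -n 's/.*_batch_\([0-9][0-9]*\)_.*/\1/p')
    case "$name" in
        *_single_*) elem=8 ;;   # complex float: 2 x 4 bytes
        *_double_*) elem=16 ;;  # complex double: 2 x 8 bytes
        *) echo "unknown precision" >&2; return 1 ;;
    esac
    echo $(( len * elem * batch ))
}

passed=pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
failed=pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
fft_case_bytes "$passed"   # 536870912 (512 MiB)
fft_case_bytes "$failed"   # 8589934592 (8 GiB)
```

8589934592 bytes is 8 GiB. That is in line with the roughly 8.57 to 8.59 GB allocations that __vm_enough_memory refused in the dmesg output below.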
The dmesg output from the test after applying both changes:
[50555.651205] __vm_enough_memory: pid: 57317, comm: rocfft-test, bytes: 8592035840 not enough memory for the allocation
[50555.651226] __vm_enough_memory: pid: 57317, comm: rocfft-test, bytes: 8592035840 not enough memory for the allocation
[50555.651233] __vm_enough_memory: pid: 57317, comm: rocfft-test, bytes: 8572432384 not enough memory for the allocation
[50555.651237] __vm_enough_memory: pid: 57317, comm: rocfft-test, bytes: 8592166912 not enough memory for the allocation
[50555.651261] show_signal_msg: 11 callbacks suppressed
[50555.651263] rocfft-test[57317]: segfault at 3c0 ip 00007fab8c38937b sp 00007faa749fe558 error 6 in libfftw3.so.3.6.10[18937b,7fab8c224000+1c5000] likely on CPU 9 (core 4, socket 0)
[50555.651276] Code: 2d 57 15 48 8e 06 00 c4 c1 65 5c d9 c5 e5 57 1d 3b 8e 06 00 c4 43 7d 05 d2 05 c4 e3 7d 05 db 05 c4 41 4d 5c ca c4 c1 4d 58 f2 <c4> 43 7d 19 0c 0a 01 c4 41 79 29 0a c5 55 58 cb c5 d5 5c eb 4d 8b
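On x86, the "error 6" in the segfault line is the page-fault error code. Bit 0 clear means the page was not present, bit 1 set means a write, and bit 2 set means user mode. The fault address 0x3c0 lies in the first page. Together, these look like a write through a NULL pointer plus a small offset inside libfftw3. That would be consistent with FFTW touching a buffer whose allocation had just been refused, but this is an inference; the log does not state it. Here is a small decoder (decode_pf_error is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code as printed in "segfault at ... error N" kernel messages.
decode_pf_error() {
    code=$1
    if [ $((code & 4)) -ne 0 ]; then mode=user; else mode=kernel; fi
    if [ $((code & 2)) -ne 0 ]; then access=write; else access=read; fi
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    echo "$mode-mode $access, $cause"
}

decode_pf_error 6   # user-mode write, page not present
echo $((0x3c0))     # 960: the fault address is within the first 4 KiB page
```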
Could you share the output of rocminfo with both the 6.1 and 6.10 kernels? I don't think it needs to be run in the test container; at least, I don't see why the result on bare metal should differ.
See attached for rocminfo logs from Debian Stable. Here's the diff:
--- nightwatch-rocminfo-6.1.txt 2024-09-27 15:30:45.713049254 -0600
+++ nightwatch-rocminfo-6.10.txt 2024-09-27 15:30:41.808929634 -0600
@@ -33,7 +33,7 @@
     L1:                      32768(0x8000) KB
   Chip ID:                 0(0x0)
   Cacheline Size:          64(0x40)
-  Max Clock Freq. (MHz):   3200
+  Max Clock Freq. (MHz):   4829
   BDFID:                   0
   Internal Node ID:        0
   Compute Unit:            16
@@ -45,21 +45,21 @@
   Pool Info:
     Pool 1
       Segment:                 GLOBAL; FLAGS: FINE GRAINED
-      Size:                    63516508(0x3c92f5c) KB
+      Size:                    63523720(0x3c94b88) KB
       Allocatable:             TRUE
       Alloc Granule:           4KB
       Alloc Alignment:         4KB
       Accessible by all:       TRUE
     Pool 2
       Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
-      Size:                    63516508(0x3c92f5c) KB
+      Size:                    63523720(0x3c94b88) KB
       Allocatable:             TRUE
       Alloc Granule:           4KB
       Alloc Alignment:         4KB
       Accessible by all:       TRUE
     Pool 3
       Segment:                 GLOBAL; FLAGS: COARSE GRAINED
-      Size:                    63516508(0x3c92f5c) KB
+      Size:                    63523720(0x3c94b88) KB
       Allocatable:             TRUE
       Alloc Granule:           4KB
       Alloc Alignment:         4KB
@@ -113,7 +113,7 @@
   Pool Info:
     Pool 1
       Segment:                 GLOBAL; FLAGS: COARSE GRAINED
-      Size:                    2097152(0x200000) KB
+      Size:                    31761860(0x1e4a5c4) KB
       Allocatable:             TRUE
       Alloc Granule:           4KB
       Alloc Alignment:         4KB
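Here is a quick arithmetic check on the pool sizes in the diff. On 6.1 the GPU pool is 2097152 KB, exactly 2 GiB if KB means 1024 bytes. On 6.10 it is 31761860 KB, exactly half of the 63523720 KB CPU pool. Possibly the tests now see far more usable device memory and attempt larger cases than before, but that is a hypothesis; the rocminfo output does not show it. The kb_to_mib helper and the variable names are hypothetical.

```shell
#!/bin/sh
# Pool sizes in KB, copied from the rocminfo diff above.
cpu_pool_610=63523720   # Agent 1 (CPU) pool, 6.10
gpu_pool_61=2097152     # Agent 2 (gfx1035) pool 1, 6.1
gpu_pool_610=31761860   # Agent 2 (gfx1035) pool 1, 6.10

# Hypothetical helper: whole MiB from KB (integer division).
kb_to_mib() { echo $(( $1 / 1024 )); }

kb_to_mib "$gpu_pool_61"                        # 2048
kb_to_mib "$gpu_pool_610"                       # 31017
echo $(( cpu_pool_610 / 2 == gpu_pool_610 ))    # 1: GPU pool is half the CPU pool
```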
I also just noticed that [2] is segfaulting, so there's clearly another issue even with the older kernel. I hadn't noticed that before. It didn't do that when rocfft 6.1.2 was first uploaded [4].
[1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocfft/33925/log.gz
[2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocfft/34278/log.gz
[3]: https://sources.debian.org/src/rocfft/6.1.2-1/debian/tests/upstream-binaries/#L70
[4]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocfft/18220/log.gz
nightwatch-rocminfo-6.1.txt:

ROCk module is loaded
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 7735HS with Radeon Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 7 7735HS with Radeon Graphics
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3200
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    63516508(0x3c92f5c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    63516508(0x3c92f5c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    63516508(0x3c92f5c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1035
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      2048(0x800) KB
  Chip ID:                 5761(0x1681)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   13312
  Internal Node ID:        1
  Compute Unit:            12
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    2097152(0x200000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1035
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
nightwatch-rocminfo-6.10.txt:

ROCk module is loaded
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 7735HS with Radeon Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 7 7735HS with Radeon Graphics
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4829
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    63523720(0x3c94b88) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    63523720(0x3c94b88) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    63523720(0x3c94b88) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1035
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      2048(0x800) KB
  Chip ID:                 5761(0x1681)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   13312
  Internal Node ID:        1
  Compute Unit:            12
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    31761860(0x1e4a5c4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1035
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***