Hi Christian,
Am I interpreting this right that the "Killed" disappeared? If so, then the issue should be reproducible by re-enabling vm.overcommit_memory=0.
"Killed" disappeared when I ran it myself in both cases. However, it did get further with vm.overcommit_memory=0:
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
(881 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(76872 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(11141 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(5230 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(5429 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(6498 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(2630 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(2718 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(8447 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(7018 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(3510 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(4090 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(3520 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(1766 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(1771 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(67340 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(11059 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(12243 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(5412 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(5695 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
command1 FAIL non-zero exit status 1
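The case that was running when the process died is the last [ RUN ] entry with no matching result line. A minimal sketch for finding it in a saved log follows. It assumes the usual gtest layout, with the status marker and the test name on the same line (the names above are wrapped onto separate lines by the mail client), and the test names in `sample` are hypothetical placeholders rather than real rocFFT cases.

```python
import re

# Matches gtest status lines such as "[ RUN      ] suite.name"
# or "[       OK ] suite.name (881 ms)".
STATUS = re.compile(r"^\[\s*(RUN|OK|FAILED|SKIPPED)\s*\]\s+(\S+)")


def unfinished_tests(log_text):
    """Return names of tests with a RUN line but no later result line."""
    running = []
    for line in log_text.splitlines():
        match = STATUS.match(line)
        if not match:
            continue
        status, name = match.groups()
        if status == "RUN":
            running.append(name)
        elif name in running:
            running.remove(name)
    return running


# Hypothetical, shortened test names used only to exercise the parser.
sample = """\
[ RUN      ] suite.case_a
[       OK ] suite.case_a (881 ms)
[ RUN      ] suite.case_b
[  SKIPPED ] suite.case_b (0 ms)
[ RUN      ] suite.case_c
command1 FAIL non-zero exit status 1
"""

print(unfinished_tests(sample))  # ['suite.case_c']
```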
Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED
- Size: 2097152(0x200000) KB
+ Size: 31761860(0x1e4a5c4) KB

This is the pool from the gfx1035. It increased in size from 2GiB to ~32GiB.

If overcommit was indeed the issue behind "Killed", then I suspect that the test malloc'ed so much that it eventually triggered the OOM killer once the test and the GPU together consumed all physical memory, e.g. with a 32GiB test case computed on both GPU and CPU for expected/actual comparison.

The dmesg logs indicate the OOM killer activating with vm.overcommit_memory=0:

[ 633.775419] rocfft-test invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0

I've attached the rest of the dmesg log for the test. It has more details.
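As a rough sanity check on the sizes involved, the sketch below estimates the buffer sizes for the skipped double-precision batch-4 case and for the single-precision batch-4 case that was running when the test died. It assumes interleaved complex data at 8 bytes (single) or 16 bytes (double) per element. The factor of 6 is inferred from the reported needed_ramgb of 96 and is not taken from the rocFFT test source, so treat the single-precision estimate as a guess; needed_ramgb may also not be measured in exactly these units.

```python
GIB = 1024 ** 3          # bytes per GiB
KIB_PER_GIB = 1024 ** 2  # KiB per GiB


def buffer_gib(length, batch, bytes_per_complex):
    """Size in GiB of one complex array of `length` elements times `batch`."""
    return length * batch * bytes_per_complex / GIB


# Skipped case: len 268435456, double precision, batch 4.
double_b4 = buffer_gib(268435456, 4, 16)
# Case running at the time of the failure: same length, single precision, batch 4.
single_b4 = buffer_gib(268435456, 4, 8)

# Assumption: the test's estimate scales linearly with one buffer's size.
# 96 is the needed_ramgb value printed for the skipped double-precision case.
factor = 96 / double_b4
estimated_single = single_b4 * factor

print(f"double batch 4: {double_b4:.0f} GiB per buffer")
print(f"single batch 4: {single_b4:.0f} GiB per buffer")
print(f"inferred factor: {factor:.0f}")
print(f"estimated need for single batch 4: {estimated_single:.0f} (limit was 61)")

# Pool sizes from the Pool Info diff, converted from KB to GiB.
pool_before = 2097152 / KIB_PER_GIB
pool_after = 31761860 / KIB_PER_GIB  # about 30.3 GiB, or about 32.5 GB decimal
print(f"pool: {pool_before:.1f} GiB -> {pool_after:.1f} GiB")
```

Under these assumptions the single-precision batch-4 case would come in below the 61 limit, which would be consistent with it being attempted rather than skipped.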
Sincerely,
Cory Bloor
[ 633.775419] rocfft-test invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
[ 633.775426] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64 #1 Debian 6.10.6-1~bpo12+1
[ 633.775429] Hardware name: Micro Computer (HK) Tech Limited UM773 Lite/F7BFD, BIOS 1.06 02/27/2023
[ 633.775430] Call Trace:
[ 633.775432] <TASK>
[ 633.775434] dump_stack_lvl+0x64/0x80
[ 633.775440] dump_header+0x44/0x1b0
[ 633.775444] oom_kill_process+0xfa/0x200
[ 633.775447] out_of_memory+0x257/0x520
[ 633.775450] __alloc_pages_slowpath.constprop.0+0xaaa/0xd60
[ 633.775456] __alloc_pages_noprof+0x309/0x340
[ 633.775460] alloc_pages_mpol_noprof+0xd9/0x1e0
[ 633.775464] pte_alloc_one+0x1d/0x60
[ 633.775468] __pte_alloc+0x2a/0xb0
[ 633.775472] do_anonymous_page+0x52b/0x7b0
[ 633.775474] ? lruvec_stat_mod_folio.constprop.0+0x1c/0x30
[ 633.775476] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775479] ? __pmd_alloc+0x148/0x200
[ 633.775481] __handle_mm_fault+0xc3e/0x1070
[ 633.775484] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775488] handle_mm_fault+0x190/0x320
[ 633.775491] hmm_vma_fault.isra.0+0x4d/0x90
[ 633.775495] walk_pgd_range+0x34d/0xa90
[ 633.775500] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775502] __walk_page_range+0x198/0x1b0
[ 633.775505] walk_page_range+0x13d/0x200
[ 633.775508] hmm_range_fault+0x5f/0xa0
[ 633.775513] amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[ 633.775717] amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[ 633.775838] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[ 633.775991] kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[ 633.776139] kfd_ioctl+0x3af/0x4c0 [amdgpu]
[ 633.776280] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[ 633.776422] __x64_sys_ioctl+0x97/0xd0
[ 633.776426] do_syscall_64+0x82/0x190
[ 633.776432] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776434] ? vm_mmap_pgoff+0x131/0x1c0
[ 633.776437] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776439] ? syscall_exit_to_user_mode+0x77/0x210
[ 633.776441] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776443] ? do_syscall_64+0x8e/0x190
[ 633.776445] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776446] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 633.776450] RIP: 0033:0x7fe8bf3164bb
[ 633.776454] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[ 633.776455] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 633.776457] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[ 633.776459] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[ 633.776460] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[ 633.776461] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[ 633.776462] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[ 633.776466] </TASK>
[ 633.776467] Mem-Info:
[ 633.776470] active_anon:10824089 inactive_anon:666438 isolated_anon:0 active_file:82 inactive_file:164 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:7541 slab_unreclaimable:23829 mapped:44 shmem:294 pagetables:24401 sec_pagetables:1054 bounce:0 kernel_misc_reclaimable:0 free:37599 free_pcp:19 free_cma:0
[ 633.776473] Node 0 active_anon:43296924kB inactive_anon:2665184kB active_file:328kB inactive_file:656kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:176kB dirty:0kB writeback:0kB shmem:1176kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable? no
[ 633.776477] Node 0 DMA free:824kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB active_anon:80kB inactive_anon:10336kB active_file:0kB inactive_file:24kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[ 633.776481] lowmem_reserve[]: 0 2725 61975 0 0
[ 633.776486] Node 0 DMA32 free:2928kB boost:0kB min:2968kB low:5756kB high:8544kB reserved_highatomic:0KB active_anon:264kB inactive_anon:237144kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 633.776490] lowmem_reserve[]: 0 0 59249 0 0
[ 633.776494] Node 0 Normal free:146644kB boost:194924kB min:259516kB low:320184kB high:380852kB reserved_highatomic:0KB active_anon:43297812kB inactive_anon:2416472kB active_file:524kB inactive_file:16kB unevictable:0kB writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB free_pcp:68kB local_pcp:0kB free_cma:0kB
[ 633.776498] lowmem_reserve[]: 0 0 0 0 0
[ 633.776501] Node 0 DMA: 2*4kB (U) 5*8kB (U) 16*16kB (U) 16*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 816kB
[ 633.776513] Node 0 DMA32: 0*4kB 0*8kB 1*16kB (U) 2*32kB (UM) 1*64kB (M) 1*128kB (U) 1*256kB (U) 1*512kB (U) 2*1024kB (UM) 0*2048kB 0*4096kB = 3088kB
[ 633.776525] Node 0 Normal: 1608*4kB (UE) 2125*8kB (UE) 2767*16kB (UE) 2436*32kB (UE) 6*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 146040kB
[ 633.776537] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 633.776538] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 633.776540] 883 total pagecache pages
[ 633.776541] 341 pages in swap cache
[ 633.776542] Free swap = 0kB
[ 633.776542] Total swap = 999420kB
[ 633.776543] 16186823 pages RAM
[ 633.776544] 0 pages HighMem/MovableOnly
[ 633.776545] 305892 pages reserved
[ 633.776545] 0 pages hwpoisoned
[ 633.776546] Tasks state (memory values in pages):
[ 633.776547] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 633.776555] [ 434] 0 434 24307 224 224 0 0 221184 0 -250 systemd-journal
[ 633.776559] [ 455] 0 455 6760 443 384 59 0 77824 0 -1000 systemd-udevd
[ 633.776562] [ 647] 997 647 22520 286 224 62 0 81920 0 0 systemd-timesyn
[ 633.776564] [ 666] 0 666 1468 226 160 66 0 53248 0 0 dhclient
[ 633.776566] [ 685] 0 685 1652 51 0 51 0 57344 0 0 cron
[ 633.776568] [ 686] 100 686 2309 64 64 0 0 57344 0 -900 dbus-daemon
[ 633.776571] [ 688] 0 688 6202 2453 2432 21 0 81920 0 0 gpuenv-server
[ 633.776573] [ 690] 0 690 38186 61 32 29 0 69632 0 -1000 lxcfs
[ 633.776575] [ 692] 0 692 55447 286 224 62 0 94208 0 0 rsyslogd
[ 633.776577] [ 693] 0 693 12451 250 224 26 0 106496 0 0 systemd-logind
[ 633.776579] [ 698] 104 698 644 38 0 38 0 40960 0 0 debci-publisher
[ 633.776581] [ 715] 0 715 1458 79 32 47 0 49152 0 0 lxc-monitord
[ 633.776584] [ 721] 0 721 1468 82 32 50 0 53248 0 0 agetty
[ 633.776586] [ 724] 0 724 541388 3540 3540 0 0 442368 288 -999 containerd
[ 633.776588] [ 771] 0 771 3858 379 288 91 0 69632 0 -1000 sshd
[ 633.776590] [ 777] 104 777 4786 320 320 0 0 77824 0 100 systemd
[ 633.776592] [ 815] 104 815 669 37 0 37 0 45056 0 0 inotifywait
[ 633.776595] [ 820] 104 820 42286 861 791 70 0 98304 0 100 (sd-pam)
[ 633.776597] [ 897] 0 897 568107 2028 2028 0 0 557056 5728 0 dockerd
[ 633.776600] [ 962] 103 962 3544 138 104 34 0 65536 0 0 dnsmasq
[ 633.776602] [ 1188] 0 1188 4505 396 384 12 0 69632 0 0 sshd
[ 633.776604] [ 1191] 1000 1191 4764 487 448 39 0 77824 0 100 systemd
[ 633.776606] [ 1192] 1000 1192 42286 916 829 87 0 98304 0 100 (sd-pam)
[ 633.776609] [ 1211] 1000 1211 4570 383 328 55 0 73728 128 0 sshd
[ 633.776611] [ 1212] 1000 1212 2058 251 192 59 0 57344 192 0 bash
[ 633.776613] [ 1299] 1000 1299 2577 52 32 20 0 61440 64 0 sudo
[ 633.776616] [ 1300] 1000 1300 2577 94 37 57 0 57344 64 0 sudo
[ 633.776618] [ 1301] 0 1301 2325 46 0 46 0 61440 96 0 su
[ 633.776620] [ 1302] 104 1302 644 109 32 77 0 49152 0 0 sh
[ 633.776622] [ 1305] 104 1305 9771 501 432 69 0 118784 4064 0 autopkgtest
[ 633.776625] [ 1306] 104 1306 1367 61 0 61 0 49152 0 0 tee
[ 633.776627] [ 1307] 104 1307 1367 71 0 71 0 53248 0 0 tee
[ 633.776629] [ 1308] 104 1308 1637 64 32 32 0 57344 0 0 mawk
[ 633.776631] [ 1309] 104 1309 5649 436 311 125 0 81920 1824 0 autopkgtest-vir
[ 633.776633] [ 1339] 104 1339 268 59 0 59 0 40960 0 0 catatonit
[ 633.776635] [ 1361] 104 1361 2247 113 96 17 0 53248 0 200 dbus-daemon
[ 633.776637] [ 1364] 104 1364 2140 1366 1315 51 0 57344 0 0 fuse-overlayfs
[ 633.776640] [ 1367] 104 1367 2212 72 67 5 0 57344 0 0 conmon
[ 633.776642] [ 1370] 104 1370 645 42 0 42 0 49152 0 0 sleep
[ 633.776644] [ 1372] 104 1372 1480 106 56 50 0 53248 0 0 slirp4netns
[ 633.776646] [ 3979] 104 3979 446441 304 304 0 0 368640 1984 0 podman
[ 633.776648] [ 4002] 104 4002 502124 3983 3978 5 0 401408 0 0 podman
[ 633.776651] [ 4022] 104 4022 2212 127 68 59 0 53248 0 0 conmon
[ 633.776653] [ 4025] 104 4025 1015 98 64 34 0 57344 0 0 bash
[ 633.776655] [ 4028] 104 4028 1069 50 32 18 0 45056 32 0 su
[ 633.776657] [ 4029] 166536 4029 669 52 32 20 0 45056 0 0 wrapper.sh
[ 633.776659] [ 4041] 166536 4041 754 36 0 36 0 40960 0 0 tee
[ 633.776661] [ 4043] 166536 4043 754 34 0 34 0 45056 0 0 tee
[ 633.776663] [ 4045] 166536 4045 669 37 0 37 0 49152 0 0 sh
[ 633.776666] [ 4053] 166536 4053 20134547 11468505 11468495 10 0 94818304 234720 0 rocfft-test
[ 633.776668] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-104.slice/user@104.service/user.slice/libpod-577c53714068510df069335c5a4e99b966e187f291e087208c17df3ad5fdb52d.scope/container,task=rocfft-test,pid=4053,uid=166536
[ 633.776686] Out of memory: Killed process 4053 (rocfft-test) total-vm:80538188kB, anon-rss:45873980kB, file-rss:40kB, shmem-rss:0kB, UID:166536 pgtables:92596kB oom_score_adj:0
[ 633.779547] rocfft-test: page allocation failure: order:0, mode:0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=user.slice,mems_allowed=0
[ 633.779554] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64 #1 Debian 6.10.6-1~bpo12+1
[ 633.779556] Hardware name: Micro Computer (HK) Tech Limited UM773 Lite/F7BFD, BIOS 1.06 02/27/2023
[ 633.779557] Call Trace:
[ 633.779559] <TASK>
[ 633.779560] dump_stack_lvl+0x64/0x80
[ 633.779563] warn_alloc+0x164/0x1e0
[ 633.779567] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779570] __alloc_pages_slowpath.constprop.0+0xc7b/0xd60
[ 633.779575] __alloc_pages_noprof+0x309/0x340
[ 633.779578] alloc_pages_mpol_noprof+0xd9/0x1e0
[ 633.779582] vma_alloc_folio_noprof+0x65/0xd0
[ 633.779584] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779586] do_anonymous_page+0x2b0/0x7b0
[ 633.779588] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779590] ? __pte_offset_map+0x1b/0x180
[ 633.779593] __handle_mm_fault+0xc3e/0x1070
[ 633.779596] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779600] handle_mm_fault+0x190/0x320
[ 633.779602] hmm_vma_fault.isra.0+0x4d/0x90
[ 633.779605] walk_pgd_range+0x34d/0xa90
[ 633.779609] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779612] __walk_page_range+0x198/0x1b0
[ 633.779615] walk_page_range+0x13d/0x200
[ 633.779618] hmm_range_fault+0x5f/0xa0
[ 633.779621] amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[ 633.779791] amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[ 633.779912] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[ 633.780064] kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[ 633.780211] kfd_ioctl+0x3af/0x4c0 [amdgpu]
[ 633.780352] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[ 633.780495] __x64_sys_ioctl+0x97/0xd0
[ 633.780498] do_syscall_64+0x82/0x190
[ 633.780503] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780505] ? vm_mmap_pgoff+0x131/0x1c0
[ 633.780508] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780509] ? syscall_exit_to_user_mode+0x77/0x210
[ 633.780512] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780513] ? do_syscall_64+0x8e/0x190
[ 633.780515] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780517] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 633.780519] RIP: 0033:0x7fe8bf3164bb
[ 633.780522] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[ 633.780523] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 633.780525] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[ 633.780526] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[ 633.780527] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[ 633.780528] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[ 633.780529] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[ 633.780533] </TASK>
[ 633.780534] Mem-Info:
[ 633.780536] active_anon:10825835 inactive_anon:665439 isolated_anon:0 active_file:10 inactive_file:309 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:7541 slab_unreclaimable:23829 mapped:47 shmem:294 pagetables:24401 sec_pagetables:1054 bounce:0 kernel_misc_reclaimable:0 free:34054 free_pcp:2971 free_cma:0
[ 633.780539] Node 0 active_anon:43303340kB inactive_anon:2661756kB active_file:40kB inactive_file:1236kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:188kB dirty:0kB writeback:0kB shmem:1176kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable? no
[ 633.780543] Node 0 DMA free:756kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB active_anon:96kB inactive_anon:10352kB active_file:28kB inactive_file:16kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:16kB local_pcp:8kB free_cma:0kB
[ 633.780547] lowmem_reserve[]: 0 2725 61975 0 0
[ 633.780552] Node 0 DMA32 free:1424kB boost:0kB min:2968kB low:5756kB high:8544kB reserved_highatomic:0KB active_anon:0kB inactive_anon:238268kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:752kB local_pcp:752kB free_cma:0kB
[ 633.780556] lowmem_reserve[]: 0 0 59249 0 0
[ 633.780560] Node 0 Normal free:134036kB boost:0kB min:64592kB low:125260kB high:185928kB reserved_highatomic:0KB active_anon:44073220kB inactive_anon:1643160kB active_file:0kB inactive_file:880kB unevictable:0kB writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB free_pcp:11364kB local_pcp:7572kB free_cma:0kB
[ 633.780563] lowmem_reserve[]: 0 0 0 0 0
[ 633.780567] Node 0 DMA: 1*4kB (U) 0*8kB 14*16kB (U) 16*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 740kB
[ 633.780578] Node 0 DMA32: 1*4kB (M) 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB (M) 1*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 1476kB
[ 633.780589] Node 0 Normal: 1*4kB (U) 1247*8kB (U) 2773*16kB (UE) 2442*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132492kB
[ 633.780601] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 633.780602] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 633.780603] 891 total pagecache pages
[ 633.780604] 342 pages in swap cache
[ 633.780605] Free swap = 0kB
[ 633.780606] Total swap = 999420kB
[ 633.780606] 16186823 pages RAM
[ 633.780607] 0 pages HighMem/MovableOnly
[ 633.780608] 305892 pages reserved
[ 633.780609] 0 pages hwpoisoned
[ 633.780633] amdgpu: init_user_pages: Failed to get user pages: -14
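
To put the numbers in the "Out of memory: Killed process" line next to the machine's total RAM, a small sketch like the one below can convert them to GiB. The helper name `parse_kb_fields` is made up for this illustration, the regular expression only targets the `name:<number>kB` fields of that one line, and the 4 KiB page size is an assumption (the usual value on amd64) used to convert the "pages RAM" count.

```python
import re

KIB_PER_GIB = 1024 * 1024
PAGE_KIB = 4  # assumed page size in KiB (typical for amd64)

# Summary line copied from the dmesg log above.
OOM_LINE = (
    "Out of memory: Killed process 4053 (rocfft-test) "
    "total-vm:80538188kB, anon-rss:45873980kB, file-rss:40kB, "
    "shmem-rss:0kB, UID:166536 pgtables:92596kB oom_score_adj:0"
)


def parse_kb_fields(line):
    """Return {field name: value in kB} for every 'name:<digits>kB' token."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


fields = parse_kb_fields(OOM_LINE)
for name, kb in fields.items():
    print(f"{name}: {kb / KIB_PER_GIB:.2f} GiB")

# "16186823 pages RAM" from the same log, converted with the assumed page size.
ram_gib = 16186823 * PAGE_KIB / KIB_PER_GIB
print(f"RAM: {ram_gib:.2f} GiB")
print(f"anon-rss share of RAM: {fields['anon-rss'] / KIB_PER_GIB / ram_gib:.0%}")
```

Fields without a kB suffix, such as UID and oom_score_adj, are deliberately ignored by the pattern.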