
Re: OpenCL with Radeon GPU



Hi all,

A bit of research turned up this (old) page, which indicates that some environment variables can be used to control memory allocation in AMD's OpenCL implementation:

https://community.amd.com/t5/drivers-software/solved-clinfo-reports-error-33-of-quot-global-free-memory-amd/td-p/172760

So I decided to give that a try.

First step: Ensure I can run an OpenCL program from the shell. I used the PrimeGrid binary for that:

# /var/lib/boinc-client/projects/www.primegrid.com/genefer22g_linux64_22.12.02 -h
geneferg version 22.12.2 (linux x64, gcc-7.5.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer is free source code, under the MIT license.

Command line: '-h'

Running on device 'gfx1031', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 2.0 ', driver '3513.0 (HSA1.1,LC)'.
330000000^{2^15} + 1: 00:00:35, 0.0386 ms/bit, data size: 1.12 MB.
200000000^{2^16} + 1: 00:01:28, 0.0489 ms/bit, data size: 2.25 MB.
120000000^{2^17} + 1: 00:04:35, 0.0784 ms/bit, data size: 4.5 MB.
18000000^{2^18} + 1: 00:14:01, 0.133 ms/bit, data size: 9 MB.
5500000^{2^19} + 1: 00:49:15, 0.252 ms/bit, data size: 18 MB.
2000000^{2^20} + 1: 01:58:46, 0.325 ms/bit, data size: 24 MB.
910000^{2^21} + 1: 07:04:18, 0.613 ms/bit, data size: 48 MB.
270000^{2^22} + 1: 25:34:55, 1.22 ms/bit, data size: 96 MB.
1000000^{2^22} + 1: 30:51:50, 1.33 ms/bit, data size: 96 MB.
500000^{2^23} + 1: 116:24:31, 2.64 ms/bit, data size: 192 MB.


So that worked.

It also did not cause an error, but of course, test data != real workload, right?

Next step: Ensure I can reproduce the error:

# /var/lib/boinc-client/projects/www.primegrid.com/genefer22g_linux64_22.12.02 -p -n 22 -b 1053460 -f gproof
geneferg version 22.12.2 (linux x64, gcc-7.5.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer is free source code, under the MIT license.

Command line: '-p -n 22 -b 1053460 -f gproof'

Running on device 'gfx1031', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 2.0 ', driver '3513.0 (HSA1.1,LC)', data size: 96 MB.
0.0202% done, 28:15:37 remaining, 1.21 ms/bit.


Interesting, this seems to work without problem. Right now, I'm at

7.29% done, 26:21:57 remaining, 1.22 ms/bit.


which is much further than I have seen it get before.

My conclusion for now: The boinc service must have some limits set.

systemctl show gives me, among others:

LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=524288
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=253399
LimitNPROCSoft=253399
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=253399
LimitSIGPENDINGSoft=253399
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
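
To compare what the interactive shell gets with what the service actually runs under, the kernel's view in /proc/<pid>/limits can be read directly. This is only a minimal sketch, assuming a Linux /proc filesystem; the helper name show_limits is made up for illustration, and the unit name boinc-client in the commented-out line is an assumption.

```shell
#!/bin/sh
# show_limits is a hypothetical helper, not part of any package.
# It prints selected soft/hard limits of a process as the kernel sees them.
show_limits() {
    pid="${1:-$$}"   # default: the current shell
    grep -E '^(Limit|Max (locked memory|address space|stack size|open files)) ' \
        "/proc/${pid}/limits"
}

# Limits of the interactive shell (the environment where the run worked):
show_limits

# Limits of the service's main process; the unit name is an assumption:
# show_limits "$(systemctl show -p MainPID --value boinc-client)"
```

Running it once from the shell and once against the service's PID should show whether the two environments really differ in any limit.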


My first candidate would be LimitMEMLOCK, as I suspect that shared, locked memory is a likely mechanism for interaction between GPU and CPU. (You notice I know nearly nothing of OpenCL...)


I do know how 'systemctl edit' works, though, and set the limit to 1 GB:

LimitMEMLOCK=1073741824
LimitMEMLOCKSoft=1073741824

That did not help: same error after a similar time.
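
One detail worth double-checking with such an override: as far as I know, systemd.exec only documents LimitMEMLOCK= as a unit setting (a single value, or a soft:hard pair), while the ...Soft names are properties printed by systemctl show. A LimitMEMLOCKSoft= line in a drop-in would then presumably be ignored with a warning in the journal. The following sketch writes a drop-in with the single directive; the function name, the file name override.conf and the unit name in the comment are assumptions for illustration.

```shell
#!/bin/sh
# write_memlock_dropin is a hypothetical helper.
# It writes a drop-in that sets the soft and hard locked-memory limit.
write_memlock_dropin() {
    dir="$1"     # e.g. /etc/systemd/system/boinc-client.service.d (assumed unit name)
    bytes="$2"   # limit in bytes, e.g. 1073741824 for 1 GB
    mkdir -p "$dir"
    printf '[Service]\nLimitMEMLOCK=%s\n' "$bytes" > "$dir/override.conf"
}

# Harmless demonstration in a scratch directory:
demo_dir="$(mktemp -d)"
write_memlock_dropin "$demo_dir" 1073741824
cat "$demo_dir/override.conf"
```

A changed limit only applies to processes started afterwards, so the service has to be restarted before the new value shows up in its /proc/<pid>/limits.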

Still, seeing that I can run the binary in question from the shell, I'm kind of confident this should be solvable via proper unit configuration.


Which leaves me with one question for this mail thread: Can anybody recommend a test program for OpenCL functionality?


Thanks,

Arno

--
Arno Lehmann

IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück

