Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
Hi PICCA,
On 2019-05-24 12:01, PICCA Frederic-Emmanuel wrote:
> What about ibm power9 with pocl ?
>
> it seems that this is better than the latest NVIDIA GPU.
The typical workload when training neural networks is dense linear
algebra, such as general matrix-matrix multiplication (GEMM) and
convolution. I know nothing about pocl, but it is hard for a CPU to
beat a GPU at these highly parallelizable operations. Try a 4096x4096
matrix multiplication and you will easily see the difference.
E.g. my CPU is an i5-7440HQ (a mid-range mobile chip) and my GPU an
Nvidia 940MX (junk). The junk GPU (CUDA) is still over 100x faster
than the CPU (MKL):
~ ❯❯❯ optirun ipython3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch as th
In [2]: x = th.rand(4096, 4096)
In [3]: %time x@x
CPU times: user 1.65 s, sys: 38.7 ms, total: 1.69 s
Wall time: 449 ms
Out[3]:
tensor([[1015.7596, 1004.2767, 1001.6245, ..., 1026.8447, 996.3105,
1002.7847],
[1047.8833, 1014.3856, 1020.8246, ..., 1055.3224, 1021.6126,
1031.0334],
[1049.3168, 1027.7637, 1030.9961, ..., 1054.3218, 1015.3804,
1031.6709],
...,
[1039.6516, 1024.6678, 1021.1326, ..., 1047.0674, 1015.1402,
1029.5969],
[1020.1988, 994.0073, 1005.5823, ..., 1015.6786, 990.2491,
1008.1358],
[1022.9388, 991.9886, 990.4608, ..., 1013.9000, 998.8676,
1007.8554]])
In [4]: x = x.cuda()
In [5]: %time x@x
CPU times: user 1.1 ms, sys: 174 µs, total: 1.27 ms
Wall time: 2.67 ms
Out[5]:
tensor([[1015.7591, 1004.2764, 1001.6254, ..., 1026.8447, 996.3105,
1002.7841],
[1047.8838, 1014.3846, 1020.8243, ..., 1055.3209, 1021.6123,
1031.0328],
[1049.3174, 1027.7644, 1030.9971, ..., 1054.3210, 1015.3800,
1031.6727],
...,
[1039.6511, 1024.6686, 1021.1323, ..., 1047.0674, 1015.1404,
1029.5974],
[1020.1982, 994.0067, 1005.5826, ..., 1015.6784, 990.2482,
1008.1347],
[1022.9395, 991.9879, 990.4588, ..., 1013.9014, 998.8687,
1007.8544]], device='cuda:0')
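One caveat for anyone reproducing this: %time on a CUDA tensor can
under-report, because CUDA kernel launches are asynchronous and the
cell may return before the multiplication has finished. A fairer
micro-benchmark synchronizes the device before reading the clock.
Here is a minimal sketch (assuming PyTorch is installed;
`time_matmul` is just a helper name I made up):

```python
import time
import torch

def time_matmul(device, n=4096, warmup=2, reps=5):
    """Median wall time (seconds) of an n x n matmul on `device`."""
    x = torch.rand(n, n, device=device)
    for _ in range(warmup):
        x @ x                         # warm-up: library/handle init
    times = []
    for _ in range(reps):
        if device == "cuda":
            torch.cuda.synchronize()  # drain pending async kernels
        t0 = time.perf_counter()
        x @ x
        if device == "cuda":
            torch.cuda.synchronize()  # wait for this kernel to finish
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

print(f"cpu : {time_matmul('cpu') * 1e3:.1f} ms")
if torch.cuda.is_available():
    print(f"cuda: {time_matmul('cuda') * 1e3:.1f} ms")
```

Even with proper synchronization the GPU wins by a wide margin on
this workload, so the conclusion above stands.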