
Re: hardware encryption



On Thu, Jun 3, 2021 at 10:50 AM Diederik de Haas <didi.debian@cknow.org> wrote:
>
> On Wednesday 20 January 2021 11:40:26 CEST brainfart@posteo.net wrote:
> > Hardware-accelerated encryption is a bit of a mystery to me.
> > Some processors advertise it, but how do we know if it's being used?
> > Is there a way to test whether hardware-accelerated encryption is being used,
> > or whether it's just advertising hype?
>
> I would very much like to understand this as well.
> I have several Rock64 devices, which are supposed to have ARMv8 Cryptography
> Extensions according to https://wiki.pine64.org/wiki/ROCK64#CPU_Architecture.
>
> Due to bug #976635, several CRYPTO modules were enabled in the 5.10 kernel.
> But I don't know whether that's relevant for ARMv8 CE.
>
> https://turecki.net/content/getting-most-out-ssh-hardware-acceleration-tuning-aes-ni
> contains a test to check the speed of some crypto operations.
> Based on that, I've put together a procedure, which I've now run on several devices:
>
> # adduser test
> $ ssh-add (make sure ssh agent is running)
> $ ssh-copy-id test@localhost
> $ ssh test@localhost (verify key based auth works)
> $ exit
> $ for i in $(ssh -Q cipher); do dd if=/dev/zero bs=1M count=100 2> /dev/null | \
> ssh -c "$i" test@localhost "(time -p cat) > /dev/null" 2>&1 | \
> awk -v c="$i" '/^real/ {print c ": " 100 / $2 " MB/s"}'; done
> $ grep -i -E "(flags|features)" /proc/cpuinfo | tail -n1
>
> On a Rock64 with kernel 5.8.0-1-arm64, I got these results:
> aes128-ctr: 45.8716 MB/s
> aes192-ctr: 45.6621 MB/s
> aes256-ctr: 44.6429 MB/s
> aes128-gcm@openssh.com: 49.505 MB/s
> aes256-gcm@openssh.com: 48.7805 MB/s
> chacha20-poly1305@openssh.com: 36.9004 MB/s
>
> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>
> But on kernel 5.10.0-7-arm64, with those CRYPTO modules, I got this:
> aes128-ctr: 42.735 MB/s
> aes192-ctr: 44.4444 MB/s
> aes256-ctr: 44.0529 MB/s
> aes128-gcm@openssh.com: 48.0769 MB/s
> aes256-gcm@openssh.com: 46.0829 MB/s
> chacha20-poly1305@openssh.com: 37.037 MB/s
>
> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>
> If you run the test several times, you'll get slightly different results
> each time, so I consider these results the same.
>
> For comparison, on a Ryzen 7 1800X (I don't remember which kernel version):
> aes128-ctr: 714.286 MB/s
> aes192-ctr: 714.286 MB/s
> aes256-ctr: 769.231 MB/s
> aes128-gcm@openssh.com: 1000 MB/s
> aes256-gcm@openssh.com: 1000 MB/s
> chacha20-poly1305@openssh.com: 294.118 MB/s
>
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp
> lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni
> pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
> f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
> 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext
> perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1
> avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1
> xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale
> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic
> v_vmsave_vmload vgif overflow_recov succor smca
>
> With kernel 5.10.0-7-amd64:
> aes128-ctr: 714.286 MB/s
> aes192-ctr: 769.231 MB/s
> aes256-ctr: 714.286 MB/s
> aes128-gcm@openssh.com: 909.091 MB/s
> aes256-gcm@openssh.com: 909.091 MB/s
> chacha20-poly1305@openssh.com: 500 MB/s
>
> It's very odd that aes192-ctr and aes256-ctr seem to have switched, but the values
> are otherwise EXACTLY the same :-O
> Very impressive speed improvement with chacha20-poly1305 though :D
> (Note that the aforementioned bug report was about arm64, not amd64)
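The identical values are probably an artifact of the measurement rather than of the kernel. `time -p` reports "real" in steps of 0.01 s, and at these speeds 100 MB takes only 0.1 to 0.2 s, so only a few widely spaced results are possible: 714.286 is 100/0.14, 769.231 is 100/0.13 and 909.091 is 100/0.11. A minimal sketch of the same awk arithmetic; `mb_per_s` is a hypothetical helper, not part of the original procedure:

```shell
#!/bin/sh
# Hypothetical helper: the same arithmetic as the awk in the loop above,
# i.e. 100 MB divided by the "real" seconds that `time -p` reports.
mb_per_s() {
  awk -v t="$1" 'BEGIN { print 100 / t }'
}

# `time -p` has 0.01 s resolution, so near 0.1 s only a few results exist.
for t in 0.10 0.11 0.12 0.13 0.14 0.15; do
  echo "real $t s -> $(mb_per_s "$t") MB/s"
done
```

If that reading is right, a transfer long enough to take several seconds (a larger `count=`, with the `100` in the awk expression changed to match) would make the 0.01 s step negligible on the faster machines.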
>
> On an RPi2, the values were around 12 MB/s.
>
>
> I don't find the Rock64 scores impressive, but that may be because
> I've read somewhere that ARMv8 Cryptography Extensions could/should
> result in a FACTOR 10 speed improvement for cryptography.
>
> There are several possibilities here:
> 1) The 'factor 10' is horseshit
> 2) The 'factor 10' is true, but it doesn't work on Rock64 (yet?)
> 3) The 'factor 10' is true and working, and without it the scores would be abysmal.
> 4) The test is all wrong
>
> If I run 'cat /proc/crypto', I get a long list, but I have no idea what the output means.
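/proc/crypto lists every implementation registered with the kernel crypto API, one block per implementation. Roughly: "name" is the algorithm (e.g. aes), "driver" is the specific implementation, and when several drivers provide the same name the kernel prefers the one with the highest "priority". A minimal sketch that prints the preferred driver per name. `preferred_drivers` is a hypothetical helper, it assumes each block lists name and driver before priority (as current kernels do), and it only reflects kernel users of crypto such as dm-crypt or IPsec, not userspace libraries like OpenSSL:

```shell
#!/bin/sh
# Hypothetical helper: for each algorithm "name" in /proc/crypto (or a file
# in the same format, or "-" for stdin), print the highest-priority driver.
# Assumes name and driver appear before priority within each block.
preferred_drivers() {
  awk -F' *: *' '
    $1 == "name"     { name = $2 }
    $1 == "driver"   { driver = $2 }
    $1 == "priority" {
      if (!(name in best) || $2 + 0 > prio[name]) {
        best[name] = driver
        prio[name] = $2 + 0
      }
    }
    END { for (n in best) printf "%s %s %d\n", n, best[n], prio[n] }
  ' "${1:-/proc/crypto}" | sort
}
```

Running `preferred_drivers | grep aes` shows which implementation wins for each AES mode; if a generic driver (such as aes-generic) still has the highest priority, the accelerated module is probably not loaded.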
>
>
> So essentially I have the same question as the OP.
> How can I/we know if it's present and working as intended?
> What kind of speed improvement can/should one expect?
> What is needed to take advantage of it? Kernel modules, and if so, which?
> The CRYPTO_XYZ_CE ones? Others? Something else entirely?
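On the "is it present" part: the Features (arm64) or flags (x86) line that the grep above prints is where the kernel reports what the CPU advertises. On arm64 the ARMv8 CE bits are aes, pmull, sha1 and sha2; on x86 the rough equivalents are aes (AES-NI), pclmulqdq (used for GCM) and sha_ni. A small sketch; `crypto_features` is a hypothetical helper and the feature list is an illustrative selection. It only shows that the instructions exist, not that any program uses them:

```shell
#!/bin/sh
# Hypothetical helper: list crypto-related CPU features from the first
# "flags" (x86) or "Features" (arm64) line of /proc/cpuinfo, a given file,
# or "-" for stdin. The feature list below is an illustrative selection.
crypto_features() {
  grep -i -m1 -E '^(flags|features)[[:space:]]*:' "${1:-/proc/cpuinfo}" |
    cut -d: -f2 | tr -s ' \t' '\n' |
    grep -x -E 'aes|pmull|sha1|sha2|pclmulqdq|sha_ni' | LC_ALL=C sort
}
```

Both the Rock64 and Ryzen lines quoted above pass this check, so the instructions are there on both machines; whether a given program uses them is a separate question.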


I _think_ OpenSSH uses OpenSSL (or LibreSSL, OpenBSD's fork of it), not the
kernel crypto API.

To benchmark OpenSSL, you use something like:

    # C implementation
    openssl speed aes-128-cbc

    # Hardware acceleration
    openssl speed -evp aes-128-cbc

You can see the difference in the numbers below, which are from a Core i7-8700.

$ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 57736814 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 14943316 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 3741357 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 944345 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 118246 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 59132 aes-128 cbc's in 3.00s
OpenSSL 1.1.1f  31 Mar 2020
built on: Wed Apr 28 00:37:28 2021 UTC
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     307929.67k   318790.74k   319262.46k   322336.43k   322890.41k   322939.56k

$ openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 186837731 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 78857865 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 20276035 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 5088201 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 636732 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 318374 aes-128-cbc's in 3.00s
OpenSSL 1.1.1f  31 Mar 2020
built on: Wed Apr 28 00:37:28 2021 UTC
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     999800.57k  1682301.12k  1730221.65k  1736772.61k  1738702.85k  1738746.54k
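To put a number on the difference, divide the -evp figures by the plain ones column by column. A minimal sketch using the values copied from the two runs above; `evp_speedups` is a hypothetical name:

```shell
#!/bin/sh
# Speedup of the -evp run over the plain run, per block size, using the
# figures from the two outputs above (1000s of bytes per second).
evp_speedups() {
  awk 'BEGIN {
    split("16 64 256 1024 8192 16384", size, " ")
    split("307929.67 318790.74 319262.46 322336.43 322890.41 322939.56", plain, " ")
    split("999800.57 1682301.12 1730221.65 1736772.61 1738702.85 1738746.54", evp, " ")
    for (i = 1; i <= 6; i++)
      printf "%6d bytes: %.2fx\n", size[i], evp[i] / plain[i]
  }'
}

evp_speedups
```

That works out to roughly 3x for 16-byte blocks and a bit over 5x from 64-byte blocks up, for AES-128-CBC on this i7-8700; other CPUs and modes will differ.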

I don't like OpenSSL's output. It should report cycles per byte (cpb), since
that metric is mostly independent of clock speed when comparing performance.
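Cycles per byte can be derived from OpenSSL's figures if you know the clock rate: cpb = clock (Hz) / throughput (bytes/s), where OpenSSL's "k" means 1000 bytes/s. The effective clock during the run isn't known, so this sketch assumes the i7-8700's 3.2 GHz base and 4.6 GHz maximum turbo as bounds; `cpb` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: cycles per byte from an OpenSSL "k" figure
# (1000s of bytes/s) and an assumed clock rate in GHz.
cpb() {
  awk -v k="$1" -v ghz="$2" 'BEGIN { printf "%.2f\n", ghz * 1e9 / (k * 1000) }'
}

# 16384-byte column from the runs above; 3.2 and 4.6 GHz are assumed bounds.
for ghz in 3.2 4.6; do
  echo "at $ghz GHz: plain $(cpb 322939.56 "$ghz") cpb, -evp $(cpb 1738746.54 "$ghz") cpb"
done
```

If the clock assumption holds, that is roughly 1.8 to 2.6 cpb with -evp versus roughly 10 to 14 cpb without it.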
Jeff

