Re: More progress to report [Re: Debian Bullseye on Raspberry Pi 4 4GB?]
On Wed, Mar 3, 2021 at 9:44 AM LinAdmin <linadmin@quickline.ch> wrote:
>
> The common belief that on the same hardware 64-bit must be better than or equal to 32-bit is clearly wrong for the "crazy" BCM2711 chip used in the Pi 4.
> The detailed benchmarks for Raspbian Buster are at 32 Bit Kernel 4.19 and 64 Bit Kernel 5.4, showing 50% less AES throughput (16 KB blocks) for 64-bit.
This is a user space microbenchmark, it has nothing to do with what the
kernel does underneath it.
Looking at the output, I see it's not even running the same version of
the program:
Test on 32-bit kernel:
OpenSSL 1.1.1c, built on 28 May 2019
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
aes-128-cbc    62184.51k   76615.98k   83103.15k   84435.97k   85237.76k   85169.49k
aes-128-cbc    62511.68k   76704.43k   83097.09k   84763.99k   85150.38k   85229.57k
aes-192-cbc    50203.94k   64933.31k   71396.52k   73090.39k   73602.39k   73706.15k
aes-192-cbc    56285.24k   67498.65k   71976.02k   73356.29k   73525.93k   73258.33k
aes-256-cbc    51010.29k   60062.42k   63579.31k   64656.73k   64927.06k   64831.49k
aes-256-cbc    50869.32k   60057.64k   63678.55k   64560.47k   64935.25k   64891.56k
Test on 64-bit kernel:
OpenSSL 1.1.1d, built on 10 Sep 2019
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
aes-128-cbc    38070.54k   40669.85k   41716.22k   42029.40k   42131.46k   42177.88k
aes-128-cbc    38065.38k   40746.26k   41775.96k   42064.21k   42229.76k   42292.57k
aes-192-cbc    32294.31k   34105.22k   35048.28k   35303.42k   35351.21k   35351.21k
aes-192-cbc    32254.74k   34136.98k   35043.33k   35301.38k   35367.59k   35367.59k
aes-256-cbc    27986.06k   29351.96k   29962.33k   30127.79k   30173.87k   30179.33k
aes-256-cbc    27986.74k   29372.25k   29969.24k   30119.25k   30160.21k   30157.48k
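To put a number on the gap in the two listings above, here is a minimal Python sketch that takes the 16384-byte column (values copied from the listings, in OpenSSL's "k" unit of 1000s of bytes per second), averages the two runs per cipher, and computes the 64-bit/32-bit ratio. The helper names are hypothetical; the sketch only restates the quoted data and cannot tell which kernel or user space produced each run.

```python
# Hypothetical helper: compare the 16384-byte column of the two
# "openssl speed" listings quoted above. Values are in OpenSSL's "k" unit
# (1000s of bytes per second); each cipher was run twice in both listings.
RUNS_32BIT = {
    "aes-128-cbc": [85169.49, 85229.57],
    "aes-192-cbc": [73706.15, 73258.33],
    "aes-256-cbc": [64831.49, 64891.56],
}
RUNS_64BIT = {
    "aes-128-cbc": [42177.88, 42292.57],
    "aes-192-cbc": [35351.21, 35367.59],
    "aes-256-cbc": [30179.33, 30157.48],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def throughput_ratios(numerator, denominator):
    """Return {cipher: mean(numerator) / mean(denominator)} for shared ciphers."""
    return {
        cipher: mean(numerator[cipher]) / mean(denominator[cipher])
        for cipher in numerator
        if cipher in denominator
    }


if __name__ == "__main__":
    for cipher, ratio in throughput_ratios(RUNS_64BIT, RUNS_32BIT).items():
        print(f"{cipher}: 64-bit listing at {ratio:.0%} of the 32-bit listing")
```

All three ratios land between about 0.46 and 0.50, consistent with the "50% less throughput" figure, but since the two listings also come from different OpenSSL builds, the ratio alone does not isolate the kernel.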
> On my system I get similar results, e.g. for AES-128 (16 KB):
> Salsa Buster arm64 5.9.0 42'000
> Ubuntu LTS armv7l 5.4 92'000
Do you mean you are running the openssl benchmarks from two
different distros here? Could it be that you are running a 64-bit openssl
binary on the Buster arm64 kernel?
If you want to compare the kernel performance, you have to ensure that
you are running the exact same user space on both. For the openssl
test, it should be sufficient to boot the Buster installation and enter
a chroot.
As you can see in the two listings you sent, the 32-bit version reports
the 'neon' feature, while the 64-bit version reports 'asimd', which is
what 64-bit user space expects, so either those tests are running
64-bit user space, or the 32-bit user space is running on the wrong
'personality' of the kernel.
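A minimal Python sketch of that comparison: it pulls the "Features" lines out of /proc/cpuinfo text and reports whether they use the 32-bit name 'neon' or the 64-bit name 'asimd' (both name the same Advanced SIMD unit). The function name is hypothetical, and this is only a way to compare listings by machine rather than by eye; it is not how OpenSSL itself detects CPU features.

```python
def simd_flags(cpuinfo_text):
    """Return the Advanced SIMD flag names found on "Features" lines of /proc/cpuinfo text.

    32-bit ARM user space expects the name "neon"; arm64 uses "asimd" for the
    same unit. All other flags are ignored.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            found.update(value.split())
    return found & {"neon", "asimd"}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(simd_flags(f.read())))
    except OSError as exc:
        print(f"could not read /proc/cpuinfo: {exc}")
```

Running it once from the host and once inside the 32-bit chroot shows which naming each environment is being handed.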
It's possible that the feature detection in openssl fails when you run
in the wrong personality, as the /proc/cpuinfo output will contain
incompatible information. When you use 'sudo linux32 chroot /mnt/ubuntu-armv7'
to enter the chroot, that chroot should be in the correct personality.
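To confirm which personality a process actually ended up in, a minimal sketch assuming a Linux system with a C library that exports the personality(2) wrapper: calling personality(0xffffffff) returns the current value without changing it, and the low byte (PER_MASK) holds the execution domain, where PER_LINUX32 is 0x0008. The Python helper names are hypothetical.

```python
import ctypes
import ctypes.util
import os

PER_MASK = 0x00FF               # low byte selects the execution domain
PER_LINUX = 0x0000
PER_LINUX32 = 0x0008
QUERY_PERSONALITY = 0xFFFFFFFF  # asks the kernel for the current value, changes nothing


def execution_domain(persona):
    """Name the execution domain encoded in a personality value; flag bits are ignored."""
    domain = persona & PER_MASK
    return {PER_LINUX: "PER_LINUX", PER_LINUX32: "PER_LINUX32"}.get(domain, hex(domain))


def current_personality():
    """Query the calling process's personality through the C library's personality()."""
    # find_library may return None; CDLL(None) then searches the already loaded libc.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    persona = libc.personality(QUERY_PERSONALITY)
    if persona == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return persona


if __name__ == "__main__":
    print(execution_domain(current_personality()))
```

Inside a chroot entered with 'sudo linux32 chroot /mnt/ubuntu-armv7' this should print PER_LINUX32, while a plain 'sudo chroot' would leave it at PER_LINUX.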
> When playing a Full HD video encoded in H.265, the average CPU load is 80% on 64-bit and
> less than 30% on 32-bit!
> The situation is similar when encoding to H.265 using ffmpeg.
This could be the same problem with incorrect feature detection from
running the wrong personality, or it could be related to missing kernel
drivers for H265 acceleration in the 64-bit kernel. Do you know if this
uses a software codec or an accelerated version in the GPU?
Arnd