
Re: More progress to report [Re: Debian Bullseye on Raspberry Pi 4 4GB?]



On Wed, Mar 3, 2021 at 9:44 AM LinAdmin <linadmin@quickline.ch> wrote:
>
> The common belief that on the same hardware 64-bit must be better than or equal to 32-bit is clearly wrong for the "crazy" BCM2711 chip used in the Pi 4.
> The detailed benchmarks for Raspbian Buster, on the 32-bit kernel 4.19 and the 64-bit kernel 5.4, show 50% less throughput on 64-bit for AES with 16KB blocks.

This is a user space microbenchmark; it has nothing to do with what the
kernel does underneath it.

Looking at the output, I see it's not even running the same version of
the program:

Test on 32-bit kernel:
OpenSSL 1.1.1c, built on 28 May 2019
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      62184.51k    76615.98k    83103.15k    84435.97k    85237.76k    85169.49k
aes-128-cbc      62511.68k    76704.43k    83097.09k    84763.99k    85150.38k    85229.57k
aes-192-cbc      50203.94k    64933.31k    71396.52k    73090.39k    73602.39k    73706.15k
aes-192-cbc      56285.24k    67498.65k    71976.02k    73356.29k    73525.93k    73258.33k
aes-256-cbc      51010.29k    60062.42k    63579.31k    64656.73k    64927.06k    64831.49k
aes-256-cbc      50869.32k    60057.64k    63678.55k    64560.47k    64935.25k    64891.56k

Test on 64-bit kernel:
OpenSSL 1.1.1d, built on 10 Sep 2019
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      38070.54k    40669.85k    41716.22k    42029.40k    42131.46k    42177.88k
aes-128-cbc      38065.38k    40746.26k    41775.96k    42064.21k    42229.76k    42292.57k
aes-192-cbc      32294.31k    34105.22k    35048.28k    35303.42k    35351.21k    35351.21k
aes-192-cbc      32254.74k    34136.98k    35043.33k    35301.38k    35367.59k    35367.59k
aes-256-cbc      27986.06k    29351.96k    29962.33k    30127.79k    30173.87k    30179.33k
aes-256-cbc      27986.74k    29372.25k    29969.24k    30119.25k    30160.21k    30157.48k
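
For reference, the size of the gap in the 16384-byte column can be worked
out directly from the two listings. A small sketch in Python, using the
first aes-128-cbc row of each listing; the variable names are only
illustrative, and the numbers still come from two different openssl
versions:

```python
# 16384-byte column, first aes-128-cbc row of each listing above.
# openssl speed reports these in 1000s of bytes per second.
kernel32_openssl_111c = 85169.49  # listing from the 32-bit kernel
kernel64_openssl_111d = 42177.88  # listing from the 64-bit kernel

# Fraction of the 32-bit listing's throughput seen in the 64-bit listing.
ratio = kernel64_openssl_111d / kernel32_openssl_111c
print(f"64-bit listing reaches {ratio:.1%} of the 32-bit listing")
```

That is just under half, which matches the "50% less" figure, but it says
nothing yet about where the difference comes from.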

> On my system I get similar results e.g. for AES-128 (16KB):
>     Salsa Buster arm64     5.9.0   42'000
>     Ubuntu LTS armv7l      5.4      92'000

Do you mean you are running the openssl benchmarks from two
different distros here? Could it be that you are running a 64-bit openssl
binary on the Buster arm64 kernel?
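
One way to check what kind of binary is actually being run is to look at
the ELF header of the openssl executable in each installation. A minimal
sketch in Python; the function names are made up for illustration, and
the path /usr/bin/openssl mentioned below is an assumption about where
the binary lives:

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_from_ident(ident):
    """Classify the first bytes of a file as a '32-bit' or '64-bit' ELF.

    Returns None if the bytes are not an ELF header or the class byte
    is not one of the two known values.
    """
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(ident[4])


def elf_class(path):
    """Read the start of the file at 'path' and classify it."""
    with open(path, "rb") as f:
        return elf_class_from_ident(f.read(5))
```

Calling elf_class("/usr/bin/openssl") in each of the two installations
should tell you whether you are comparing a 32-bit binary against a
64-bit one.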

If you want to compare the kernel performance, you have to ensure that
you are running the exact same user space on both. For the openssl
test, it should be sufficient to boot the Buster installation and enter
a chroot.

As you can see in the two listings you sent, the 32-bit version reports
the 'neon' feature, while the 64-bit version reports 'asimd', which is
what 64-bit user space expects. So either those tests are running
64-bit user space, or the 32-bit user space is running on the wrong
'personality' of the kernel.

It's possible that the feature detection in openssl fails when you run
in the wrong personality, as the /proc/cpuinfo output will contain
incompatible information. When you use 'sudo linux32 chroot /mnt/ubuntu-armv7'
to enter the chroot, that chroot should be in the correct personality.
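
To see which view of the CPU features a process gets, you can look at
the 'Features' lines of /proc/cpuinfo from inside the chroot. A rough
sketch in Python; the function names are only illustrative, and it only
checks the two flags mentioned above:

```python
def parse_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def simd_view(features):
    """Say whether the flags look like the 64-bit or the 32-bit view."""
    if "asimd" in features:
        return "64-bit view (asimd)"
    if "neon" in features:
        return "32-bit view (neon)"
    return "no SIMD flag found"


def read_features(path="/proc/cpuinfo"):
    """Parse the feature flags as seen by the current process."""
    with open(path) as f:
        return parse_features(f.read())
```

Running simd_view(read_features()) once from a plain chroot and once
from a chroot entered through linux32 should show whether the
personality changes what user space gets to see.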

> When playing a FullHD video coded H265, the average CPU load is 80% on 64-bit and
> less than 30% on 32-bit!
> Similar situations when encoding to H265 using ffmpeg.

This could be the same problem with incorrect feature detection from
running the wrong personality, or it could be related to missing kernel
drivers for H265 acceleration in the 64-bit kernel. Do you know if this
uses a software codec or an accelerated version in the GPU?

        Arnd

