Re: [debian-hppa] poor performance of Debian 11 HPPA with qemu-system-hppa
Hi Nelson,
On 8/14/21 4:35 PM, Nelson H. F. Beebe wrote:
In a previous message to the debian-hppa list today, I described how I
finally got a virtual machine successfully created for running Debian
11 on HPPA (aka PA-RISC).
On the same host
Dell Precision 7920 (1 16-core CPU, 32 hyperthreads,
2200MHz Intel Xeon Platinum 8253,
384GB DDR-4 RAM);
Ubuntu 20.04.02 LTS (Focal Fossa)
I have VMs running with QEMU emulation for Alpha, ARM64, M68K, MIPS32,
MIPS64, RISC-V64, S390x, and SPARC64, and most of them have quite
reasonable interactive performance, making it possible to use the
emacs editor in terminal windows and X11 windows without any serious
response problems.
However, for the new Debian 11 HPPA VM, interactive performance is a
huge issue: shell typein sometimes gets immediate character echo, but
frequently gets delays of 10 to 30 seconds for each input character.
That makes it extremely hard for a fast typist to type commands and
text: one is never sure whether input keys have been dropped.
I haven't seen this yet.
I develop mathematical software, and a large package that I'm writing
for multiple precision arithmetic provides a testbed for evaluating VM
performance. Most of the QEMU CPU types support multiple processors,
but M68K and SPARC64 sun4u only permit one CPU. For HPPA, I have 4 CPUs
and 3GB DRAM; the latter is a hard limit imposed in QEMU source code.
Yes, 3 GB (actually 3.5 GB) is the maximum for 32-bit hppa systems.
If you run with 4 emulated CPUs, make sure to add:
-accel tcg,thread=multi
when starting qemu.
Here is a table of running the equivalent of
date; make all check ; date
on these systems, using QEMU-6.0.0, unless noted. Both compilations
and test programs are run in parallel, by internal "make -j" commands.
make timing (wall clock)
System            Start       End         Elapsed
Debian 11 Alpha   07:43:16 -- 08:23:05     39m 49s
Debian 11 ARM64   07:58:02 -- 08:24:45     26m 43s
Debian 11 M68K    07:43:15 -- 08:30:56     47m 41s
Debian 11 HPPA    13:23:16 -- 21:40:19    497m 03s
Debian 11 HPPA    07:29:18 -- 18:07:19    638m 01s  [qemu-6.1.0-rc3]
NetBSD 9.2 HPPA   11:22:10 -- 01:25:46    843m 36s
It would be interesting to see the performance on hppa on real hardware.
If needed I can give you access to a physical machine to test. Just let me know.
For comparison, here are results on native hardware with local disk
(not NFS, unless indicated) [clock speed in GHz is abbreviated to G]:
System                               Start       End         Elapsed
ArchLinux ARM32                      09:57:34 -- 10:07:43    10m 09s
Debian 11 UltraSparc T2              08:30:54 -- 08:41:18    10m 24s
Solaris 10 UltraSparc T2             09:46:31 -- 09:59:32    13m 01s
Ubuntu 20.04 Xeon 8253               09:34:52 -- 09:35:36     0m 44s
CentOS 7.9 Xeon E6-1600v3            09:39:00 -- 09:39:42     0m 42s
CentOS 7.9 Xeon E6-1600v3            10:42:43 -- 10:43:30     0m 47s  [NFS]
CentOS 7.9 EPYC 7502 2.0G 64C/128T   10:02:01 -- 10:02:27     0m 26s
CentOS 7.9 EPYC 7502 2.5G 32C/64T    10:02:00 -- 10:02:25     0m 25s
The tests produce about 62,000 total lines of text output, spread over
about 180 files. They read no input data, and are primarily compute
bound in loops with integer, not floating-point, arithmetic, using
32-bit and 64-bit integer types.
I have generated machine language for representative code from the
hotspot loop using the -S option of gcc and clang, and found that
64-bit arithmetic is expanded inline with 32-bit instructions on
ARM32, HPPA, and M68K, none of which have 64-bit arithmetic
instructions. The loop instruction counts are comparable across all
of those systems, typically 10 to 20 instructions, compared to 5 or so
on those CPUs that have 64-bit arithmetic.
The dramatic slowdowns on HPPA and SPARC64 are a big surprise, but the
HPPA slowdown matches the poor interactive response. The SPARC64 VM
is much more responsive interactively, and it DOES have 64-bit integer
arithmetic.
I have not yet done profiling builds of qemu-system-hppa and
qemu-system-sparc64, but that remains an option for further
investigation to find out what is responsible for the slowness.
It would be good if you could find some time for further analysis.
I can also do profiling builds of parts of my test suite to see
whether there are unexpected hotspots on HPPA and SPARC64 that are
absent on other CPU types.
I have physical SPARC64 hardware running Debian 11 and Solaris 10 on
identical boxes, and have done builds of TeX Live on them with no
difficulty. However, the slow speed of QEMU HPPA makes it impractical
to try TeX Live builds for Debian 11 HPPA, which is disappointing.
Does any list member have any idea of why QEMU emulation of HPPA and
SPARC64 is so bad? Are there Debian kernel parameters that might be
tweaked? Have any of you used Debian on QEMU HPPA and seen similar
slowness compared to other CPU types?
Again, I'd like to compare qemu-emulated hppa to physical hppa performance
to rule out any qemu slowness.
Notice from my first table above that NetBSD 9.2 on HPPA is also very
slow, which tends to point the finger at QEMU as the source of the
dismal performance, rather than the VM guest O/S.
For the record, here is how QEMU releases downloaded from
https://www.qemu.org/
https://download.qemu.org/
are built here, taking the most recent QEMU release for the sample:
tar xf $prefix/src/qemu/qemu-6.1.0-rc3.tar.xz
cd qemu-6.1.0-rc3
unsetenv CONFIG_SITE
mkdir build
cd build
env CC=cc CFLAGS=-O2 ../configure --prefix=$prefix && make all -j && make check
You should make sure to disable debugging when building.
These are my configure options:
--target-list=hppa-softmmu --enable-numa --disable-mpath --disable-spice --disable-opengl --disable-sanitizers --disable-docs --disable-debug-mutex --disable-debug-tcg --disable-containers --disable-pie --disable-qom-cast-debug
QEMU builds require prior installation of the ninja-build package
available on major GNU/Linux distributions. On completion, the needed
qemu-system-xxx executables are present in the build subdirectory.
On Ubuntu 20.04, the QEMU builds are clean, and pass the entire
validation suite without any failures.
Helge