
Re: [debian-hppa] poor performance of Debian 11 HPPA with qemu-system-hppa



Hi Nelson,

On 8/14/21 4:35 PM, Nelson H. F. Beebe wrote:
In a previous message to the debian-hppa list today, I described how I
finally got a virtual machine successfully created for running Debian
11 on HPPA (aka PA-RISC).

On the same host

	Dell Precision 7920 (1 16-core CPU, 32 hyperthreads,
	2200MHz Intel Xeon Platinum 8253,
	384GB DDR-4 RAM);
	Ubuntu 20.04.02 LTS (Focal Fossa)

I have VMs running with QEMU emulation for Alpha, ARM64, M68K, MIPS32,
MIPS64, RISC-V64, S390x, and SPARC64, and most of them have quite
reasonable interactive performance, making it possible to use the
emacs editor in terminal windows and X11 windows without any serious
response problems.

However, for the new Debian 11 HPPA VM, interactive performance is a
huge issue: shell typein sometimes gets immediate character echo, but
frequently gets delays of 10 to 30 seconds for each input character.
That makes it extremely hard for a fast typist to type commands and
text: one is never sure whether input keys have been dropped.

I haven't seen this yet.

I develop mathematical software, and a large package that I'm writing
for multiple precision arithmetic provides a testbed for evaluating VM
performance.  Most of the QEMU CPU types support multiple processors,
but M68K and SPARC64 sun4u only permit one CPU.  For HPPA, I have 4 CPUs
and 3GB DRAM; the latter is a hard limit imposed in QEMU source code.

Yes, 3GB (actually 3.5GB) is the maximum for 32-bit hppa systems.

If you run with 4 emulated CPUs, make sure to add:
-accel tcg,thread=multi
when starting qemu.
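
For reference, a complete command line might look roughly like the
following; the disk image name, memory size, and console option are
placeholders rather than a tested recipe:

	qemu-system-hppa \
		-smp 4 -m 3G \
		-accel tcg,thread=multi \
		-drive file=debian-11-hppa.img,format=raw \
		-nographic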

Here is a table of timings from running the equivalent of

	date; make all check ; date

on these systems, using QEMU-6.0.0, unless noted.  Both compilations
and test programs are run in parallel, by internal "make -j" commands.

				make timing (wall clock)

	Debian 11	Alpha			07:43:16 -- 08:23:05	 39m 49s
	Debian 11	ARM64			07:58:02 -- 08:24:45	 26m 43s
	Debian 11	M68K			07:43:15 -- 08:30:56	 47m 41s
	Debian 11	HPPA			13:23:16 -- 21:40:19	497m 03s
	Debian 11	HPPA			07:29:18 -- 18:07:19	638m 01s [qemu-6.1.0-rc3]
	NetBSD 9.2	HPPA			11:22:10 -- 01:25:46	843m 36s

It would be interesting to see the performance on real hppa hardware.
If needed, I can give you access to a physical machine to test. Just let me know.

For comparison, here are results on native hardware with local disk
(not NFS, unless indicated) [clock speed in GHz is abbreviated to G]:

	ArchLinux	ARM32			09:57:34 -- 10:07:43	 10m 09s
	Debian 11	UltraSparc T2		08:30:54 -- 08:41:18	 10m 24s
	Solaris 10	UltraSparc T2		09:46:31 -- 09:59:32	 13m 01s
	Ubuntu 20.04	Xeon 8253		09:34:52 -- 09:35:36	  0m 44s
	CentOS 7.9	Xeon E6-1600v3		09:39:00 -- 09:39:42	  0m 42s
	CentOS 7.9	Xeon E6-1600v3		10:42:43 -- 10:43:30	  0m 47s [NFS]
	CentOS 7.9	EPYC 7502 2.0G 64C/128T	10:02:01 -- 10:02:27	  0m 26s
	CentOS 7.9	EPYC 7502 2.5G 32C/64T	10:02:00 -- 10:02:25	  0m 25s

The tests produce about 62,000 total lines of text output, spread over
about 180 files.  They read no input data, and are primarily compute
bound in loops with integer, not floating-point, arithmetic, using
32-bit and 64-bit integer types.

I have generated assembly listings for representative code from the
hotspot loop using the -S option of gcc and clang, and found that
64-bit arithmetic is expanded inline into 32-bit instructions on
ARM32, HPPA, and M68K, none of which have 64-bit arithmetic
instructions.  The loop instruction counts are comparable across all
of those systems, typically 10 to 20 instructions, compared to 5 or so
on those CPUs that have 64-bit arithmetic.
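
As a minimal sketch of that kind of inspection (the file and function
names below are made up, and a cross compiler such as hppa-linux-gnu-gcc
would be needed when not compiling natively on the target):

	# tiny stand-in for the hotspot loop, not the real test code
	cat > hotspot.c <<-'EOF'
	unsigned long long sum64(const unsigned long long *a, int n)
	{
	    unsigned long long s = 0;
	    for (int i = 0; i < n; i++)
	        s += a[i];  /* one add on 64-bit CPUs, add/add-with-carry on 32-bit ones */
	    return s;
	}
	EOF
	# emit assembly instead of object code and inspect the loop by hand
	gcc -O2 -S hotspot.c -o hotspot.s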

The dramatic slowdowns on HPPA and SPARC64 are a big surprise, but the
HPPA slowdown matches the poor interactive response.  The SPARC64 VM
is much more responsive interactively, and it DOES have 64-bit integer
arithmetic.

I have not yet done profiling builds of qemu-system-hppa and
qemu-system-sparc64, but that remains an option for further
investigation to find out what is responsible for the slowness.

It would be good if you could find some time for further analysis.
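
A minimal sketch of one way to do that without a special profiling
build, assuming the host has Linux perf installed and the emulator is
already running (the 60-second duration is arbitrary):

	# sample call graphs of the running qemu-system-hppa for 60 seconds
	perf record -g -p "$(pidof qemu-system-hppa)" -- sleep 60
	# summarise where host CPU time is spent
	perf report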

I can also do profiling builds of parts of my test suite to see
whether there are unexpected hotspots on HPPA and SPARC64 that are
absent on other CPU types.

I have physical SPARC64 hardware running Debian 11 and Solaris 10 on
identical boxes, and have done builds of TeX Live on them with no
difficulty.  However, the slow speed of QEMU HPPA makes it impractical
to try TeX Live builds for Debian 11 HPPA, which is disappointing.

Does any list member have any idea of why QEMU emulation of HPPA and
SPARC64 is so bad?  Are there Debian kernel parameters that might be
tweaked?  Have any of you used Debian on QEMU HPPA and seen similar
slowness compared to other CPU types?

Again, I'd like to compare qemu-emulated hppa to physical hppa performance
to rule out any qemu slowness.

Notice from my first table above that NetBSD 9.2 on HPPA is also very
slow, which tends to point the finger at QEMU as the source of the
dismal performance, rather than the VM guest O/S.

For the record, here is how QEMU releases downloaded from

	https://www.qemu.org/
	https://download.qemu.org/

are built here, taking the most recent QEMU release for the sample:

	tar xf $prefix/src/qemu/qemu-6.1.0-rc3.tar.xz
	cd qemu-6.1.0-rc3
	unsetenv CONFIG_SITE
	mkdir build
	cd build
	env CC=cc CFLAGS=-O2 ../configure --prefix=$prefix && make all -j && make check

You should make sure to disable debugging when building.
These are my configure options:

	--target-list=hppa-softmmu --enable-numa --disable-mpath
	--disable-spice --disable-opengl --disable-sanitizers --disable-docs
	--disable-debug-mutex --disable-debug-tcg --disable-containers
	--disable-pie --disable-qom-cast-debug


QEMU builds require prior installation of the ninja-build package
available on major GNU/Linux distributions.  On completion, the needed
qemu-system-xxx executables are present in the build subdirectory.
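
On a Debian or Ubuntu host that amounts to something like the
following (the packages beyond ninja-build are the usual QEMU build
dependencies and may already be installed):

	sudo apt-get install ninja-build pkg-config libglib2.0-dev libpixman-1-dev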

On Ubuntu 20.04, the QEMU builds are clean, and pass the entire
validation suite without any failures.

Helge

