
Re: [debian-hppa] poor performance of Debian 11 HPPA with qemu-system-hppa



I was wrong.  Pasta is configured for 8 cpus.

Dave

On 2021-08-14 1:31 p.m., John David Anglin wrote:
> Hi Nelson,
>
> Helge Deller is the expert on this and you likely will have to wait until he returns from vacation
> for an answer.  I think the pasta buildd running hppa emulation is configured for one cpu although
> I could be wrong.  Performance is a little slower than a real 800 MHz PA8800 machine.
>
> Some profiling likely would be helpful.
>
> Dave
>
> On 2021-08-14 10:35 a.m., Nelson H. F. Beebe wrote:
>> In a previous message to the debian-hppa list today, I described how I
>> finally got a virtual machine successfully created for running Debian
>> 11 on HPPA (aka PA-RISC).
>>
>> On the same host
>>
>> 	Dell Precision 7920 (1 16-core CPU, 32 hyperthreads,
>> 	2200MHz Intel Xeon Platinum 8253,
>> 	384GB DDR-4 RAM);
>> 	Ubuntu 20.04.02 LTS (Focal Fossa)
>>
>> I have VMs running with QEMU emulation for Alpha, ARM64, M68K, MIPS32,
>> MIPS64, RISC-V64, S390x, and SPARC64, and most of them have quite
>> reasonable interactive performance, making it possible to use the
>> emacs editor in terminal windows and X11 windows without any serious
>> response problems.
>>
>> However, for the new Debian 11 HPPA VM, interactive performance is a
>> huge issue: characters typed at the shell are sometimes echoed
>> immediately, but frequently each one is delayed by 10 to 30 seconds.
>> That makes it extremely hard for a fast typist to type commands and
>> text: one is never sure whether input keys have been dropped.
>>
>> I develop mathematical software, and a large package that I'm writing
>> for multiple precision arithmetic provides a testbed for evaluating VM
>> performance.  Most of the QEMU CPU types support multiple processors,
>> but M68K and SPARC64 sun4u only permit one CPU.  For HPPA, I have 4 CPUs
>> and 3GB DRAM; the latter is a hard limit imposed in QEMU source code.
>>
>> Here is a table of running the equivalent of
>>
>> 	date; make all check ; date
>>
>> on these systems, using QEMU-6.0.0, unless noted.  Both compilations
>> and test programs are run in parallel, by internal "make -j" commands.
>>
>> 				make timing (wall clock)
>>
>> 	Debian 11	Alpha			07:43:16 -- 08:23:05	 39m 49s
>> 	Debian 11	ARM64			07:58:02 -- 08:24:45	 26m 43s
>> 	Debian 11	M68K			07:43:15 -- 08:30:56	 47m 41s
>> 	Debian 11	HPPA			13:23:16 -- 21:40:19	497m 03s
>> 	Debian 11	HPPA			07:29:18 -- 18:07:19	638m 01s [qemu-6.1.0-rc3]
>> 	NetBSD 9.2	HPPA			11:22:10 -- 01:25:46	843m 36s
>> 	Debian 11	MIPS32			09:21:49 -- 10:42:41	 80m 52s
>> 	Debian 11	SPARC64			14:45:16 -- 06:19:00	933m 44s
>> 	Debian 11	SPARC64			17:57:58 -- 04:02:42	603m 44s [qemu-6.1.0-rc3]
>> 	Ubuntu 18.04	S390x			18:34:34 -- 19:04:36	 30m 02s
>> 	Ubuntu 20.04	S390x			18:34:35 -- 19:16:54	 42m 19s
>> 	FreeBSD 13	RISC-V64		07:41:14 -- 08:34:00	 52m 46s
>> 	FreeBSD 14	RISC-V64		08:35:27 -- 09:25:35	 50m 08s
>> 	Fedora 34	RISC-V64		07:43:17 -- 08:02:55	 19m 38s
>>
>> For comparison, here are results on native hardware with local disk
>> (not NFS, unless indicated) [clock speed in GHz is abbreviated to G]:
>>
>> 	ArchLinux	ARM32			09:57:34 -- 10:07:43	 10m 09s
>> 	Debian 11	UltraSparc T2		08:30:54 -- 08:41:18	 10m 24s
>> 	Solaris 10	UltraSparc T2		09:46:31 -- 09:59:32	 13m 01s
>> 	Ubuntu 20.04	Xeon 8253		09:34:52 -- 09:35:36	  0m 44s
>> 	CentOS 7.9	Xeon E5-1600v3		09:39:00 -- 09:39:42	  0m 42s
>> 	CentOS 7.9	Xeon E5-1600v3		10:42:43 -- 10:43:30	  0m 47s [NFS]
>> 	CentOS 7.9	EPYC 7502 2.0G 64C/128T	10:02:01 -- 10:02:27	  0m 26s
>> 	CentOS 7.9	EPYC 7502 2.5G 32C/64T	10:02:00 -- 10:02:25	  0m 25s
>>
>> The tests produce about 62,000 total lines of text output, spread over
>> about 180 files.  They read no input data, and are primarily compute
>> bound in loops with integer, not floating-point, arithmetic, using
>> 32-bit and 64-bit integer types.
>>
>> I have generated assembly code for a representative part of the
>> hotspot loop using the -S option of gcc and clang, and found that
>> 64-bit arithmetic is expanded inline with 32-bit instructions on
>> ARM32, HPPA, and M68K, none of which have 64-bit arithmetic
>> instructions.  The loop instruction counts are comparable across all
>> of those systems, typically 10 to 20 instructions, compared to 5 or so
>> on those CPUs that have 64-bit arithmetic.
>>
>> The dramatic slowdowns on HPPA and SPARC64 are a big surprise, but the
>> HPPA slowdown matches the poor interactive response.  The SPARC64 VM
>> is much more responsive interactively, and it DOES have 64-bit integer
>> arithmetic.
>>
>> I have not yet done profiling builds of qemu-system-hppa and
>> qemu-system-sparc64, but that remains an option for further
>> investigation to find out what is responsible for the slowness.
>>
>> I can also do profiling builds of parts of my test suite to see
>> whether there are unexpected hotspots on HPPA and SPARC64 that are
>> absent on other CPU types.
>>
>> I have physical SPARC64 hardware running Debian 11 and Solaris 10 on
>> identical boxes, and have done builds of TeX Live on them with no
>> difficulty.  However, the slow speed of QEMU HPPA makes it impractical
>> to try TeX Live builds for Debian 11 HPPA, which is disappointing.
>>
>> Does any list member have any idea of why QEMU emulation of HPPA and
>> SPARC64 is so bad?  Are there Debian kernel parameters that might be
>> tweaked?  Have any of you used Debian on QEMU HPPA and seen similar
>> slowness compared to other CPU types?
>>
>> Notice from my first table above that NetBSD 9.2 on HPPA is also very
>> slow, which tends to point the finger at QEMU as the source of the
>> dismal performance, rather than the VM guest O/S.
>>
>> For the record, here is how QEMU releases downloaded from
>>
>> 	https://www.qemu.org/
>> 	https://download.qemu.org/
>>
>> are built here, using the most recent QEMU release as an example:
>>
>> 	tar xf $prefix/src/qemu/qemu-6.1.0-rc3.tar.xz
>> 	cd qemu-6.1.0-rc3
>> 	unsetenv CONFIG_SITE
>> 	mkdir build
>> 	cd build
>> 	env CC=cc CFLAGS=-O2 ../configure --prefix=$prefix && make all -j && make check
>>
>> QEMU builds require prior installation of the ninja-build package
>> available on major GNU/Linux distributions.  On completion, the needed
>> qemu-system-xxx executables are present in the build subdirectory.
>>
>> On Ubuntu 20.04, the QEMU builds are clean, and pass the entire
>> validation suite without any failures.
>>
>> -------------------------------------------------------------------------------
>> - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
>> - University of Utah                    FAX: +1 801 581 4148                  -
>> - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
>> - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
>> - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
>> -------------------------------------------------------------------------------
>>
>


-- 
John David Anglin  dave.anglin@bell.net

