Re: [debian-hppa] poor performance of Debian 11 HPPA with qemu-system-hppa
Some hints are here:
https://parisc.wiki.kernel.org/index.php/Qemu
Dave
On 2021-08-14 2:42 p.m., John David Anglin wrote:
> I was wrong. Pasta is configured for 8 cpus.
>
> Dave
>
> On 2021-08-14 1:31 p.m., John David Anglin wrote:
>> Hi Nelson,
>>
>> Helge Deller is the expert on this and you likely will have to wait until he returns from vacation
>> for an answer. I think the pasta buildd running hppa emulation is configured for one cpu although
>> I could be wrong. Performance is a little slower than a real 800 MHz PA8800 machine.
>>
>> Some profiling likely would be helpful.
>>
>> Dave
>>
>> On 2021-08-14 10:35 a.m., Nelson H. F. Beebe wrote:
>>> In a previous message to the debian-hppa list today, I described how I
>>> finally got a virtual machine successfully created for running Debian
>>> 11 on HPPA (aka PA-RISC).
>>>
>>> On the same host
>>>
>>> Dell Precision 7920 (1 16-core CPU, 32 hyperthreads,
>>> 2200MHz Intel Xeon Platinum 8253,
>>> 384GB DDR-4 RAM);
>>> Ubuntu 20.04.02 LTS (Focal Fossa)
>>>
>>> I have VMs running with QEMU emulation for Alpha, ARM64, M68K, MIPS32,
>>> MIPS64, RISC-V64, S390x, and SPARC64, and most of them have quite
>>> reasonable interactive performance, making it possible to use the
>>> emacs editor in terminal windows and X11 windows without any serious
>>> response problems.
>>>
>>> However, for the new Debian 11 HPPA VM, interactive performance is a
>>> huge issue: shell typein sometimes gets immediate character echo, but
>>> frequently gets delays of 10 to 30 seconds for each input character.
>>> That makes it extremely hard for a fast typist to type commands and
>>> text: one is never sure whether input keys have been dropped.
>>>
>>> I develop mathematical software, and a large package that I'm writing
>>> for multiple precision arithmetic provides a testbed for evaluating VM
>>> performance. Most of the QEMU CPU types support multiple processors,
>>> but M68K and SPARC64 sun4u only permit one CPU. For HPPA, I have 4 CPUs
>>> and 3GB DRAM; the latter is a hard limit imposed in QEMU source code.
>>>
>>> Here is a table of running the equivalent of
>>>
>>> date; make all check ; date
>>>
>>> on these systems, using QEMU-6.0.0, unless noted. Both compilations
>>> and test programs are run in parallel, by internal "make -j" commands.
>>>
>>> make timing (wall clock)
>>>
>>> Debian 11     Alpha      07:43:16 -- 08:23:05    39m 49s
>>> Debian 11     ARM64      07:58:02 -- 08:24:45    26m 43s
>>> Debian 11     M68K       07:43:15 -- 08:30:56    47m 41s
>>> Debian 11     HPPA       13:23:16 -- 21:40:19   497m 03s
>>> Debian 11     HPPA       07:29:18 -- 18:07:19   638m 01s  [qemu-6.1.0-rc3]
>>> NetBSD 9.2    HPPA       11:22:10 -- 01:25:46   843m 36s
>>> Debian 11     MIPS32     09:21:49 -- 10:42:41    80m 52s
>>> Debian 11     SPARC64    14:45:16 -- 06:19:00   933m 44s
>>> Debian 11     SPARC64    17:57:58 -- 04:02:42   603m 44s  [qemu-6.1.0-rc3]
>>> Ubuntu 18.04  S390x      18:34:34 -- 19:04:36    30m 02s
>>> Ubuntu 20.04  S390x      18:34:35 -- 19:16:54    42m 19s
>>> FreeBSD 13    RISC-V64   07:41:14 -- 08:34:00    52m 46s
>>> FreeBSD 14    RISC-V64   08:35:27 -- 09:25:35    50m 08s
>>> Fedora 34     RISC-V64   07:43:17 -- 08:02:55    19m 38s
>>>
>>> For comparison, here are results on native hardware with local disk
>>> (not NFS, unless indicated) [clock speed in GHz is abbreviated to G]:
>>>
>>> ArchLinux     ARM32                     09:57:34 -- 10:07:43   10m 09s
>>> Debian 11     UltraSparc T2             08:30:54 -- 08:41:18   10m 24s
>>> Solaris 10    UltraSparc T2             09:46:31 -- 09:59:32   13m 01s
>>> Ubuntu 20.04  Xeon 8253                 09:34:52 -- 09:35:36    0m 44s
>>> CentOS 7.9    Xeon E6-1600v3            09:39:00 -- 09:39:42    0m 42s
>>> CentOS 7.9    Xeon E6-1600v3            10:42:43 -- 10:43:30    0m 47s  [NFS]
>>> CentOS 7.9    EPYC 7502 2.0G 64C/128T   10:02:01 -- 10:02:27    0m 26s
>>> CentOS 7.9    EPYC 7502 2.5G 32C/64T    10:02:00 -- 10:02:25    0m 25s
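
To put the emulated timings next to the native ones, the start/end stamps in the tables can be turned into elapsed times and rough slowdown factors. The following is only a sketch: the function name `elapsed` and the choice of the Xeon 8253 host run as the baseline are my assumptions, and it presumes every run finished in under 24 hours, so an end time earlier than the start time means one midnight crossing.

```python
from datetime import datetime, timedelta

def elapsed(start: str, end: str) -> timedelta:
    """Elapsed wall-clock time between two HH:MM:SS stamps.

    Assumption: the run lasted less than 24 hours, so an end stamp
    earlier than the start stamp means it crossed midnight once.
    """
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)
    return t1 - t0

# (start, end) pairs copied from the tables in this message.
runs = {
    "Debian 11 HPPA (qemu-6.0.0)": ("13:23:16", "21:40:19"),
    "NetBSD 9.2 HPPA": ("11:22:10", "01:25:46"),
    "Debian 11 ARM64": ("07:58:02", "08:24:45"),
    "Fedora 34 RISC-V64": ("07:43:17", "08:02:55"),
}

# Baseline (my choice): the native Ubuntu 20.04 run on the Xeon 8253 host.
native = elapsed("09:34:52", "09:35:36")

for name, (start, end) in runs.items():
    d = elapsed(start, end)
    # Dividing two timedeltas yields a plain float ratio.
    print(f"{name:30s} {d}  ~{d / native:.0f}x native")
```

The ratio is crude, since the emulated guests have fewer CPUs than the host, but it makes the gap between the HPPA guests and the other emulated targets easy to see at a glance.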
>>>
>>> The tests produce about 62,000 total lines of text output, spread over
>>> about 180 files. They read no input data, and are primarily compute
>>> bound in loops with integer, not floating-point, arithmetic, using
>>> 32-bit and 64-bit integer types.
>>>
>>> I have generated machine language for representative code from the
>>> hotspot loop using the -S option of gcc and clang, and found that
>>> 64-bit arithmetic is expanded inline with 32-bit instructions on
>>> ARM32, HPPA, and M68K, none of which have 64-bit arithmetic
>>> instructions. The loop instruction counts are comparable across all
>>> of those systems, typically 10 to 20 instructions, compared to 5 or so
>>> on those CPUs that have 64-bit arithmetic.
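
For readers unfamiliar with what "expanded inline with 32-bit instructions" means, here is a small model of a 64-bit unsigned add built from 32-bit halves with a carry. It is an illustration only, not the code gcc or clang actually emit for any of these targets; the name `add64_via_32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add64_via_32(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into low and high 32-bit words.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Low-word add; bit 32 of the raw sum is the carry out.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # High-word add with carry in; wrap to 32 bits (result is mod 2**64).
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

Each 64-bit operation therefore costs at least two dependent 32-bit operations plus carry handling, which is consistent with the larger loop instruction counts reported for the 32-bit targets.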
>>>
>>> The dramatic slowdowns on HPPA and SPARC64 are a big surprise, but the
>>> HPPA slowdown matches the poor interactive response. The SPARC64 VM
>>> is much more responsive interactively, and it DOES have 64-bit integer
>>> arithmetic.
>>>
>>> I have not yet done profiling builds of qemu-system-hppa and
>>> qemu-system-sparc64, but that remains an option for further
>>> investigation to find out what is responsible for the slowness.
>>>
>>> I can also do profiling builds of parts of my test suite to see
>>> whether there are unexpected hotspots on HPPA and SPARC64 that are
>>> absent on other CPU types.
>>>
>>> I have physical SPARC64 hardware running Debian 11 and Solaris 10 on
>>> identical boxes, and have done builds of TeX Live on them with no
>>> difficulty. However, the slow speed of QEMU HPPA makes it impractical
>>> to try TeX Live builds for Debian 11 HPPA, which is disappointing.
>>>
>>> Does any list member have any idea of why QEMU emulation of HPPA and
>>> SPARC64 is so bad? Are there Debian kernel parameters that might be
>>> tweaked? Have any of you used Debian on QEMU HPPA and seen similar
>>> slowness compared to other CPU types?
>>>
>>> Notice from my first table above that NetBSD 9.2 on HPPA is also very
>>> slow, which tends to point the finger at QEMU as the source of the
>>> dismal performance, rather than the VM guest O/S.
>>>
>>> For the record, here is how QEMU releases downloaded from
>>>
>>> https://www.qemu.org/
>>> https://download.qemu.org/
>>>
>>> are built here, taking the most recent QEMU release for the sample:
>>>
>>> tar xf $prefix/src/qemu/qemu-6.1.0-rc3.tar.xz
>>> cd qemu-6.1.0-rc3
>>> unsetenv CONFIG_SITE
>>> mkdir build
>>> cd build
>>> env CC=cc CFLAGS=-O2 ../configure --prefix=$prefix && make all -j && make check
>>>
>>> QEMU builds require prior installation of the ninja-build package
>>> available on major GNU/Linux distributions. On completion, the needed
>>> qemu-system-xxx executables are present in the build subdirectory.
>>>
>>> On Ubuntu 20.04, the QEMU builds are clean, and pass the entire
>>> validation suite without any failures.
>>>
>>> -------------------------------------------------------------------------------
>>> - Nelson H. F. Beebe Tel: +1 801 581 5254 -
>>> - University of Utah FAX: +1 801 581 4148 -
>>> - Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu -
>>> - 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org -
>>> - Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
>>> -------------------------------------------------------------------------------
>>>
>
--
John David Anglin dave.anglin@bell.net