Bug#814776: libc6: Performance regression from 2.19
Package: libc6
Severity: normal
Dear Maintainer,
something changed in libc6 between jessie and stretch that has led to a
performance regression. I first noticed it with ssh file transfers, then
tested with the same version of iperf3 and kernel 4.3.0-1-amd64 on both hosts.
Both hosts run as Xen guests on the same, otherwise completely idle
hardware. Running under Xen is probably needed to reproduce this; I can't
reproduce the regression on similar bare hardware.
$ iperf3 -c stretch
Connecting to host stretch, port 5201
[ 4] local 192.168.3.10 port 60690 connected to 192.168.2.210 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 44.4 MBytes 372 Mbits/sec 522 42.4 KBytes
[ 4] 1.00-2.00 sec 16.8 MBytes 141 Mbits/sec 313 7.07 KBytes
[ 4] 2.00-3.00 sec 22.9 MBytes 192 Mbits/sec 524 49.5 KBytes
[ 4] 3.00-4.00 sec 27.4 MBytes 230 Mbits/sec 396 4.24 KBytes
[ 4] 4.00-5.00 sec 53.0 MBytes 444 Mbits/sec 635 33.9 KBytes
[ 4] 5.00-6.00 sec 38.0 MBytes 319 Mbits/sec 513 17.0 KBytes
[ 4] 6.00-7.00 sec 13.7 MBytes 115 Mbits/sec 262 15.6 KBytes
[ 4] 7.00-8.00 sec 36.0 MBytes 302 Mbits/sec 621 52.3 KBytes
[ 4] 8.00-9.00 sec 31.5 MBytes 265 Mbits/sec 545 53.7 KBytes
[ 4] 9.00-10.00 sec 49.7 MBytes 417 Mbits/sec 713 49.5 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 334 MBytes 280 Mbits/sec 5044 sender
[ 4] 0.00-10.00 sec 333 MBytes 280 Mbits/sec receiver
# ========
# captured on: Mon Feb 15 12:13:07 2016
# hostname : stretch
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 23 of event 'cpu-clock'
# Event count (approx.): 5750000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..........................................
#
34.78% iperf3 [kernel.kallsyms] [k] xen_hypercall_xen_version
13.04% iperf3 libc-2.21.so [.] random_r
4.35% iperf3 [kernel.kallsyms] [k] __pollwait
4.35% iperf3 [kernel.kallsyms] [k] copy_page_to_iter
4.35% iperf3 [kernel.kallsyms] [k] dnotify_flush
4.35% iperf3 [kernel.kallsyms] [k] fsnotify
4.35% iperf3 [kernel.kallsyms] [k] inet_twsk_alloc
4.35% iperf3 [kernel.kallsyms] [k] release_sock
4.35% iperf3 [kernel.kallsyms] [k] sys_read
4.35% iperf3 [kernel.kallsyms] [k] tcp_recvmsg
4.35% iperf3 [kernel.kallsyms] [k] xen_hypercall_mmu_update
4.35% iperf3 [kernel.kallsyms] [k] xennet_alloc_rx_buffers
4.35% iperf3 libc-2.21.so [.] random
4.35% iperf3 libc-2.21.so [.] read
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
iperf Done.
$ iperf3 -c jessie
Connecting to host jessie, port 5201
[ 4] local 192.168.3.10 port 41450 connected to 192.168.2.193 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 56.0 MBytes 470 Mbits/sec 324 11.3 KBytes
[ 4] 1.00-2.00 sec 48.3 MBytes 405 Mbits/sec 490 74.9 KBytes
[ 4] 2.00-3.00 sec 60.7 MBytes 509 Mbits/sec 510 93.3 KBytes
[ 4] 3.00-4.00 sec 30.4 MBytes 255 Mbits/sec 250 93.3 KBytes
[ 4] 4.00-5.00 sec 77.0 MBytes 646 Mbits/sec 351 63.6 KBytes
[ 4] 5.00-6.00 sec 54.1 MBytes 454 Mbits/sec 295 21.2 KBytes
[ 4] 6.00-7.00 sec 36.9 MBytes 309 Mbits/sec 397 70.7 KBytes
[ 4] 7.00-8.00 sec 44.6 MBytes 374 Mbits/sec 308 5.66 KBytes
[ 4] 8.00-9.00 sec 74.7 MBytes 626 Mbits/sec 551 74.9 KBytes
[ 4] 9.00-10.00 sec 62.5 MBytes 525 Mbits/sec 361 69.3 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 545 MBytes 457 Mbits/sec 3837 sender
[ 4] 0.00-10.00 sec 545 MBytes 457 Mbits/sec receiver
iperf Done.
# ========
# captured on: Mon Feb 15 12:13:42 2016
# hostname : jessie
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 19 of event 'cpu-clock'
# Event count (approx.): 4750000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .............................
#
15.79% iperf3 libc-2.19.so [.] random
15.79% iperf3 libc-2.19.so [.] random_r
10.53% iperf3 [kernel.kallsyms] [k] unmap_single_vma
10.53% iperf3 [kernel.kallsyms] [k] xen_hypercall_mmu_update
10.53% iperf3 [kernel.kallsyms] [k] xen_hypercall_xen_version
5.26% iperf3 [kernel.kallsyms] [k] copy_user_generic_string
5.26% iperf3 [kernel.kallsyms] [k] fsnotify
5.26% iperf3 [kernel.kallsyms] [k] get_pfnblock_flags_mask
5.26% iperf3 [kernel.kallsyms] [k] kmem_cache_free
5.26% iperf3 [kernel.kallsyms] [k] sys_read
5.26% iperf3 [kernel.kallsyms] [k] sys_select
5.26% iperf3 [kernel.kallsyms] [k] tcp_poll
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
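To put numbers on the difference between the two runs above, here is a rough sketch in Python. The figures are copied by hand from the iperf3 sender summaries and the perf report headers; the dictionary layout and the helper names (relative_change, sample_count) are only illustrative and not part of any tool.

```python
# Figures copied from the iperf3 sender summaries and perf reports above.
runs = {
    "stretch": {"mbits": 280, "retr": 5044, "samples": 23, "xen_version_pct": 34.78},
    "jessie": {"mbits": 457, "retr": 3837, "samples": 19, "xen_version_pct": 10.53},
}


def relative_change(new, old):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def sample_count(pct, total):
    """Convert a perf overhead percentage back to a whole number of samples."""
    return round(pct / 100.0 * total)


# Compare stretch (libc 2.21) against jessie (libc 2.19) as the baseline.
throughput_change = relative_change(runs["stretch"]["mbits"], runs["jessie"]["mbits"])
retr_change = relative_change(runs["stretch"]["retr"], runs["jessie"]["retr"])

print(f"throughput:  {throughput_change:+.1f}%")
print(f"retransmits: {retr_change:+.1f}%")
for host, r in runs.items():
    n = sample_count(r["xen_version_pct"], r["samples"])
    print(f"{host}: xen_hypercall_xen_version in {n} of {r['samples']} samples")
```

With these inputs, stretch comes out at about 39% lower throughput and about 31% more retransmits than jessie. The perf percentages map back to 8 of 23 samples on stretch and 2 of 19 on jessie for xen_hypercall_xen_version, so each single sample accounts for roughly 4 to 5% of a profile.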
-- System Information:
Debian Release: stretch/sid
APT prefers testing
APT policy: (900, 'testing'), (600, 'stable'), (550, 'unstable'), (101, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.3.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)