
Bug#814776: libc6: Performance regression from 2.19



Package: libc6
Severity: normal

Dear Maintainer,

Something has changed in libc6 between jessie and stretch which has led
to a performance regression. I first noticed this with ssh file transfers, but
tested using the same version of iperf3 and kernel 4.3.0-1-amd64 on both hosts.
Both hosts are running as Xen guests on the same, otherwise completely idle,
hardware. Running under Xen is probably needed to reproduce this; I can't
reproduce the regression on similar bare hardware.

For each host, the iperf3 client output is followed by the perf report
recorded on the server side (perf record iperf3 -s -1).

$ iperf3 -c stretch
Connecting to host stretch, port 5201
[  4] local 192.168.3.10 port 60690 connected to 192.168.2.210 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  44.4 MBytes   372 Mbits/sec  522   42.4 KBytes       
[  4]   1.00-2.00   sec  16.8 MBytes   141 Mbits/sec  313   7.07 KBytes       
[  4]   2.00-3.00   sec  22.9 MBytes   192 Mbits/sec  524   49.5 KBytes       
[  4]   3.00-4.00   sec  27.4 MBytes   230 Mbits/sec  396   4.24 KBytes       
[  4]   4.00-5.00   sec  53.0 MBytes   444 Mbits/sec  635   33.9 KBytes       
[  4]   5.00-6.00   sec  38.0 MBytes   319 Mbits/sec  513   17.0 KBytes       
[  4]   6.00-7.00   sec  13.7 MBytes   115 Mbits/sec  262   15.6 KBytes       
[  4]   7.00-8.00   sec  36.0 MBytes   302 Mbits/sec  621   52.3 KBytes       
[  4]   8.00-9.00   sec  31.5 MBytes   265 Mbits/sec  545   53.7 KBytes       
[  4]   9.00-10.00  sec  49.7 MBytes   417 Mbits/sec  713   49.5 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   334 MBytes   280 Mbits/sec  5044             sender
[  4]   0.00-10.00  sec   333 MBytes   280 Mbits/sec                  receiver

# ========
# captured on: Mon Feb 15 12:13:07 2016
# hostname : stretch
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1 
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 23  of event 'cpu-clock'
# Event count (approx.): 5750000
#
# Overhead  Command  Shared Object      Symbol                                    
# ........  .......  .................  ..........................................
#
    34.78%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_xen_version             
    13.04%  iperf3   libc-2.21.so       [.] random_r                              
     4.35%  iperf3   [kernel.kallsyms]  [k] __pollwait                            
     4.35%  iperf3   [kernel.kallsyms]  [k] copy_page_to_iter                     
     4.35%  iperf3   [kernel.kallsyms]  [k] dnotify_flush                         
     4.35%  iperf3   [kernel.kallsyms]  [k] fsnotify                              
     4.35%  iperf3   [kernel.kallsyms]  [k] inet_twsk_alloc                       
     4.35%  iperf3   [kernel.kallsyms]  [k] release_sock                          
     4.35%  iperf3   [kernel.kallsyms]  [k] sys_read                              
     4.35%  iperf3   [kernel.kallsyms]  [k] tcp_recvmsg                           
     4.35%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_mmu_update              
     4.35%  iperf3   [kernel.kallsyms]  [k] xennet_alloc_rx_buffers               
     4.35%  iperf3   libc-2.21.so       [.] random                                
     4.35%  iperf3   libc-2.21.so       [.] read                                  


#
# (For a higher level overview, try: perf report --sort comm,dso)
#


iperf Done.
$ iperf3 -c jessie 
Connecting to host jessie, port 5201
[  4] local 192.168.3.10 port 41450 connected to 192.168.2.193 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  56.0 MBytes   470 Mbits/sec  324   11.3 KBytes       
[  4]   1.00-2.00   sec  48.3 MBytes   405 Mbits/sec  490   74.9 KBytes       
[  4]   2.00-3.00   sec  60.7 MBytes   509 Mbits/sec  510   93.3 KBytes       
[  4]   3.00-4.00   sec  30.4 MBytes   255 Mbits/sec  250   93.3 KBytes       
[  4]   4.00-5.00   sec  77.0 MBytes   646 Mbits/sec  351   63.6 KBytes       
[  4]   5.00-6.00   sec  54.1 MBytes   454 Mbits/sec  295   21.2 KBytes       
[  4]   6.00-7.00   sec  36.9 MBytes   309 Mbits/sec  397   70.7 KBytes       
[  4]   7.00-8.00   sec  44.6 MBytes   374 Mbits/sec  308   5.66 KBytes       
[  4]   8.00-9.00   sec  74.7 MBytes   626 Mbits/sec  551   74.9 KBytes       
[  4]   9.00-10.00  sec  62.5 MBytes   525 Mbits/sec  361   69.3 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   545 MBytes   457 Mbits/sec  3837             sender
[  4]   0.00-10.00  sec   545 MBytes   457 Mbits/sec                  receiver

iperf Done.
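
For reference, the size of the regression can be read off the two sender
summary lines above (280 Mbits/sec to stretch versus 457 Mbits/sec to jessie).
A minimal Python sketch of that arithmetic follows. The variable names are
illustrative only, and the figures come from a single 10-second run per host,
so the result is a rough indication rather than a benchmark.

```python
# Sender-side averages copied from the two iperf3 summaries above (Mbits/sec).
jessie_mbits = 457.0   # server running on the jessie guest
stretch_mbits = 280.0  # server running on the stretch guest

# Relative throughput lost when the server runs on stretch instead of jessie.
drop_percent = (jessie_mbits - stretch_mbits) / jessie_mbits * 100.0

print(f"stretch is {drop_percent:.1f}% slower than jessie")
```

That works out to a drop of a little under 40% for the stretch guest.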

# ========
# captured on: Mon Feb 15 12:13:42 2016
# hostname : jessie
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1 
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 19  of event 'cpu-clock'
# Event count (approx.): 4750000
#
# Overhead  Command  Shared Object      Symbol                       
# ........  .......  .................  .............................
#
    15.79%  iperf3   libc-2.19.so       [.] random                   
    15.79%  iperf3   libc-2.19.so       [.] random_r                 
    10.53%  iperf3   [kernel.kallsyms]  [k] unmap_single_vma         
    10.53%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_mmu_update 
    10.53%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_xen_version
     5.26%  iperf3   [kernel.kallsyms]  [k] copy_user_generic_string 
     5.26%  iperf3   [kernel.kallsyms]  [k] fsnotify                 
     5.26%  iperf3   [kernel.kallsyms]  [k] get_pfnblock_flags_mask  
     5.26%  iperf3   [kernel.kallsyms]  [k] kmem_cache_free          
     5.26%  iperf3   [kernel.kallsyms]  [k] sys_read                 
     5.26%  iperf3   [kernel.kallsyms]  [k] sys_select               
     5.26%  iperf3   [kernel.kallsyms]  [k] tcp_poll                 


#
# (For a higher level overview, try: perf report --sort comm,dso)
#
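
Note that both perf reports above are based on very few samples: 23 on
stretch and 19 on jessie, so each percentage corresponds to a small whole
number of samples. The Python sketch below converts a few of the reported
overheads back to sample counts. It assumes that perf computed each
percentage as samples divided by total samples, and that the cpu-clock event
count is in nanoseconds; the helper name samples_from_overhead is
hypothetical and not part of perf.

```python
def samples_from_overhead(overhead_percent: float, total_samples: int) -> int:
    """Convert a perf overhead percentage back to a whole number of samples."""
    return round(overhead_percent / 100.0 * total_samples)


# Sample totals and (symbol, overhead %) rows copied from the reports above.
reports = {
    "stretch": (23, [("xen_hypercall_xen_version", 34.78),
                     ("random_r", 13.04),
                     ("random", 4.35)]),
    "jessie": (19, [("random", 15.79),
                    ("random_r", 15.79),
                    ("xen_hypercall_xen_version", 10.53)]),
}

for host, (total, rows) in reports.items():
    for symbol, pct in rows:
        n = samples_from_overhead(pct, total)
        print(f"{host:8} {symbol:26} {pct:6.2f}% = {n} of {total} samples")

# Event count divided by sample count gives the time covered by one sample.
ns_per_sample = 5750000 / 23  # stretch; 4750000 / 19 on jessie is the same
print(f"{ns_per_sample:.0f} ns per sample")
```

Under those assumptions, xen_hypercall_xen_version accounts for 8 samples on
stretch against 2 on jessie, and each sample covers 250000 ns, which matches
the 4000 Hz sample_freq in the headers. With totals this small, differences of
a few samples should be read with caution.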




-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (900, 'testing'), (600, 'stable'), (550, 'unstable'), (101, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.3.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

