
Bug#814776: marked as done (libc6: Performance regression from 2.19)



Your message dated Thu, 18 Feb 2016 13:03:15 +0200
with message-id <20160218110315.GA8629@asalmela.iki.fi>
and subject line Re: Bug#814776: libc6: Performance regression from 2.19
has caused the Debian Bug report #814776,
regarding libc6: Performance regression from 2.19
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
814776: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=814776
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libc6
Severity: normal

Dear Maintainer,

Something has changed in libc6 between jessie and stretch that has led to
a performance regression. I first noticed this with ssh file transfers, but
then tested using the same version of iperf3 and kernel 4.3.0-1-amd64 on both
hosts. Both hosts are running as Xen guests on the same, otherwise completely
idle, hardware. Running under Xen is probably needed to reproduce this; I can't
reproduce the regression on similar bare hardware.

$ iperf3 -c stretch
Connecting to host stretch, port 5201
[  4] local 192.168.3.10 port 60690 connected to 192.168.2.210 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  44.4 MBytes   372 Mbits/sec  522   42.4 KBytes       
[  4]   1.00-2.00   sec  16.8 MBytes   141 Mbits/sec  313   7.07 KBytes       
[  4]   2.00-3.00   sec  22.9 MBytes   192 Mbits/sec  524   49.5 KBytes       
[  4]   3.00-4.00   sec  27.4 MBytes   230 Mbits/sec  396   4.24 KBytes       
[  4]   4.00-5.00   sec  53.0 MBytes   444 Mbits/sec  635   33.9 KBytes       
[  4]   5.00-6.00   sec  38.0 MBytes   319 Mbits/sec  513   17.0 KBytes       
[  4]   6.00-7.00   sec  13.7 MBytes   115 Mbits/sec  262   15.6 KBytes       
[  4]   7.00-8.00   sec  36.0 MBytes   302 Mbits/sec  621   52.3 KBytes       
[  4]   8.00-9.00   sec  31.5 MBytes   265 Mbits/sec  545   53.7 KBytes       
[  4]   9.00-10.00  sec  49.7 MBytes   417 Mbits/sec  713   49.5 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   334 MBytes   280 Mbits/sec  5044             sender
[  4]   0.00-10.00  sec   333 MBytes   280 Mbits/sec                  receiver

# ========
# captured on: Mon Feb 15 12:13:07 2016
# hostname : stretch
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1 
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 23  of event 'cpu-clock'
# Event count (approx.): 5750000
#
# Overhead  Command  Shared Object      Symbol                                    
# ........  .......  .................  ..........................................
#
    34.78%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_xen_version             
    13.04%  iperf3   libc-2.21.so       [.] random_r                              
     4.35%  iperf3   [kernel.kallsyms]  [k] __pollwait                            
     4.35%  iperf3   [kernel.kallsyms]  [k] copy_page_to_iter                     
     4.35%  iperf3   [kernel.kallsyms]  [k] dnotify_flush                         
     4.35%  iperf3   [kernel.kallsyms]  [k] fsnotify                              
     4.35%  iperf3   [kernel.kallsyms]  [k] inet_twsk_alloc                       
     4.35%  iperf3   [kernel.kallsyms]  [k] release_sock                          
     4.35%  iperf3   [kernel.kallsyms]  [k] sys_read                              
     4.35%  iperf3   [kernel.kallsyms]  [k] tcp_recvmsg                           
     4.35%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_mmu_update              
     4.35%  iperf3   [kernel.kallsyms]  [k] xennet_alloc_rx_buffers               
     4.35%  iperf3   libc-2.21.so       [.] random                                
     4.35%  iperf3   libc-2.21.so       [.] read                                  


#
# (For a higher level overview, try: perf report --sort comm,dso)
#


iperf Done.
$ iperf3 -c jessie 
Connecting to host jessie, port 5201
[  4] local 192.168.3.10 port 41450 connected to 192.168.2.193 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  56.0 MBytes   470 Mbits/sec  324   11.3 KBytes       
[  4]   1.00-2.00   sec  48.3 MBytes   405 Mbits/sec  490   74.9 KBytes       
[  4]   2.00-3.00   sec  60.7 MBytes   509 Mbits/sec  510   93.3 KBytes       
[  4]   3.00-4.00   sec  30.4 MBytes   255 Mbits/sec  250   93.3 KBytes       
[  4]   4.00-5.00   sec  77.0 MBytes   646 Mbits/sec  351   63.6 KBytes       
[  4]   5.00-6.00   sec  54.1 MBytes   454 Mbits/sec  295   21.2 KBytes       
[  4]   6.00-7.00   sec  36.9 MBytes   309 Mbits/sec  397   70.7 KBytes       
[  4]   7.00-8.00   sec  44.6 MBytes   374 Mbits/sec  308   5.66 KBytes       
[  4]   8.00-9.00   sec  74.7 MBytes   626 Mbits/sec  551   74.9 KBytes       
[  4]   9.00-10.00  sec  62.5 MBytes   525 Mbits/sec  361   69.3 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   545 MBytes   457 Mbits/sec  3837             sender
[  4]   0.00-10.00  sec   545 MBytes   457 Mbits/sec                  receiver

iperf Done.

# ========
# captured on: Mon Feb 15 12:13:42 2016
# hostname : jessie
# os release : 4.3.0-1-amd64
# perf version : 4.3.1
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
# cpuid : GenuineIntel,6,30,5
# total memory : 1016600 kB
# cmdline : /usr/bin/perf_4.3 record iperf3 -s -1 
# event : name = cpu-clock, , type = 1, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: msr = 6, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 19  of event 'cpu-clock'
# Event count (approx.): 4750000
#
# Overhead  Command  Shared Object      Symbol                       
# ........  .......  .................  .............................
#
    15.79%  iperf3   libc-2.19.so       [.] random                   
    15.79%  iperf3   libc-2.19.so       [.] random_r                 
    10.53%  iperf3   [kernel.kallsyms]  [k] unmap_single_vma         
    10.53%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_mmu_update 
    10.53%  iperf3   [kernel.kallsyms]  [k] xen_hypercall_xen_version
     5.26%  iperf3   [kernel.kallsyms]  [k] copy_user_generic_string 
     5.26%  iperf3   [kernel.kallsyms]  [k] fsnotify                 
     5.26%  iperf3   [kernel.kallsyms]  [k] get_pfnblock_flags_mask  
     5.26%  iperf3   [kernel.kallsyms]  [k] kmem_cache_free          
     5.26%  iperf3   [kernel.kallsyms]  [k] sys_read                 
     5.26%  iperf3   [kernel.kallsyms]  [k] sys_select               
     5.26%  iperf3   [kernel.kallsyms]  [k] tcp_poll                 


#
# (For a higher level overview, try: perf report --sort comm,dso)
#




-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (900, 'testing'), (600, 'stable'), (550, 'unstable'), (101, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.3.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

--- End Message ---
--- Begin Message ---
On Mon, Feb 15, 2016 at 03:33:09PM +0100, Aurelien Jarno wrote:
> control: tag -1 + moreinfo
> 
> On 2016-02-15 12:56, Antti Salmela wrote:
> > Package: libc6
> > Severity: normal
> > 
> > Dear Maintainer,
> > 
> > Something has changed in libc6 between jessie and stretch that has led to
> > a performance regression. I first noticed this with ssh file transfers, but
> > then tested using the same version of iperf3 and kernel 4.3.0-1-amd64 on both
> > hosts. Both hosts are running as Xen guests on the same, otherwise completely
> > idle, hardware. Running under Xen is probably needed to reproduce this; I can't
> > reproduce the regression on similar bare hardware.
> 
> What makes you think this is a libc6 issue? Could you please at least
> try to do the same test with a jessie machine with only libc6 and
> related packages updated to stretch.

Turns out it wasn't. A firewall / router with bonded 2 x 1 gigabit ethernet
links caused problems with just this stretch host. One of the two links
dropped packets when used heavily.

Every other host worked fine, so I guess I just had really bad luck with
how the switch / firewall chose the link for traffic to this host.
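
To illustrate how a single host can end up on the bad link while every
other host is fine, here is a minimal sketch of hash-based link selection.
It assumes a simplified policy that XORs the last byte of the source and
destination MAC addresses and takes the result modulo the number of links.
The thread does not say which policy the switch / firewall actually used,
and the MAC addresses and the pick_link helper are hypothetical.

```python
def pick_link(src_mac: str, dst_mac: str, n_links: int = 2) -> int:
    """Return the index of the bonded link a (src, dst) pair is pinned to.

    Assumed policy (not taken from the thread): XOR the last byte of each
    MAC address, then reduce modulo the number of links.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Hypothetical addresses, chosen only for illustration.
router = "02:00:00:00:00:01"
hosts = {
    "jessie": "02:00:00:00:00:10",
    "stretch": "02:00:00:00:00:11",
}

for name, mac in hosts.items():
    print(name, "-> link", pick_link(router, mac))
```

With a deterministic hash like this, all traffic to a given host always
uses the same link, so one faulty link would affect only the hosts that
happen to hash onto it.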

-- 
Antti Salmela

--- End Message ---
