[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#956418: marked as done (src:glibc: Please provide optimized builds for ARMv8.1)



Your message dated Mon, 18 May 2020 22:33:45 +0000
with message-id <E1jaoKP-0008Rl-QH@fasolo.debian.org>
and subject line Bug#956418: fixed in glibc 2.31-0experimental2
has caused the Debian Bug report #956418,
regarding src:glibc: Please provide optimized builds for ARMv8.1
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
956418: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: src:glibc
Version: 2.30-4
Severity: wishlist
X-Debbugs-CC: debian-arm@lists.debian.org

The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
introduces a set of instructions [1] that result in significant performance
improvements for multithreaded applications.  Sample code demonstrating the
performance improvements is attached.  When run on a 16-core Neoverse N1
host with glibc 2.30-4, runtimes vary significantly, ranging from lows
around 250ms to highs around 15 seconds.  When linked against glibc rebuilt
with support for these instructions, runtimes are consistently <50ms.
Significant performance impact has also been observed in less contrived
cases (MariaDB and Postgres), but I don't have a repro to share.

Gcc provides two ways to enable support for these instructions at build
time.  The simplest, and least disruptive, is to enable -moutline-atomics
globally in the arm64 glibc build.  As described at [2], this option enables
runtime checks for the availability of the atomic instructions.  If found,
they are used, otherwise ARMv8.0 compatible code is used.  The drawback of
this option is that the check happens at runtime, thus introducing some
overhead on all arm64 installations.

The second option is to provide libraries built with explicit support for
the ARM v8.1a spec via the -march=armv8.1-a flag.  This option is also
described at [2].  This build would be incompatible with earlier versions of
the spec, so it would need to be provided in a location where the linker
will automatically discover it if it is usable (e.g.
/lib/aarch64-linux-gnu/atomics/).  This does not incur any runtime overhead,
but obviously involves an additional libc build, and the corresponding
complixity and disk space utilization.  I'm not sure if this is an option
that the glibc maintainers are interested in pursuing.

I've tested both options and found them to be acceptable on v8.1a (Neoverse
N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
various different configuration permutations if you'd like to see additional
data.

I can provide patches or merge requests implementing either option, at least
for a starting point, if you'd like to see them.

Thanks!
noah

1. https://static.docs.arm.com/ddi0557/a/DDI0557A_b_armv8_1_supplement.pdf
   Section B1
2. https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
/*
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may
 * not use this file except in compliance with the License. A copy of the
 * License is located at
 *
 *      http://aws.amazon.com/apache2.0/
 *
 * or in the "license" file accompanying this file. This file is distributed
 * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
*/

/* Build with:
 * gcc -O2 -o a.out a.c -lpthread -DITER=1000 -DTHREADS=64
*/

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>

#ifndef ITER
# define ITER 1000
#endif
#ifndef THREADS
# define THREADS 3
#endif

#if THREADS < 1
# error "THREADS is supposed to be at least 1"
#endif

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_ptr = 0;

typedef struct stats_s {
  uint64_t min, max;
  int times;
  uint64_t total;
  uint64_t flips;
} stats_t;

stats_t stats[THREADS + 1];
pthread_t threads[THREADS];

#ifdef __aarch64__
static uint64_t cpu_shift() {
  uint64_t shift = 0;
  __asm__ __volatile__ ("mrs %0,cntfrq_el0; clz %w0, %w0":"=&r"(shift));
  return shift;
}
#endif

static uint64_t gettime() {
#ifdef __aarch64__
  uint64_t ret = 0;
  __asm__ __volatile__ ("isb; mrs %0,cntvct_el0":"=r"(ret));
  return ret << cpu_shift();

#elif defined __x86_64__
  uint64_t a, d;
  __asm__ __volatile__ ("rdtsc" : "=a" (a), "=d" (d));
  return ((uint64_t)a + ((uint64_t)d << 32));
#endif

  return 0;
}

static void init_stats() {
  int i;
  for (i = 0; i <= THREADS; i++) {
    stats_t *s = &stats[i];
    s->min = 1000000;
    s->max = 0;
    s->times = 0;
    s->total = 0;
    s->flips = 0;
  }
}

static void print_stat(int i) {
  stats_t *s = &stats[i];
  float average = (float) s->total / s->times;
  if (i == THREADS)
    fprintf(stdout, "server: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n", s->min, s->max, average, s->times, s->flips);
  else
    fprintf(stdout, "thread %d: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n", i, s->min, s->max, average, s->times, s->flips);
}

static void print_stats() {
  int i;
  for (i = 0; i <= THREADS; i++)
    print_stat(i);
}

static void update_stats(stats_t *s, uint64_t time) {
  ++s->times;
  if (time < s->min)
    s->min = time;
  if (time > s->max)
    s->max = time;
  s->total += time;
}

static void fun(int check, int set, stats_t *stat) {
  int loop = 1;
  while (loop) {
    uint64_t start = gettime();
    pthread_mutex_lock (&lock);
    if (shared_ptr == check) {
      loop = 0;
      ++stat->flips;
      shared_ptr = set;
    }
    pthread_mutex_unlock (&lock);
    update_stats(stat, gettime() - start);
  }
}

static void *tf (void *arg)
{
  int i;
  stats_t *stat = NULL;
  pthread_t tid = pthread_self();

  for (i = 0; i < THREADS; i++)
    if (tid == threads[i]) {
      stat = &stats[i];
      break;
    }

  /* Run until canceled. */
  while(1)
    fun(1, 0, stat);
  return NULL;
}

int main (int argc, char **argv) {
  int i;
  for (i = 0; i < THREADS; i++) {
    if (pthread_create (&threads[i], NULL, tf, NULL) != 0)
      {
        puts ("pthread_create failed");
        exit (1);
      }
  }

  init_stats();

  for (i = 0; i < ITER; i++)
    fun(0, 1, &stats[THREADS]);

  for (i = 0; i < THREADS; i++) {
    if (pthread_cancel (threads[i]) != 0)
      {
        puts ("pthread_cancel failed");
        exit (1);
      }
  }

  print_stats();
  return 0;
}

--- End Message ---
--- Begin Message ---
Source: glibc
Source-Version: 2.31-0experimental2
Done: Aurelien Jarno <aurel32@debian.org>

We believe that the bug you reported is fixed in the latest version of
glibc, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 956418@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Aurelien Jarno <aurel32@debian.org> (supplier of updated glibc package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Tue, 19 May 2020 00:19:13 +0200
Source: glibc
Architecture: source
Version: 2.31-0experimental2
Distribution: experimental
Urgency: medium
Maintainer: GNU Libc Maintainers <debian-glibc@lists.debian.org>
Changed-By: Aurelien Jarno <aurel32@debian.org>
Closes: 956418
Changes:
 glibc (2.31-0experimental2) experimental; urgency=medium
 .
   [ Aurelien Jarno ]
   * Add an explicit dependency on $(stamp)build_libc for the build-indep
     target. Currently the build is made during the binary-indep target
     instead.
   * debian/control.in/main: build-depends on gcc-10 (>= 10-20200431) on arm64
     to ensure that -moutline-atomics is enabled by default.  Closes: #956418.
   * debian/patches/git-updates.diff: update from upstream stable branch.
   * debian/debhelper.in/libc.NEWS: add an entry explaining the new trust-ad
     option in resolv.conf.
   * debian/patches/riscv64/local-asin-acos-raise-invalid.diff: new patch to
     workaround a GCC 10 bug on riscv64.
 .
   [ Samuel Thibault ]
   * debian/patches/hurd-i386/git-tst-udp.diff: New patch to fix
     sunrpc/tst-udp-* failures.
   * debian/sysdeps/hurd-i386.mk: Add -march=i686 to fix math issues until gcc
     is fixed to switch to i686 as was actually expected already.
   * debian/testsuite-xfail-debian.mk: Update hurd-i386 results.
Checksums-Sha1:
 6a6e40855dfdf228a99345069b01ae6eb40a2de5 8204 glibc_2.31-0experimental2.dsc
 8a883af347166d668050c272137f0137dffd4198 813300 glibc_2.31-0experimental2.debian.tar.xz
 cf1e025ae12268b5e137fa9754aa558af63cc9e4 7557 glibc_2.31-0experimental2_source.buildinfo
Checksums-Sha256:
 579aff7f1df7bb5f857578f480be2b467c679eaf2f3b554acf445cfc4750ce06 8204 glibc_2.31-0experimental2.dsc
 1462d983b57dbbbc32ac1cc271248849b77a0ff4aac2ce71e72de81f5b440245 813300 glibc_2.31-0experimental2.debian.tar.xz
 852a11165eee01d3e873f1515f3060f84781d574372d62b0be68f93a018744af 7557 glibc_2.31-0experimental2_source.buildinfo
Files:
 0684c17e0fce3c8c833267d1c621edc4 8204 libs required glibc_2.31-0experimental2.dsc
 aa45f20450479c5f5e04a04552843840 813300 libs required glibc_2.31-0experimental2.debian.tar.xz
 98178ff1ba22af2b0a767aa14c2cff2f 7557 libs required glibc_2.31-0experimental2_source.buildinfo

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEUryGlb40+QrX1Ay4E4jA+JnoM2sFAl7DCtYACgkQE4jA+Jno
M2t4sQ//Wrvl+g8fWlkViF/clkTmYCDFrlxwCYiAO09AQKBZn9UEfp4LIBnoldAx
cpmsjed14bijwGOuxAWJc5nLpnSSmBXMgTpOeo1fDv8RhNngsHK5sKDwszISWZCJ
/TW2xnHo6radjMg4EXC1t4CB1iJh1L3lPDZy6Lx/qqerq6P2dk68TqrFPRekCjFV
abwQdXkF2ekrfVPL0r7mQAW4YzTFeJUhIidw6jfthJ/Jqh1i7tz1L+3dpruGEU8t
PwuZdG4HLAWDP1IIc2BGFy+vBwdSF36Hd64U5PYTgzbQ4i0Vh6sFK0fvYq280ntC
7LUh4dTVUfHdYg6B8HNBl8lJ5kxY4wRsn+QQo1Wo4GU3gueyc7AaKw4Fb50EnTRo
0i1h6drVsuAfTjYFxnQeLvXowDgm9WLaXEmgccxT7V4qZnPYCllVnRWOZQlAr9sX
iGfaUSnTB+2VaXQQADEFDeAxzzizh5h43n3v9G3P58/CgPdZ/r+OpLuT2xv5n9cK
DS4HbUvgVJ3xmJ6FdpwBCPKSH9rFG0OWKj0NEkt0lypkITfWPZqXJiIRwvclcXYw
k0Bw101ooLMVtTRxjWjjuiIJ9iNy8iivxjmFelTpA8UFMosXuKv8Npx8WOq2krzA
WjUP6FB1hR9GvnluN6w+j8hUinold2CFtG6uGo1b4zBygqK0YWY=
=W0fD
-----END PGP SIGNATURE-----

--- End Message ---

Reply to: