Bug#1076831: bookworm-pu: package glibc/2.36-9+deb12u8
Package: release.debian.org
Severity: normal
Tags: bookworm
X-Debbugs-Cc: glibc@packages.debian.org
Control: affects -1 + src:glibc
User: release.debian.org@packages.debian.org
Usertags: pu
[ Reason ]
The upstream glibc stable branch has received a number of fixes since the
last stable update. It was skipped in the last point release, so the
number of fixes is slightly higher than usual.
[ Impact ]
If the update is not approved, systems will be left with a number of known
issues, and the divergence from upstream will keep increasing.
[ Tests ]
The upstream fixes come with additional tests, which represent a
significant part of the diff.
[ Risks ]
The changes do not affect critical parts of the library, and come with
additional tests. They are already in testing/sid and in other
distributions.
[ Checklist ]
[x] *all* changes are documented in the d/changelog
[x] I reviewed all changes and I approve them
[x] attach debdiff against the package in (old)stable
[x] the issue is verified as fixed in unstable
[ Changes ]
All the changes come from the upstream stable branch, and are summarized
in the Debian changelog:
* debian/patches/git-updates.diff: update from upstream stable branch:
- debian/patches/kfreebsd/submitted-auxv.diff: refreshed.
- debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33599-nscd.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33600-nscd.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33601-33602-nscd.diff: upstreamed.
- Fixes ffsll() performance issue depending on code alignment.
- Fixes memmove/memset on sparc32.
- Fixes pthread_cancel on sparc32.
- Fixes a possible crash in _dl_start_user on arm32.
- Fixes poor malloc/free performance due to lock contentions between
threads when using core pinning.
- Uses 64-bit time_t in testsuite on 32-bit systems.
- Fixes rseq support when built against newer kernel headers.
- Performance improvements for string functions on arm64.
- Disables arm64 SVE functions on kernel <= 6.2.0 due to performance
issues.
- Fixes ld.so crash on powerpc64* when built with GCC 14.
- Fixes ld.so crash on amd64 when built with APX enabled.
- Fixes __WORDSIZE definition on sparc32 with sparcv9.
- Fixes getutxent() on 32-bit architecture with _TIME_BITS=64.
- Fixes y2038 regression in nscd following CVE-2024-33601 and
CVE-2024-33602 fix.
- Fixes build with --enable-hardcoded-path-in-tests with newer linkers.
- Fixes crash in wcsncmp() in z13/vector-optimized s390 implementation.
- Fixes rseq extension mechanism.
- Fixes misc/tst-preadvwritev2 and misc/tst-preadvwritev64v2 with kernel
6.9+.
- Fixes freeing uninitialized memory in libc_freeres_fn(). Closes:
#1073916.
Many of the changes are not relevant for Debian bookworm, as they concern
port architectures or fix issues seen only with different toolchain
versions or configure options. That said, it is easier to pull the whole
set of changes from upstream. Among the important changes are a y2038
regression fix in nscd following the latest security update, a fix for a
general performance issue with multithreading (e.g. when using OpenMP),
performance fixes on arm64 and amd64, rseq fixes, and a fix for a crash
on s390x with some CPUs.
[ Other info ]
None
commit e0351e4b2b6b6da058ce36662c57bad799f4af2f
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Mon Jul 22 22:14:14 2024 +0200
debian/patches/git-updates.diff: update from upstream stable branch:
* debian/patches/git-updates.diff: update from upstream stable branch:
- debian/patches/kfreebsd/submitted-auxv.diff: refreshed.
- debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33599-nscd.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33600-nscd.diff: upstreamed.
- debian/patches/any/local-CVE-2024-33601-33602-nscd.diff: upstreamed.
- Fixes ffsll() performance issue depending on code alignment.
- Fixes memmove/memset on sparc32.
- Fixes pthread_cancel on sparc32.
- Fixes a possible crash in _dl_start_user on arm32.
- Fixes poor malloc/free performance due to lock contentions between
threads when using core pinning.
- Uses 64-bit time_t in testsuite on 32-bit systems.
- Fixes rseq support when built against newer kernel headers.
- Performance improvements for string functions on arm64.
- Disables arm64 SVE functions on kernel <= 6.2.0 due to performance
issues.
- Fixes ld.so crash on powerpc64* when built with GCC 14.
- Fixes ld.so crash on amd64 when built with APX enabled.
- Fixes __WORDSIZE definition on sparc32 with sparcv9.
- Fixes getutxent() on 32-bit architecture with _TIME_BITS=64.
- Fixes y2038 regression in nscd following CVE-2024-33601 and
CVE-2024-33602 fix.
- Fixes build with --enable-hardcoded-path-in-tests with newer linkers.
- Fixes crash in wcsncmp() in z13/vector-optimized s390 implementation.
- Fixes rseq extension mechanism.
- Fixes misc/tst-preadvwritev2 and misc/tst-preadvwritev64v2 with kernel
6.9+.
- Fixes freeing uninitialized memory in libc_freeres_fn(). Closes:
#1073916.
diff --git a/debian/changelog b/debian/changelog
index 508118be..4072c2ba 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,38 @@
+glibc (2.36-9+deb12u8) UNRELEASED; urgency=medium
+
+ * debian/patches/git-updates.diff: update from upstream stable branch:
+ - debian/patches/kfreebsd/submitted-auxv.diff: refreshed.
+ - debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff: upstreamed.
+ - debian/patches/any/local-CVE-2024-33599-nscd.diff: upstreamed.
+ - debian/patches/any/local-CVE-2024-33600-nscd.diff: upstreamed.
+ - debian/patches/any/local-CVE-2024-33601-33602-nscd.diff: upstreamed.
+ - Fixes ffsll() performance issue depending on code alignment.
+ - Fixes memmove/memset on sparc32.
+ - Fixes pthread_cancel on sparc32.
+ - Fixes a possible crash in _dl_start_user on arm32.
+ - Fixes poor malloc/free performance due to lock contentions between
+ threads when using core pinning.
+ - Uses 64-bit time_t in testsuite on 32-bit systems.
+ - Fixes rseq support when built against newer kernel headers.
+ - Performance improvements for string functions on arm64.
+ - Disables arm64 SVE functions on kernel <= 6.2.0 due to performance
+ issues.
+ - Fixes ld.so crash on powerpc64* when built with GCC 14.
+ - Fixes ld.so crash on amd64 when built with APX enabled.
+ - Fixes __WORDSIZE definition on sparc32 with sparcv9.
+ - Fixes getutxent() on 32-bit architecture with _TIME_BITS=64.
+ - Fixes y2038 regression in nscd following CVE-2024-33601 and
+ CVE-2024-33602 fix.
+ - Fixes build with --enable-hardcoded-path-in-tests with newer linkers.
+ - Fixes crash in wcsncmp() in z13/vector-optimized s390 implementation.
+ - Fixes rseq extension mechanism.
+ - Fixes misc/tst-preadvwritev2 and misc/tst-preadvwritev64v2 with kernel
+ 6.9+.
+ - Fixes freeing uninitialized memory in libc_freeres_fn(). Closes:
+ #1073916.
+
+ -- Aurelien Jarno <aurel32@debian.org> Mon, 22 Jul 2024 20:05:02 +0200
+
glibc (2.36-9+deb12u7) bookworm-security; urgency=medium
* debian/patches/local-CVE-2024-33599-nscd.diff: Fix a stack-based buffer
diff --git a/debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff b/debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff
deleted file mode 100644
index 2d017b6f..00000000
--- a/debian/patches/any/local-CVE-2024-2961-iso-2022-cn-ext.diff
+++ /dev/null
@@ -1,207 +0,0 @@
-commit 4ed98540a7fd19f458287e783ae59c41e64df7b5
-Author: Charles Fol <folcharles@gmail.com>
-Date: Thu Mar 28 12:25:38 2024 -0300
-
- iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961)
-
- ISO-2022-CN-EXT uses escape sequences to indicate character set changes
- (as specified by RFC 1922). While the SOdesignation has the expected
- bounds checks, neither SS2designation nor SS3designation have its;
- allowing a write overflow of 1, 2, or 3 bytes with fixed values:
- '$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
-
- Checked on aarch64-linux-gnu.
-
- Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
- Reviewed-by: Carlos O'Donell <carlos@redhat.com>
- Tested-by: Carlos O'Donell <carlos@redhat.com>
-
- (cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)
-
-diff --git a/iconvdata/Makefile b/iconvdata/Makefile
-index f4c089ed5d..d01b3fcab6 100644
---- a/iconvdata/Makefile
-+++ b/iconvdata/Makefile
-@@ -75,7 +75,8 @@ ifeq (yes,$(build-shared))
- tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
- tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
- bug-iconv10 bug-iconv11 bug-iconv12 tst-iconv-big5-hkscs-to-2ucs4 \
-- bug-iconv13 bug-iconv14 bug-iconv15
-+ bug-iconv13 bug-iconv14 bug-iconv15 \
-+ tst-iconv-iso-2022-cn-ext
- ifeq ($(have-thread-library),yes)
- tests += bug-iconv3
- endif
-@@ -330,6 +331,8 @@ $(objpfx)bug-iconv14.out: $(addprefix $(objpfx), $(gconv-modules)) \
- $(addprefix $(objpfx),$(modules.so))
- $(objpfx)bug-iconv15.out: $(addprefix $(objpfx), $(gconv-modules)) \
- $(addprefix $(objpfx),$(modules.so))
-+$(objpfx)tst-iconv-iso-2022-cn-ext.out: $(addprefix $(objpfx), $(gconv-modules)) \
-+ $(addprefix $(objpfx),$(modules.so))
-
- $(objpfx)iconv-test.out: run-iconv-test.sh \
- $(addprefix $(objpfx), $(gconv-modules)) \
-diff --git a/iconvdata/iso-2022-cn-ext.c b/iconvdata/iso-2022-cn-ext.c
-index e09f358cad..2cc478a8c6 100644
---- a/iconvdata/iso-2022-cn-ext.c
-+++ b/iconvdata/iso-2022-cn-ext.c
-@@ -574,6 +574,12 @@ DIAG_IGNORE_Os_NEEDS_COMMENT (5, "-Wmaybe-uninitialized");
- { \
- const char *escseq; \
- \
-+ if (outptr + 4 > outend) \
-+ { \
-+ result = __GCONV_FULL_OUTPUT; \
-+ break; \
-+ } \
-+ \
- assert (used == CNS11643_2_set); /* XXX */ \
- escseq = "*H"; \
- *outptr++ = ESC; \
-@@ -587,6 +593,12 @@ DIAG_IGNORE_Os_NEEDS_COMMENT (5, "-Wmaybe-uninitialized");
- { \
- const char *escseq; \
- \
-+ if (outptr + 4 > outend) \
-+ { \
-+ result = __GCONV_FULL_OUTPUT; \
-+ break; \
-+ } \
-+ \
- assert ((used >> 5) >= 3 && (used >> 5) <= 7); \
- escseq = "+I+J+K+L+M" + ((used >> 5) - 3) * 2; \
- *outptr++ = ESC; \
-diff --git a/iconvdata/tst-iconv-iso-2022-cn-ext.c b/iconvdata/tst-iconv-iso-2022-cn-ext.c
-new file mode 100644
-index 0000000000..96a8765fd5
---- /dev/null
-+++ b/iconvdata/tst-iconv-iso-2022-cn-ext.c
-@@ -0,0 +1,128 @@
-+/* Verify ISO-2022-CN-EXT does not write out of the bounds.
-+ Copyright (C) 2024 Free Software Foundation, Inc.
-+ This file is part of the GNU C Library.
-+
-+ The GNU C Library is free software; you can redistribute it and/or
-+ modify it under the terms of the GNU Lesser General Public
-+ License as published by the Free Software Foundation; either
-+ version 2.1 of the License, or (at your option) any later version.
-+
-+ The GNU C Library is distributed in the hope that it will be useful,
-+ but WITHOUT ANY WARRANTY; without even the implied warranty of
-+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-+ Lesser General Public License for more details.
-+
-+ You should have received a copy of the GNU Lesser General Public
-+ License along with the GNU C Library; if not, see
-+ <https://www.gnu.org/licenses/>. */
-+
-+#include <stdio.h>
-+#include <string.h>
-+
-+#include <errno.h>
-+#include <iconv.h>
-+#include <sys/mman.h>
-+
-+#include <support/xunistd.h>
-+#include <support/check.h>
-+#include <support/support.h>
-+
-+/* The test sets up a two memory page buffer with the second page marked
-+ PROT_NONE to trigger a fault if the conversion writes beyond the exact
-+ expected amount. Then we carry out various conversions and precisely
-+ place the start of the output buffer in order to trigger a SIGSEGV if the
-+ process writes anywhere between 1 and page sized bytes more (only one
-+ PROT_NONE page is setup as a canary) than expected. These tests exercise
-+ all three of the cases in ISO-2022-CN-EXT where the converter must switch
-+ character sets and may run out of buffer space while doing the
-+ operation. */
-+
-+static int
-+do_test (void)
-+{
-+ iconv_t cd = iconv_open ("ISO-2022-CN-EXT", "UTF-8");
-+ TEST_VERIFY_EXIT (cd != (iconv_t) -1);
-+
-+ char *ntf;
-+ size_t ntfsize;
-+ char *outbufbase;
-+ {
-+ int pgz = getpagesize ();
-+ TEST_VERIFY_EXIT (pgz > 0);
-+ ntfsize = 2 * pgz;
-+
-+ ntf = xmmap (NULL, ntfsize, PROT_READ | PROT_WRITE, MAP_PRIVATE
-+ | MAP_ANONYMOUS, -1);
-+ xmprotect (ntf + pgz, pgz, PROT_NONE);
-+
-+ outbufbase = ntf + pgz;
-+ }
-+
-+ /* Check if SOdesignation escape sequence does not trigger an OOB write. */
-+ {
-+ char inbuf[] = "\xe4\xba\xa4\xe6\x8d\xa2";
-+
-+ for (int i = 0; i < 9; i++)
-+ {
-+ char *inp = inbuf;
-+ size_t inleft = sizeof (inbuf) - 1;
-+
-+ char *outp = outbufbase - i;
-+ size_t outleft = i;
-+
-+ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
-+ == (size_t) -1);
-+ TEST_COMPARE (errno, E2BIG);
-+
-+ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
-+ }
-+ }
-+
-+ /* Same as before for SS2designation. */
-+ {
-+ char inbuf[] = "㴽 \xe3\xb4\xbd";
-+
-+ for (int i = 0; i < 14; i++)
-+ {
-+ char *inp = inbuf;
-+ size_t inleft = sizeof (inbuf) - 1;
-+
-+ char *outp = outbufbase - i;
-+ size_t outleft = i;
-+
-+ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
-+ == (size_t) -1);
-+ TEST_COMPARE (errno, E2BIG);
-+
-+ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
-+ }
-+ }
-+
-+ /* Same as before for SS3designation. */
-+ {
-+ char inbuf[] = "劄 \xe5\x8a\x84";
-+
-+ for (int i = 0; i < 14; i++)
-+ {
-+ char *inp = inbuf;
-+ size_t inleft = sizeof (inbuf) - 1;
-+
-+ char *outp = outbufbase - i;
-+ size_t outleft = i;
-+
-+ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
-+ == (size_t) -1);
-+ TEST_COMPARE (errno, E2BIG);
-+
-+ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
-+ }
-+ }
-+
-+ TEST_VERIFY_EXIT (iconv_close (cd) != -1);
-+
-+ xmunmap (ntf, ntfsize);
-+
-+ return 0;
-+}
-+
-+#include <support/test-driver.c>
diff --git a/debian/patches/any/local-CVE-2024-33599-nscd.diff b/debian/patches/any/local-CVE-2024-33599-nscd.diff
deleted file mode 100644
index bae41afd..00000000
--- a/debian/patches/any/local-CVE-2024-33599-nscd.diff
+++ /dev/null
@@ -1,32 +0,0 @@
-commit caa3151ca460bdd9330adeedd68c3112d97bffe4
-Author: Florian Weimer <fweimer@redhat.com>
-Date: Thu Apr 25 15:00:45 2024 +0200
-
- CVE-2024-33599: nscd: Stack-based buffer overflow in netgroup cache (bug 31677)
-
- Using alloca matches what other caches do. The request length is
- bounded by MAXKEYLEN.
-
- Reviewed-by: Carlos O'Donell <carlos@redhat.com>
- (cherry picked from commit 87801a8fd06db1d654eea3e4f7626ff476a9bdaa)
-
-diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
-index 85977521a6..f0de064368 100644
---- a/nscd/netgroupcache.c
-+++ b/nscd/netgroupcache.c
-@@ -502,12 +502,13 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- = (struct indataset *) mempool_alloc (db,
- sizeof (*dataset) + req->key_len,
- 1);
-- struct indataset dataset_mem;
- bool cacheable = true;
- if (__glibc_unlikely (dataset == NULL))
- {
- cacheable = false;
-- dataset = &dataset_mem;
-+ /* The alloca is safe because nscd_run_worker verfies that
-+ key_len is not larger than MAXKEYLEN. */
-+ dataset = alloca (sizeof (*dataset) + req->key_len);
- }
-
- datahead_init_pos (&dataset->head, sizeof (*dataset) + req->key_len,
diff --git a/debian/patches/any/local-CVE-2024-33600-nscd.diff b/debian/patches/any/local-CVE-2024-33600-nscd.diff
deleted file mode 100644
index 87ab2d1c..00000000
--- a/debian/patches/any/local-CVE-2024-33600-nscd.diff
+++ /dev/null
@@ -1,103 +0,0 @@
-commit c34f470a615b136170abd16142da5dd0c024f7d1
-Author: Florian Weimer <fweimer@redhat.com>
-Date: Thu Apr 25 15:01:07 2024 +0200
-
- CVE-2024-33600: nscd: Do not send missing not-found response in addgetnetgrentX (bug 31678)
-
- If we failed to add a not-found response to the cache, the dataset
- point can be null, resulting in a null pointer dereference.
-
- Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
- (cherry picked from commit 7835b00dbce53c3c87bbbb1754a95fb5e58187aa)
-
-commit f205b3af56740e3b014915b1bd3b162afe3407ef
-Author: Florian Weimer <fweimer@redhat.com>
-Date: Thu Apr 25 15:01:07 2024 +0200
-
- CVE-2024-33600: nscd: Avoid null pointer crashes after notfound response (bug 31678)
-
- The addgetnetgrentX call in addinnetgrX may have failed to produce
- a result, so the result variable in addinnetgrX can be NULL.
- Use db->negtimeout as the fallback value if there is no result data;
- the timeout is also overwritten below.
-
- Also avoid sending a second not-found response. (The client
- disconnects after receiving the first response, so the data stream did
- not go out of sync even without this fix.) It is still beneficial to
- add the negative response to the mapping, so that the client can get
- it from there in the future, instead of going through the socket.
-
- Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
- (cherry picked from commit b048a482f088e53144d26a61c390bed0210f49f2)
-
-diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
-index f0de064368..787e44d851 100644
---- a/nscd/netgroupcache.c
-+++ b/nscd/netgroupcache.c
-@@ -147,7 +147,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- /* No such service. */
- cacheable = do_notfound (db, fd, req, key, &dataset, &total, &timeout,
- &key_copy);
-- goto writeout;
-+ goto maybe_cache_add;
- }
-
- memset (&data, '\0', sizeof (data));
-@@ -348,7 +348,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- {
- cacheable = do_notfound (db, fd, req, key, &dataset, &total, &timeout,
- &key_copy);
-- goto writeout;
-+ goto maybe_cache_add;
- }
-
- total = buffilled;
-@@ -410,14 +410,12 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- }
-
- if (he == NULL && fd != -1)
-- {
-- /* We write the dataset before inserting it to the database
-- since while inserting this thread might block and so would
-- unnecessarily let the receiver wait. */
-- writeout:
-+ /* We write the dataset before inserting it to the database since
-+ while inserting this thread might block and so would
-+ unnecessarily let the receiver wait. */
- writeall (fd, &dataset->resp, dataset->head.recsize);
-- }
-
-+ maybe_cache_add:
- if (cacheable)
- {
- /* If necessary, we also propagate the data to disk. */
-@@ -513,14 +511,15 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
-
- datahead_init_pos (&dataset->head, sizeof (*dataset) + req->key_len,
- sizeof (innetgroup_response_header),
-- he == NULL ? 0 : dh->nreloads + 1, result->head.ttl);
-+ he == NULL ? 0 : dh->nreloads + 1,
-+ result == NULL ? db->negtimeout : result->head.ttl);
- /* Set the notfound status and timeout based on the result from
- getnetgrent. */
-- dataset->head.notfound = result->head.notfound;
-+ dataset->head.notfound = result == NULL || result->head.notfound;
- dataset->head.timeout = timeout;
-
- dataset->resp.version = NSCD_VERSION;
-- dataset->resp.found = result->resp.found;
-+ dataset->resp.found = result != NULL && result->resp.found;
- /* Until we find a matching entry the result is 0. */
- dataset->resp.result = 0;
-
-@@ -568,7 +567,9 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- goto out;
- }
-
-- if (he == NULL)
-+ /* addgetnetgrentX may have already sent a notfound response. Do
-+ not send another one. */
-+ if (he == NULL && dataset->resp.found)
- {
- /* We write the dataset before inserting it to the database
- since while inserting this thread might block and so would
diff --git a/debian/patches/any/local-CVE-2024-33601-33602-nscd.diff b/debian/patches/any/local-CVE-2024-33601-33602-nscd.diff
deleted file mode 100644
index 2c11fd85..00000000
--- a/debian/patches/any/local-CVE-2024-33601-33602-nscd.diff
+++ /dev/null
@@ -1,384 +0,0 @@
-commit b6742463694b1dfdd5120b91ee21cf05d15ec2e2
-Author: Florian Weimer <fweimer@redhat.com>
-Date: Thu Apr 25 15:01:07 2024 +0200
-
- CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)
-
- This avoids potential memory corruption when the underlying NSS
- callback function does not use the buffer space to store all strings
- (e.g., for constant strings).
-
- Instead of custom buffer management, two scratch buffers are used.
- This increases stack usage somewhat.
-
- Scratch buffer allocation failure is handled by return -1
- (an invalid timeout value) instead of terminating the process.
- This fixes bug 31679.
-
- Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
- (cherry picked from commit c04a21e050d64a1193a6daab872bca2528bda44b)
-
-diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
-index 787e44d851..aaabbbb003 100644
---- a/nscd/netgroupcache.c
-+++ b/nscd/netgroupcache.c
-@@ -23,6 +23,7 @@
- #include <stdlib.h>
- #include <unistd.h>
- #include <sys/mman.h>
-+#include <scratch_buffer.h>
-
- #include "../inet/netgroup.h"
- #include "nscd.h"
-@@ -65,6 +66,16 @@ struct dataset
- char strdata[0];
- };
-
-+/* Send a notfound response to FD. Always returns -1 to indicate an
-+ ephemeral error. */
-+static time_t
-+send_notfound (int fd)
-+{
-+ if (fd != -1)
-+ TEMP_FAILURE_RETRY (send (fd, ¬found, sizeof (notfound), MSG_NOSIGNAL));
-+ return -1;
-+}
-+
- /* Sends a notfound message and prepares a notfound dataset to write to the
- cache. Returns true if there was enough memory to allocate the dataset and
- returns the dataset in DATASETP, total bytes to write in TOTALP and the
-@@ -83,8 +94,7 @@ do_notfound (struct database_dyn *db, int fd, request_header *req,
- total = sizeof (notfound);
- timeout = time (NULL) + db->negtimeout;
-
-- if (fd != -1)
-- TEMP_FAILURE_RETRY (send (fd, ¬found, total, MSG_NOSIGNAL));
-+ send_notfound (fd);
-
- dataset = mempool_alloc (db, sizeof (struct dataset) + req->key_len, 1);
- /* If we cannot permanently store the result, so be it. */
-@@ -109,11 +119,78 @@ do_notfound (struct database_dyn *db, int fd, request_header *req,
- return cacheable;
- }
-
-+struct addgetnetgrentX_scratch
-+{
-+ /* This is the result that the caller should use. It can be NULL,
-+ point into buffer, or it can be in the cache. */
-+ struct dataset *dataset;
-+
-+ struct scratch_buffer buffer;
-+
-+ /* Used internally in addgetnetgrentX as a staging area. */
-+ struct scratch_buffer tmp;
-+
-+ /* Number of bytes in buffer that are actually used. */
-+ size_t buffer_used;
-+};
-+
-+static void
-+addgetnetgrentX_scratch_init (struct addgetnetgrentX_scratch *scratch)
-+{
-+ scratch->dataset = NULL;
-+ scratch_buffer_init (&scratch->buffer);
-+ scratch_buffer_init (&scratch->tmp);
-+
-+ /* Reserve space for the header. */
-+ scratch->buffer_used = sizeof (struct dataset);
-+ static_assert (sizeof (struct dataset) < sizeof (scratch->tmp.__space),
-+ "initial buffer space");
-+ memset (scratch->tmp.data, 0, sizeof (struct dataset));
-+}
-+
-+static void
-+addgetnetgrentX_scratch_free (struct addgetnetgrentX_scratch *scratch)
-+{
-+ scratch_buffer_free (&scratch->buffer);
-+ scratch_buffer_free (&scratch->tmp);
-+}
-+
-+/* Copy LENGTH bytes from S into SCRATCH. Returns NULL if SCRATCH
-+ could not be resized, otherwise a pointer to the copy. */
-+static char *
-+addgetnetgrentX_append_n (struct addgetnetgrentX_scratch *scratch,
-+ const char *s, size_t length)
-+{
-+ while (true)
-+ {
-+ size_t remaining = scratch->buffer.length - scratch->buffer_used;
-+ if (remaining >= length)
-+ break;
-+ if (!scratch_buffer_grow_preserve (&scratch->buffer))
-+ return NULL;
-+ }
-+ char *copy = scratch->buffer.data + scratch->buffer_used;
-+ memcpy (copy, s, length);
-+ scratch->buffer_used += length;
-+ return copy;
-+}
-+
-+/* Copy S into SCRATCH, including its null terminator. Returns false
-+ if SCRATCH could not be resized. */
-+static bool
-+addgetnetgrentX_append (struct addgetnetgrentX_scratch *scratch, const char *s)
-+{
-+ if (s == NULL)
-+ s = "";
-+ return addgetnetgrentX_append_n (scratch, s, strlen (s) + 1) != NULL;
-+}
-+
-+/* Caller must initialize and free *SCRATCH. If the return value is
-+ negative, this function has sent a notfound response. */
- static time_t
- addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- const char *key, uid_t uid, struct hashentry *he,
-- struct datahead *dh, struct dataset **resultp,
-- void **tofreep)
-+ struct datahead *dh, struct addgetnetgrentX_scratch *scratch)
- {
- if (__glibc_unlikely (debug_level > 0))
- {
-@@ -132,14 +209,10 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
-
- char *key_copy = NULL;
- struct __netgrent data;
-- size_t buflen = MAX (1024, sizeof (*dataset) + req->key_len);
-- size_t buffilled = sizeof (*dataset);
-- char *buffer = NULL;
- size_t nentries = 0;
- size_t group_len = strlen (key) + 1;
- struct name_list *first_needed
- = alloca (sizeof (struct name_list) + group_len);
-- *tofreep = NULL;
-
- if (netgroup_database == NULL
- && !__nss_database_get (nss_database_netgroup, &netgroup_database))
-@@ -151,8 +224,6 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- }
-
- memset (&data, '\0', sizeof (data));
-- buffer = xmalloc (buflen);
-- *tofreep = buffer;
- first_needed->next = first_needed;
- memcpy (first_needed->name, key, group_len);
- data.needed_groups = first_needed;
-@@ -195,8 +266,8 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- while (1)
- {
- int e;
-- status = getfct.f (&data, buffer + buffilled,
-- buflen - buffilled - req->key_len, &e);
-+ status = getfct.f (&data, scratch->tmp.data,
-+ scratch->tmp.length, &e);
- if (status == NSS_STATUS_SUCCESS)
- {
- if (data.type == triple_val)
-@@ -204,68 +275,10 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- const char *nhost = data.val.triple.host;
- const char *nuser = data.val.triple.user;
- const char *ndomain = data.val.triple.domain;
--
-- size_t hostlen = strlen (nhost ?: "") + 1;
-- size_t userlen = strlen (nuser ?: "") + 1;
-- size_t domainlen = strlen (ndomain ?: "") + 1;
--
-- if (nhost == NULL || nuser == NULL || ndomain == NULL
-- || nhost > nuser || nuser > ndomain)
-- {
-- const char *last = nhost;
-- if (last == NULL
-- || (nuser != NULL && nuser > last))
-- last = nuser;
-- if (last == NULL
-- || (ndomain != NULL && ndomain > last))
-- last = ndomain;
--
-- size_t bufused
-- = (last == NULL
-- ? buffilled
-- : last + strlen (last) + 1 - buffer);
--
-- /* We have to make temporary copies. */
-- size_t needed = hostlen + userlen + domainlen;
--
-- if (buflen - req->key_len - bufused < needed)
-- {
-- buflen += MAX (buflen, 2 * needed);
-- /* Save offset in the old buffer. We don't
-- bother with the NULL check here since
-- we'll do that later anyway. */
-- size_t nhostdiff = nhost - buffer;
-- size_t nuserdiff = nuser - buffer;
-- size_t ndomaindiff = ndomain - buffer;
--
-- char *newbuf = xrealloc (buffer, buflen);
-- /* Fix up the triplet pointers into the new
-- buffer. */
-- nhost = (nhost ? newbuf + nhostdiff
-- : NULL);
-- nuser = (nuser ? newbuf + nuserdiff
-- : NULL);
-- ndomain = (ndomain ? newbuf + ndomaindiff
-- : NULL);
-- *tofreep = buffer = newbuf;
-- }
--
-- nhost = memcpy (buffer + bufused,
-- nhost ?: "", hostlen);
-- nuser = memcpy ((char *) nhost + hostlen,
-- nuser ?: "", userlen);
-- ndomain = memcpy ((char *) nuser + userlen,
-- ndomain ?: "", domainlen);
-- }
--
-- char *wp = buffer + buffilled;
-- wp = memmove (wp, nhost ?: "", hostlen);
-- wp += hostlen;
-- wp = memmove (wp, nuser ?: "", userlen);
-- wp += userlen;
-- wp = memmove (wp, ndomain ?: "", domainlen);
-- wp += domainlen;
-- buffilled = wp - buffer;
-+ if (!(addgetnetgrentX_append (scratch, nhost)
-+ && addgetnetgrentX_append (scratch, nuser)
-+ && addgetnetgrentX_append (scratch, ndomain)))
-+ return send_notfound (fd);
- ++nentries;
- }
- else
-@@ -317,8 +330,8 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- }
- else if (status == NSS_STATUS_TRYAGAIN && e == ERANGE)
- {
-- buflen *= 2;
-- *tofreep = buffer = xrealloc (buffer, buflen);
-+ if (!scratch_buffer_grow (&scratch->tmp))
-+ return send_notfound (fd);
- }
- else if (status == NSS_STATUS_RETURN
- || status == NSS_STATUS_NOTFOUND
-@@ -351,10 +364,17 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- goto maybe_cache_add;
- }
-
-- total = buffilled;
-+ /* Capture the result size without the key appended. */
-+ total = scratch->buffer_used;
-+
-+ /* Make a copy of the key. The scratch buffer must not move after
-+ this point. */
-+ key_copy = addgetnetgrentX_append_n (scratch, key, req->key_len);
-+ if (key_copy == NULL)
-+ return send_notfound (fd);
-
- /* Fill in the dataset. */
-- dataset = (struct dataset *) buffer;
-+ dataset = scratch->buffer.data;
- timeout = datahead_init_pos (&dataset->head, total + req->key_len,
- total - offsetof (struct dataset, resp),
- he == NULL ? 0 : dh->nreloads + 1,
-@@ -363,11 +383,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- dataset->resp.version = NSCD_VERSION;
- dataset->resp.found = 1;
- dataset->resp.nresults = nentries;
-- dataset->resp.result_len = buffilled - sizeof (*dataset);
--
-- assert (buflen - buffilled >= req->key_len);
-- key_copy = memcpy (buffer + buffilled, key, req->key_len);
-- buffilled += req->key_len;
-+ dataset->resp.result_len = total - sizeof (*dataset);
-
- /* Now we can determine whether on refill we have to create a new
- record or not. */
-@@ -398,7 +414,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- if (__glibc_likely (newp != NULL))
- {
- /* Adjust pointer into the memory block. */
-- key_copy = (char *) newp + (key_copy - buffer);
-+ key_copy = (char *) newp + (key_copy - (char *) dataset);
-
- dataset = memcpy (newp, dataset, total + req->key_len);
- cacheable = true;
-@@ -439,7 +455,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
- }
-
- out:
-- *resultp = dataset;
-+ scratch->dataset = dataset;
-
- return timeout;
- }
-@@ -460,6 +476,9 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- if (user != NULL)
- key = (char *) rawmemchr (key, '\0') + 1;
- const char *domain = *key++ ? key : NULL;
-+ struct addgetnetgrentX_scratch scratch;
-+
-+ addgetnetgrentX_scratch_init (&scratch);
-
- if (__glibc_unlikely (debug_level > 0))
- {
-@@ -475,12 +494,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- group, group_len,
- db, uid);
- time_t timeout;
-- void *tofree;
- if (result != NULL)
-- {
-- timeout = result->head.timeout;
-- tofree = NULL;
-- }
-+ timeout = result->head.timeout;
- else
- {
- request_header req_get =
-@@ -489,7 +504,10 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- .key_len = group_len
- };
- timeout = addgetnetgrentX (db, -1, &req_get, group, uid, NULL, NULL,
-- &result, &tofree);
-+ &scratch);
-+ result = scratch.dataset;
-+ if (timeout < 0)
-+ goto out;
- }
-
- struct indataset
-@@ -603,7 +621,7 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
- }
-
- out:
-- free (tofree);
-+ addgetnetgrentX_scratch_free (&scratch);
- return timeout;
- }
-
-@@ -613,11 +631,12 @@ addgetnetgrentX_ignore (struct database_dyn *db, int fd, request_header *req,
- const char *key, uid_t uid, struct hashentry *he,
- struct datahead *dh)
- {
-- struct dataset *ignore;
-- void *tofree;
-- time_t timeout = addgetnetgrentX (db, fd, req, key, uid, he, dh,
-- &ignore, &tofree);
-- free (tofree);
-+ struct addgetnetgrentX_scratch scratch;
-+ addgetnetgrentX_scratch_init (&scratch);
-+ time_t timeout = addgetnetgrentX (db, fd, req, key, uid, he, dh, &scratch);
-+ addgetnetgrentX_scratch_free (&scratch);
-+ if (timeout < 0)
-+ timeout = 0;
- return timeout;
- }
-
-@@ -661,5 +680,9 @@ readdinnetgr (struct database_dyn *db, struct hashentry *he,
- .key_len = he->len
- };
-
-- return addinnetgrX (db, -1, &req, db->data + he->key, he->owner, he, dh);
-+ int timeout = addinnetgrX (db, -1, &req, db->data + he->key, he->owner,
-+ he, dh);
-+ if (timeout < 0)
-+ timeout = 0;
-+ return timeout;
- }
diff --git a/debian/patches/git-updates.diff b/debian/patches/git-updates.diff
index f06f7672..fb8e5c02 100644
--- a/debian/patches/git-updates.diff
+++ b/debian/patches/git-updates.diff
@@ -1,7 +1,7 @@
GIT update of https://sourceware.org/git/glibc.git/release/2.36/master from glibc-2.36
diff --git a/Makeconfig b/Makeconfig
-index ba70321af1..9dd058e04b 100644
+index ba70321af1..151f542c27 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -43,6 +43,22 @@ else
@@ -27,7 +27,24 @@ index ba70321af1..9dd058e04b 100644
# Root of the sysdeps tree.
sysdep_dir := $(..)sysdeps
export sysdep_dir := $(sysdep_dir)
-@@ -868,7 +884,7 @@ endif
+@@ -569,10 +585,13 @@ link-libc-rpath-link = -Wl,-rpath-link=$(rpath-link)
+ # before the expansion of LDLIBS-* variables).
+
+ # Tests use -Wl,-rpath instead of -Wl,-rpath-link for
+-# build-hardcoded-path-in-tests.
++# build-hardcoded-path-in-tests. Add -Wl,--disable-new-dtags to force
++# DT_RPATH instead of DT_RUNPATH which only applies to DT_NEEDED entries
++# in the executable and doesn't applies to DT_NEEDED entries in shared
++# libraries which are loaded via DT_NEEDED entries in the executable.
+ ifeq (yes,$(build-hardcoded-path-in-tests))
+-link-libc-tests-rpath-link = $(link-libc-rpath)
+-link-test-modules-rpath-link = $(link-libc-rpath)
++link-libc-tests-rpath-link = $(link-libc-rpath) -Wl,--disable-new-dtags
++link-test-modules-rpath-link = $(link-libc-rpath) -Wl,--disable-new-dtags
+ else
+ link-libc-tests-rpath-link = $(link-libc-rpath-link)
+ link-test-modules-rpath-link =
+@@ -868,7 +887,7 @@ endif
# Use 64 bit time_t support for installed programs
installed-modules = nonlib nscd lddlibc4 ldconfig locale_programs \
iconvprogs libnss_files libnss_compat libnss_db libnss_hesiod \
@@ -36,7 +53,7 @@ index ba70321af1..9dd058e04b 100644
+extra-time-flags = $(if $(filter $(installed-modules),\
$(in-module)),-D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64)
-@@ -917,7 +933,7 @@ endif
+@@ -917,7 +936,7 @@ endif
# umpteen zillion filenames along with it (we use `...' instead)
# but we don't want this echoing done when the user has said
# he doesn't want to see commands echoed by using -s.
@@ -68,10 +85,10 @@ index d1e139d03c..09c0cf8357 100644
else # -s
verbose :=
diff --git a/NEWS b/NEWS
-index f61e521fc8..0f0ebce3f0 100644
+index f61e521fc8..f6ae9e2337 100644
--- a/NEWS
+++ b/NEWS
-@@ -5,6 +5,94 @@ See the end for copying conditions.
+@@ -5,6 +5,100 @@ See the end for copying conditions.
Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
using `glibc' in the "product" field.
@@ -84,6 +101,11 @@ index f61e521fc8..0f0ebce3f0 100644
+ configured on the current host i.e. as-if you had not passed
+ AI_ADDRCONFIG to getaddrinfo calls.
+
++Deprecated and removed features, and other changes affecting compatibility:
++
++* __rseq_size now denotes the size of the active rseq area (20 bytes
++ initially), not the size of struct rseq (32 bytes initially).
++
+Security related changes:
+
+ CVE-2022-39046: When the syslog function is passed a crafted input
@@ -162,6 +184,7 @@ index f61e521fc8..0f0ebce3f0 100644
+ [30843] potential use-after-free in getcanonname (CVE-2023-4806)
+ [31184] FAIL: elf/tst-tlsgap
+ [31185] Incorrect thread point access in _dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic
++ [31965] rseq extension mechanism does not work as intended
+
Version 2.36
@@ -229,6 +252,22 @@ index 2b99dea33b..aac8c49b00 100644
return __cmsg;
}
#endif /* Use `extern inline'. */
+diff --git a/bits/wordsize.h b/bits/wordsize.h
+index 14edae3a11..53013a9275 100644
+--- a/bits/wordsize.h
++++ b/bits/wordsize.h
+@@ -21,7 +21,9 @@
+ #define __WORDSIZE32_PTRDIFF_LONG
+
+ /* Set to 1 in order to force time types to be 32 bits instead of 64 bits in
+- struct lastlog and struct utmp{,x} on 64-bit ports. This may be done in
++ struct lastlog and struct utmp{,x}. This may be done in
+ order to make 64-bit ports compatible with 32-bit ports. Set to 0 for
+- 64-bit ports where the time types are 64-bits or for any 32-bit ports. */
++ 64-bit ports where the time types are 64-bits and new 32-bit ports
++ where time_t is 64 bits, and there is no companion architecture with
++ 32-bit time_t. */
+ #define __WORDSIZE_TIME64_COMPAT32
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 543560f36c..bfeee6d851 100644
--- a/csu/libc-start.c
@@ -312,7 +351,7 @@ index 2696dde4b1..9b07b4e132 100644
void *
diff --git a/elf/Makefile b/elf/Makefile
-index fd77d0c7c8..30c9af1de9 100644
+index fd77d0c7c8..cea9c1b29d 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -53,6 +53,7 @@ routines = \
@@ -323,7 +362,15 @@ index fd77d0c7c8..30c9af1de9 100644
dl-close \
dl-debug \
dl-debug-symbols \
-@@ -374,6 +375,8 @@ tests += \
+@@ -176,6 +177,7 @@ CFLAGS-.op += $(call elide-stack-protector,.op,$(elide-routines.os))
+ CFLAGS-.os += $(call elide-stack-protector,.os,$(all-rtld-routines))
+
+ # Add the requested compiler flags to the early startup code.
++CFLAGS-dl-misc.os += $(rtld-early-cflags)
+ CFLAGS-dl-printf.os += $(rtld-early-cflags)
+ CFLAGS-dl-setup_hash.os += $(rtld-early-cflags)
+ CFLAGS-dl-sysdep.os += $(rtld-early-cflags)
+@@ -374,6 +376,8 @@ tests += \
tst-align \
tst-align2 \
tst-align3 \
@@ -332,7 +379,7 @@ index fd77d0c7c8..30c9af1de9 100644
tst-audit1 \
tst-audit2 \
tst-audit8 \
-@@ -408,6 +411,7 @@ tests += \
+@@ -408,6 +412,7 @@ tests += \
tst-dlmopen4 \
tst-dlmopen-dlerror \
tst-dlmopen-gethostbyname \
@@ -340,7 +387,7 @@ index fd77d0c7c8..30c9af1de9 100644
tst-dlopenfail \
tst-dlopenfail-2 \
tst-dlopenrpath \
-@@ -631,6 +635,7 @@ ifeq ($(run-built-tests),yes)
+@@ -631,6 +636,7 @@ ifeq ($(run-built-tests),yes)
tests-special += \
$(objpfx)noload-mem.out \
$(objpfx)tst-ldconfig-X.out \
@@ -348,7 +395,7 @@ index fd77d0c7c8..30c9af1de9 100644
$(objpfx)tst-leaks1-mem.out \
$(objpfx)tst-rtld-help.out \
# tests-special
-@@ -765,6 +770,8 @@ modules-names += \
+@@ -765,6 +771,8 @@ modules-names += \
tst-alignmod3 \
tst-array2dep \
tst-array5dep \
@@ -357,7 +404,7 @@ index fd77d0c7c8..30c9af1de9 100644
tst-audit11mod1 \
tst-audit11mod2 \
tst-audit12mod1 \
-@@ -798,6 +805,7 @@ modules-names += \
+@@ -798,6 +806,7 @@ modules-names += \
tst-auditmanymod7 \
tst-auditmanymod8 \
tst-auditmanymod9 \
@@ -365,7 +412,7 @@ index fd77d0c7c8..30c9af1de9 100644
tst-auditmod1 \
tst-auditmod9a \
tst-auditmod9b \
-@@ -834,6 +842,8 @@ modules-names += \
+@@ -834,6 +843,8 @@ modules-names += \
tst-dlmopen1mod \
tst-dlmopen-dlerror-mod \
tst-dlmopen-gethostbyname-mod \
@@ -374,7 +421,7 @@ index fd77d0c7c8..30c9af1de9 100644
tst-dlopenfaillinkmod \
tst-dlopenfailmod1 \
tst-dlopenfailmod2 \
-@@ -990,23 +1000,8 @@ modules-names += tst-gnu2-tls1mod
+@@ -990,23 +1001,8 @@ modules-names += tst-gnu2-tls1mod
$(objpfx)tst-gnu2-tls1: $(objpfx)tst-gnu2-tls1mod.so
tst-gnu2-tls1mod.so-no-z-defs = yes
CFLAGS-tst-gnu2-tls1mod.c += -mtls-dialect=gnu2
@@ -399,7 +446,7 @@ index fd77d0c7c8..30c9af1de9 100644
ifeq (yes,$(have-protected-data))
modules-names += tst-protected1moda tst-protected1modb
tests += tst-protected1a tst-protected1b
-@@ -2410,6 +2405,11 @@ $(objpfx)tst-ldconfig-X.out : tst-ldconfig-X.sh $(objpfx)ldconfig
+@@ -2410,6 +2406,11 @@ $(objpfx)tst-ldconfig-X.out : tst-ldconfig-X.sh $(objpfx)ldconfig
'$(run-program-env)' > $@; \
$(evaluate-test)
@@ -411,7 +458,7 @@ index fd77d0c7c8..30c9af1de9 100644
# Test static linking of all the libraries we can possibly link
# together. Note that in some configurations this may be less than the
# complete list of libraries we build but we try to maxmimize this list.
-@@ -2967,3 +2967,25 @@ $(objpfx)tst-tls-allocation-failure-static-patched.out: \
+@@ -2967,3 +2968,25 @@ $(objpfx)tst-tls-allocation-failure-static-patched.out: \
grep -q '^Fatal glibc error: Cannot allocate TLS block$$' $@ \
&& grep -q '^status: 127$$' $@; \
$(evaluate-test)
@@ -987,10 +1034,20 @@ index 5f7f18ef27..4bf9052db1 100644
+output(glibc.rtld.dynamic_sort=1): {+a[a2>a1>a>];+b[b1>b>];-b[<b<b1];+c[c>];%c(a1());}<a<a2<c<a1
+output(glibc.rtld.dynamic_sort=2): {+a[a2>a1>a>];+b[b1>b>];-b[<b<b1];+c[c>];%c(a1());}<a2<a<c<a1
diff --git a/elf/elf.h b/elf/elf.h
-index 02a1b3f52f..014393f3cc 100644
+index 02a1b3f52f..f34d4ef7f4 100644
--- a/elf/elf.h
+++ b/elf/elf.h
-@@ -4085,8 +4085,11 @@ enum
+@@ -1215,6 +1215,9 @@ typedef struct
+ #define AT_HWCAP2 26 /* More machine-dependent hints about
+ processor capabilities. */
+
++#define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size. */
++#define AT_RSEQ_ALIGN 28 /* rseq allocation alignment. */
++
+ #define AT_EXECFN 31 /* Filename of executable. */
+
+ /* Pointer to the global system page used for system calls and other
+@@ -4085,8 +4088,11 @@ enum
#define R_NDS32_TLS_DESC 119
/* LoongArch ELF Flags */
@@ -1004,6 +1061,89 @@ index 02a1b3f52f..014393f3cc 100644
/* LoongArch specific dynamic relocations */
#define R_LARCH_NONE 0
+diff --git a/elf/ifuncmain1.c b/elf/ifuncmain1.c
+index 747fc02648..6effce3d77 100644
+--- a/elf/ifuncmain1.c
++++ b/elf/ifuncmain1.c
+@@ -19,7 +19,14 @@ typedef int (*foo_p) (void);
+ #endif
+
+ foo_p foo_ptr = foo;
++
++/* Address-significant access to protected symbols is not supported in
++ position-dependent mode on several architectures because GCC
++ generates relocations that assume that the address is local to the
++ main program. */
++#ifdef __PIE__
+ foo_p foo_procted_ptr = foo_protected;
++#endif
+
+ extern foo_p get_foo_p (void);
+ extern foo_p get_foo_hidden_p (void);
+@@ -37,12 +44,16 @@ main (void)
+ if ((*foo_ptr) () != -1)
+ abort ();
+
++#ifdef __PIE__
+ if (foo_procted_ptr != foo_protected)
+ abort ();
++#endif
+ if (foo_protected () != 0)
+ abort ();
++#ifdef __PIE__
+ if ((*foo_procted_ptr) () != 0)
+ abort ();
++#endif
+
+ p = get_foo_p ();
+ if (p != foo)
+@@ -55,8 +66,10 @@ main (void)
+ abort ();
+
+ p = get_foo_protected_p ();
++#ifdef __PIE__
+ if (p != foo_protected)
+ abort ();
++#endif
+ if (ret_foo_protected != 0 || (*p) () != ret_foo_protected)
+ abort ();
+
+diff --git a/elf/ifuncmain5.c b/elf/ifuncmain5.c
+index f398085cb4..6fda768fb6 100644
+--- a/elf/ifuncmain5.c
++++ b/elf/ifuncmain5.c
+@@ -14,12 +14,19 @@ get_foo (void)
+ return foo;
+ }
+
++
++/* Address-significant access to protected symbols is not supported in
++ position-dependent mode on several architectures because GCC
++ generates relocations that assume that the address is local to the
++ main program. */
++#ifdef __PIE__
+ foo_p
+ __attribute__ ((noinline))
+ get_foo_protected (void)
+ {
+ return foo_protected;
+ }
++#endif
+
+ int
+ main (void)
+@@ -30,9 +37,11 @@ main (void)
+ if ((*p) () != -1)
+ abort ();
+
++#ifdef __PIE__
+ p = get_foo_protected ();
+ if ((*p) () != 0)
+ abort ();
++#endif
+
+ return 0;
+ }
diff --git a/elf/rtld-Rules b/elf/rtld-Rules
index ca00dd1fe2..3c5e273f2b 100644
--- a/elf/rtld-Rules
@@ -1901,6 +2041,193 @@ index debb96b322..b72933b526 100644
found |= read_conf_file (conf, dir, dir_len);
free (conf);
+diff --git a/iconvdata/Makefile b/iconvdata/Makefile
+index f4c089ed5d..d01b3fcab6 100644
+--- a/iconvdata/Makefile
++++ b/iconvdata/Makefile
+@@ -75,7 +75,8 @@ ifeq (yes,$(build-shared))
+ tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
+ tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
+ bug-iconv10 bug-iconv11 bug-iconv12 tst-iconv-big5-hkscs-to-2ucs4 \
+- bug-iconv13 bug-iconv14 bug-iconv15
++ bug-iconv13 bug-iconv14 bug-iconv15 \
++ tst-iconv-iso-2022-cn-ext
+ ifeq ($(have-thread-library),yes)
+ tests += bug-iconv3
+ endif
+@@ -330,6 +331,8 @@ $(objpfx)bug-iconv14.out: $(addprefix $(objpfx), $(gconv-modules)) \
+ $(addprefix $(objpfx),$(modules.so))
+ $(objpfx)bug-iconv15.out: $(addprefix $(objpfx), $(gconv-modules)) \
+ $(addprefix $(objpfx),$(modules.so))
++$(objpfx)tst-iconv-iso-2022-cn-ext.out: $(addprefix $(objpfx), $(gconv-modules)) \
++ $(addprefix $(objpfx),$(modules.so))
+
+ $(objpfx)iconv-test.out: run-iconv-test.sh \
+ $(addprefix $(objpfx), $(gconv-modules)) \
+diff --git a/iconvdata/iso-2022-cn-ext.c b/iconvdata/iso-2022-cn-ext.c
+index e09f358cad..2cc478a8c6 100644
+--- a/iconvdata/iso-2022-cn-ext.c
++++ b/iconvdata/iso-2022-cn-ext.c
+@@ -574,6 +574,12 @@ DIAG_IGNORE_Os_NEEDS_COMMENT (5, "-Wmaybe-uninitialized");
+ { \
+ const char *escseq; \
+ \
++ if (outptr + 4 > outend) \
++ { \
++ result = __GCONV_FULL_OUTPUT; \
++ break; \
++ } \
++ \
+ assert (used == CNS11643_2_set); /* XXX */ \
+ escseq = "*H"; \
+ *outptr++ = ESC; \
+@@ -587,6 +593,12 @@ DIAG_IGNORE_Os_NEEDS_COMMENT (5, "-Wmaybe-uninitialized");
+ { \
+ const char *escseq; \
+ \
++ if (outptr + 4 > outend) \
++ { \
++ result = __GCONV_FULL_OUTPUT; \
++ break; \
++ } \
++ \
+ assert ((used >> 5) >= 3 && (used >> 5) <= 7); \
+ escseq = "+I+J+K+L+M" + ((used >> 5) - 3) * 2; \
+ *outptr++ = ESC; \
+diff --git a/iconvdata/tst-iconv-iso-2022-cn-ext.c b/iconvdata/tst-iconv-iso-2022-cn-ext.c
+new file mode 100644
+index 0000000000..96a8765fd5
+--- /dev/null
++++ b/iconvdata/tst-iconv-iso-2022-cn-ext.c
+@@ -0,0 +1,128 @@
++/* Verify ISO-2022-CN-EXT does not write out of the bounds.
++ Copyright (C) 2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <stdio.h>
++#include <string.h>
++
++#include <errno.h>
++#include <iconv.h>
++#include <sys/mman.h>
++
++#include <support/xunistd.h>
++#include <support/check.h>
++#include <support/support.h>
++
++/* The test sets up a two memory page buffer with the second page marked
++ PROT_NONE to trigger a fault if the conversion writes beyond the exact
++ expected amount. Then we carry out various conversions and precisely
++ place the start of the output buffer in order to trigger a SIGSEGV if the
++ process writes anywhere between 1 and page sized bytes more (only one
++ PROT_NONE page is setup as a canary) than expected. These tests exercise
++ all three of the cases in ISO-2022-CN-EXT where the converter must switch
++ character sets and may run out of buffer space while doing the
++ operation. */
++
++static int
++do_test (void)
++{
++ iconv_t cd = iconv_open ("ISO-2022-CN-EXT", "UTF-8");
++ TEST_VERIFY_EXIT (cd != (iconv_t) -1);
++
++ char *ntf;
++ size_t ntfsize;
++ char *outbufbase;
++ {
++ int pgz = getpagesize ();
++ TEST_VERIFY_EXIT (pgz > 0);
++ ntfsize = 2 * pgz;
++
++ ntf = xmmap (NULL, ntfsize, PROT_READ | PROT_WRITE, MAP_PRIVATE
++ | MAP_ANONYMOUS, -1);
++ xmprotect (ntf + pgz, pgz, PROT_NONE);
++
++ outbufbase = ntf + pgz;
++ }
++
++ /* Check if SOdesignation escape sequence does not trigger an OOB write. */
++ {
++ char inbuf[] = "\xe4\xba\xa4\xe6\x8d\xa2";
++
++ for (int i = 0; i < 9; i++)
++ {
++ char *inp = inbuf;
++ size_t inleft = sizeof (inbuf) - 1;
++
++ char *outp = outbufbase - i;
++ size_t outleft = i;
++
++ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
++ == (size_t) -1);
++ TEST_COMPARE (errno, E2BIG);
++
++ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
++ }
++ }
++
++ /* Same as before for SS2designation. */
++ {
++ char inbuf[] = "㴽 \xe3\xb4\xbd";
++
++ for (int i = 0; i < 14; i++)
++ {
++ char *inp = inbuf;
++ size_t inleft = sizeof (inbuf) - 1;
++
++ char *outp = outbufbase - i;
++ size_t outleft = i;
++
++ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
++ == (size_t) -1);
++ TEST_COMPARE (errno, E2BIG);
++
++ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
++ }
++ }
++
++ /* Same as before for SS3designation. */
++ {
++ char inbuf[] = "劄 \xe5\x8a\x84";
++
++ for (int i = 0; i < 14; i++)
++ {
++ char *inp = inbuf;
++ size_t inleft = sizeof (inbuf) - 1;
++
++ char *outp = outbufbase - i;
++ size_t outleft = i;
++
++ TEST_VERIFY_EXIT (iconv (cd, &inp, &inleft, &outp, &outleft)
++ == (size_t) -1);
++ TEST_COMPARE (errno, E2BIG);
++
++ TEST_VERIFY_EXIT (iconv (cd, NULL, NULL, NULL, NULL) == 0);
++ }
++ }
++
++ TEST_VERIFY_EXIT (iconv_close (cd) != -1);
++
++ xmunmap (ntf, ntfsize);
++
++ return 0;
++}
++
++#include <support/test-driver.c>
diff --git a/include/arpa/nameser.h b/include/arpa/nameser.h
index 53f1dbc7c3..c27e7886b7 100644
--- a/include/arpa/nameser.h
@@ -2059,6 +2386,21 @@ index 3590b6f496..4dbbac3800 100644
+
# endif /* _RESOLV_H_ && !_ISOMAC */
#endif
+diff --git a/include/sys/sysinfo.h b/include/sys/sysinfo.h
+index c490561581..65742b1036 100644
+--- a/include/sys/sysinfo.h
++++ b/include/sys/sysinfo.h
+@@ -14,10 +14,6 @@ libc_hidden_proto (__get_nprocs_conf)
+ extern int __get_nprocs (void);
+ libc_hidden_proto (__get_nprocs)
+
+-/* Return the number of available processors which the process can
+- be scheduled. */
+-extern int __get_nprocs_sched (void) attribute_hidden;
+-
+ /* Return number of physical pages of memory in the system. */
+ extern long int __get_phys_pages (void);
+ libc_hidden_proto (__get_phys_pages)
diff --git a/io/Makefile b/io/Makefile
index b1710407d0..b896484320 100644
--- a/io/Makefile
@@ -2359,6 +2701,81 @@ index 8be2d220f8..4a4d5aa6b2 100644
const unsigned char *cp;
const unsigned char *usrc;
+diff --git a/login/Makefile b/login/Makefile
+index 62440499bc..0b6b962c06 100644
+--- a/login/Makefile
++++ b/login/Makefile
+@@ -44,7 +44,9 @@ subdir-dirs = programs
+ vpath %.c programs
+
+ tests := tst-utmp tst-utmpx tst-grantpt tst-ptsname tst-getlogin tst-updwtmpx \
+- tst-pututxline-lockfail tst-pututxline-cache
++ tst-pututxline-lockfail tst-pututxline-cache tst-utmp-size tst-utmp-size-64
++
++CFLAGS-tst-utmp-size-64.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
+
+ # Empty compatibility library for old binaries.
+ extra-libs := libutil
+diff --git a/login/tst-utmp-size-64.c b/login/tst-utmp-size-64.c
+new file mode 100644
+index 0000000000..7a581a4c12
+--- /dev/null
++++ b/login/tst-utmp-size-64.c
+@@ -0,0 +1,2 @@
++/* The on-disk layout must not change in time64 mode. */
++#include "tst-utmp-size.c"
+diff --git a/login/tst-utmp-size.c b/login/tst-utmp-size.c
+new file mode 100644
+index 0000000000..1b7f7ff042
+--- /dev/null
++++ b/login/tst-utmp-size.c
+@@ -0,0 +1,33 @@
++/* Check expected sizes of struct utmp, struct utmpx, struct lastlog.
++ Copyright (C) 2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <utmp.h>
++#include <utmpx.h>
++#include <utmp-size.h>
++
++static int
++do_test (void)
++{
++ _Static_assert (sizeof (struct utmp) == UTMP_SIZE, "struct utmp size");
++ _Static_assert (sizeof (struct utmpx) == UTMP_SIZE, "struct utmpx size");
++ _Static_assert (sizeof (struct lastlog) == LASTLOG_SIZE,
++ "struct lastlog size");
++ return 0;
++}
++
++#include <support/test-driver.c>
+diff --git a/malloc/arena.c b/malloc/arena.c
+index 0a684a720d..a1ee7928d3 100644
+--- a/malloc/arena.c
++++ b/malloc/arena.c
+@@ -937,7 +937,7 @@ arena_get2 (size_t size, mstate avoid_arena)
+ narenas_limit = mp_.arena_max;
+ else if (narenas > mp_.arena_test)
+ {
+- int n = __get_nprocs_sched ();
++ int n = __get_nprocs ();
+
+ if (n >= 1)
+ narenas_limit = NARENAS_FROM_NCORES (n);
diff --git a/misc/Makefile b/misc/Makefile
index ba8232a0e9..66e9ded8f9 100644
--- a/misc/Makefile
@@ -2421,6 +2838,23 @@ index fd30dd3114..916d2b6f12 100644
__fortify_function void
vsyslog (int __pri, const char *__fmt, __gnuc_va_list __ap)
{
+diff --git a/misc/getsysstats.c b/misc/getsysstats.c
+index e56aff0f37..660f64eb80 100644
+--- a/misc/getsysstats.c
++++ b/misc/getsysstats.c
+@@ -44,12 +44,6 @@ weak_alias (__get_nprocs, get_nprocs)
+ link_warning (get_nprocs, "warning: get_nprocs will always return 1")
+
+
+-int
+-__get_nprocs_sched (void)
+-{
+- return 1;
+-}
+-
+ long int
+ __get_phys_pages (void)
+ {
diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
index f525f67547..294e633335 100644
--- a/misc/sys/cdefs.h
@@ -2624,6 +3058,23 @@ index 554089bfc4..9336036666 100644
}
}
+diff --git a/misc/tst-preadvwritev2-common.c b/misc/tst-preadvwritev2-common.c
+index 40b527bdcb..ed3dc04eeb 100644
+--- a/misc/tst-preadvwritev2-common.c
++++ b/misc/tst-preadvwritev2-common.c
+@@ -34,8 +34,11 @@
+ #ifndef RWF_APPEND
+ # define RWF_APPEND 0
+ #endif
++#ifndef RWF_NOAPPEND
++# define RWF_NOAPPEND 0
++#endif
+ #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT \
+- | RWF_APPEND)
++ | RWF_APPEND | RWF_NOAPPEND)
+
+ /* Generic uio_lim.h does not define IOV_MAX. */
+ #ifndef IOV_MAX
diff --git a/misc/tst-syslog-long-progname.c b/misc/tst-syslog-long-progname.c
new file mode 100644
index 0000000000..88f37a8a00
@@ -2995,6 +3446,51 @@ index 90187e30b1..5b9dd50151 100644
if ((flags & NO_CACHE) == 0)
*dir = nis_server_cache_search (name, search_parent, &server_used,
+diff --git a/nptl/descr.h b/nptl/descr.h
+index 5cacb286f3..ff634dac33 100644
+--- a/nptl/descr.h
++++ b/nptl/descr.h
+@@ -34,7 +34,6 @@
+ #include <bits/types/res_state.h>
+ #include <kernel-features.h>
+ #include <tls-internal-struct.h>
+-#include <sys/rseq.h>
+ #include <internal-sigset.h>
+
+ #ifndef TCB_ALIGNMENT
+@@ -402,14 +401,25 @@ struct pthread
+ /* Used on strsignal. */
+ struct tls_internal_t tls_state;
+
+- /* rseq area registered with the kernel. */
+- struct rseq rseq_area;
+-
+- /* This member must be last. */
+- char end_padding[];
+-
++ /* rseq area registered with the kernel. Use a custom definition
++ here to isolate from kernel struct rseq changes. The
++ implementation of sched_getcpu needs acccess to the cpu_id field;
++ the other fields are unused and not included here. */
++ union
++ {
++ struct
++ {
++ uint32_t cpu_id_start;
++ uint32_t cpu_id;
++ };
++ char pad[32]; /* Original rseq area size. */
++ } rseq_area __attribute__ ((aligned (32)));
++
++ /* Amount of end padding, if any, in this structure.
++ This definition relies on rseq_area being last. */
+ #define PTHREAD_STRUCT_END_PADDING \
+- (sizeof (struct pthread) - offsetof (struct pthread, end_padding))
++ (sizeof (struct pthread) - offsetof (struct pthread, rseq_area) \
++ + sizeof ((struct pthread) {}.rseq_area))
+ } __attribute ((aligned (TCB_ALIGNMENT)));
+
+ static inline bool
diff --git a/nscd/aicache.c b/nscd/aicache.c
index 51e793199f..e0baed170b 100644
--- a/nscd/aicache.c
@@ -3035,6 +3531,441 @@ index 61d1674eb4..531d2e83df 100644
}
# endif
else
+diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
+index 85977521a6..adc34ba6b4 100644
+--- a/nscd/netgroupcache.c
++++ b/nscd/netgroupcache.c
+@@ -23,6 +23,7 @@
+ #include <stdlib.h>
+ #include <unistd.h>
+ #include <sys/mman.h>
++#include <scratch_buffer.h>
+
+ #include "../inet/netgroup.h"
+ #include "nscd.h"
+@@ -65,6 +66,16 @@ struct dataset
+ char strdata[0];
+ };
+
++/* Send a notfound response to FD. Always returns -1 to indicate an
++ ephemeral error. */
++static time_t
++send_notfound (int fd)
++{
++ if (fd != -1)
++ TEMP_FAILURE_RETRY (send (fd, &notfound, sizeof (notfound), MSG_NOSIGNAL));
++ return -1;
++}
++
+ /* Sends a notfound message and prepares a notfound dataset to write to the
+ cache. Returns true if there was enough memory to allocate the dataset and
+ returns the dataset in DATASETP, total bytes to write in TOTALP and the
+@@ -83,8 +94,7 @@ do_notfound (struct database_dyn *db, int fd, request_header *req,
+ total = sizeof (notfound);
+ timeout = time (NULL) + db->negtimeout;
+
+- if (fd != -1)
+- TEMP_FAILURE_RETRY (send (fd, &notfound, total, MSG_NOSIGNAL));
++ send_notfound (fd);
+
+ dataset = mempool_alloc (db, sizeof (struct dataset) + req->key_len, 1);
+ /* If we cannot permanently store the result, so be it. */
+@@ -109,11 +119,78 @@ do_notfound (struct database_dyn *db, int fd, request_header *req,
+ return cacheable;
+ }
+
++struct addgetnetgrentX_scratch
++{
++ /* This is the result that the caller should use. It can be NULL,
++ point into buffer, or it can be in the cache. */
++ struct dataset *dataset;
++
++ struct scratch_buffer buffer;
++
++ /* Used internally in addgetnetgrentX as a staging area. */
++ struct scratch_buffer tmp;
++
++ /* Number of bytes in buffer that are actually used. */
++ size_t buffer_used;
++};
++
++static void
++addgetnetgrentX_scratch_init (struct addgetnetgrentX_scratch *scratch)
++{
++ scratch->dataset = NULL;
++ scratch_buffer_init (&scratch->buffer);
++ scratch_buffer_init (&scratch->tmp);
++
++ /* Reserve space for the header. */
++ scratch->buffer_used = sizeof (struct dataset);
++ static_assert (sizeof (struct dataset) < sizeof (scratch->tmp.__space),
++ "initial buffer space");
++ memset (scratch->tmp.data, 0, sizeof (struct dataset));
++}
++
++static void
++addgetnetgrentX_scratch_free (struct addgetnetgrentX_scratch *scratch)
++{
++ scratch_buffer_free (&scratch->buffer);
++ scratch_buffer_free (&scratch->tmp);
++}
++
++/* Copy LENGTH bytes from S into SCRATCH. Returns NULL if SCRATCH
++ could not be resized, otherwise a pointer to the copy. */
++static char *
++addgetnetgrentX_append_n (struct addgetnetgrentX_scratch *scratch,
++ const char *s, size_t length)
++{
++ while (true)
++ {
++ size_t remaining = scratch->buffer.length - scratch->buffer_used;
++ if (remaining >= length)
++ break;
++ if (!scratch_buffer_grow_preserve (&scratch->buffer))
++ return NULL;
++ }
++ char *copy = scratch->buffer.data + scratch->buffer_used;
++ memcpy (copy, s, length);
++ scratch->buffer_used += length;
++ return copy;
++}
++
++/* Copy S into SCRATCH, including its null terminator. Returns false
++ if SCRATCH could not be resized. */
++static bool
++addgetnetgrentX_append (struct addgetnetgrentX_scratch *scratch, const char *s)
++{
++ if (s == NULL)
++ s = "";
++ return addgetnetgrentX_append_n (scratch, s, strlen (s) + 1) != NULL;
++}
++
++/* Caller must initialize and free *SCRATCH. If the return value is
++ negative, this function has sent a notfound response. */
+ static time_t
+ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ const char *key, uid_t uid, struct hashentry *he,
+- struct datahead *dh, struct dataset **resultp,
+- void **tofreep)
++ struct datahead *dh, struct addgetnetgrentX_scratch *scratch)
+ {
+ if (__glibc_unlikely (debug_level > 0))
+ {
+@@ -132,14 +209,10 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+
+ char *key_copy = NULL;
+ struct __netgrent data;
+- size_t buflen = MAX (1024, sizeof (*dataset) + req->key_len);
+- size_t buffilled = sizeof (*dataset);
+- char *buffer = NULL;
+ size_t nentries = 0;
+ size_t group_len = strlen (key) + 1;
+ struct name_list *first_needed
+ = alloca (sizeof (struct name_list) + group_len);
+- *tofreep = NULL;
+
+ if (netgroup_database == NULL
+ && !__nss_database_get (nss_database_netgroup, &netgroup_database))
+@@ -147,12 +220,10 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ /* No such service. */
+ cacheable = do_notfound (db, fd, req, key, &dataset, &total, &timeout,
+ &key_copy);
+- goto writeout;
++ goto maybe_cache_add;
+ }
+
+ memset (&data, '\0', sizeof (data));
+- buffer = xmalloc (buflen);
+- *tofreep = buffer;
+ first_needed->next = first_needed;
+ memcpy (first_needed->name, key, group_len);
+ data.needed_groups = first_needed;
+@@ -195,8 +266,8 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ while (1)
+ {
+ int e;
+- status = getfct.f (&data, buffer + buffilled,
+- buflen - buffilled - req->key_len, &e);
++ status = getfct.f (&data, scratch->tmp.data,
++ scratch->tmp.length, &e);
+ if (status == NSS_STATUS_SUCCESS)
+ {
+ if (data.type == triple_val)
+@@ -204,68 +275,10 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ const char *nhost = data.val.triple.host;
+ const char *nuser = data.val.triple.user;
+ const char *ndomain = data.val.triple.domain;
+-
+- size_t hostlen = strlen (nhost ?: "") + 1;
+- size_t userlen = strlen (nuser ?: "") + 1;
+- size_t domainlen = strlen (ndomain ?: "") + 1;
+-
+- if (nhost == NULL || nuser == NULL || ndomain == NULL
+- || nhost > nuser || nuser > ndomain)
+- {
+- const char *last = nhost;
+- if (last == NULL
+- || (nuser != NULL && nuser > last))
+- last = nuser;
+- if (last == NULL
+- || (ndomain != NULL && ndomain > last))
+- last = ndomain;
+-
+- size_t bufused
+- = (last == NULL
+- ? buffilled
+- : last + strlen (last) + 1 - buffer);
+-
+- /* We have to make temporary copies. */
+- size_t needed = hostlen + userlen + domainlen;
+-
+- if (buflen - req->key_len - bufused < needed)
+- {
+- buflen += MAX (buflen, 2 * needed);
+- /* Save offset in the old buffer. We don't
+- bother with the NULL check here since
+- we'll do that later anyway. */
+- size_t nhostdiff = nhost - buffer;
+- size_t nuserdiff = nuser - buffer;
+- size_t ndomaindiff = ndomain - buffer;
+-
+- char *newbuf = xrealloc (buffer, buflen);
+- /* Fix up the triplet pointers into the new
+- buffer. */
+- nhost = (nhost ? newbuf + nhostdiff
+- : NULL);
+- nuser = (nuser ? newbuf + nuserdiff
+- : NULL);
+- ndomain = (ndomain ? newbuf + ndomaindiff
+- : NULL);
+- *tofreep = buffer = newbuf;
+- }
+-
+- nhost = memcpy (buffer + bufused,
+- nhost ?: "", hostlen);
+- nuser = memcpy ((char *) nhost + hostlen,
+- nuser ?: "", userlen);
+- ndomain = memcpy ((char *) nuser + userlen,
+- ndomain ?: "", domainlen);
+- }
+-
+- char *wp = buffer + buffilled;
+- wp = memmove (wp, nhost ?: "", hostlen);
+- wp += hostlen;
+- wp = memmove (wp, nuser ?: "", userlen);
+- wp += userlen;
+- wp = memmove (wp, ndomain ?: "", domainlen);
+- wp += domainlen;
+- buffilled = wp - buffer;
++ if (!(addgetnetgrentX_append (scratch, nhost)
++ && addgetnetgrentX_append (scratch, nuser)
++ && addgetnetgrentX_append (scratch, ndomain)))
++ return send_notfound (fd);
+ ++nentries;
+ }
+ else
+@@ -317,8 +330,8 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ }
+ else if (status == NSS_STATUS_TRYAGAIN && e == ERANGE)
+ {
+- buflen *= 2;
+- *tofreep = buffer = xrealloc (buffer, buflen);
++ if (!scratch_buffer_grow (&scratch->tmp))
++ return send_notfound (fd);
+ }
+ else if (status == NSS_STATUS_RETURN
+ || status == NSS_STATUS_NOTFOUND
+@@ -348,13 +361,20 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ {
+ cacheable = do_notfound (db, fd, req, key, &dataset, &total, &timeout,
+ &key_copy);
+- goto writeout;
++ goto maybe_cache_add;
+ }
+
+- total = buffilled;
++ /* Capture the result size without the key appended. */
++ total = scratch->buffer_used;
++
++ /* Make a copy of the key. The scratch buffer must not move after
++ this point. */
++ key_copy = addgetnetgrentX_append_n (scratch, key, req->key_len);
++ if (key_copy == NULL)
++ return send_notfound (fd);
+
+ /* Fill in the dataset. */
+- dataset = (struct dataset *) buffer;
++ dataset = scratch->buffer.data;
+ timeout = datahead_init_pos (&dataset->head, total + req->key_len,
+ total - offsetof (struct dataset, resp),
+ he == NULL ? 0 : dh->nreloads + 1,
+@@ -363,11 +383,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ dataset->resp.version = NSCD_VERSION;
+ dataset->resp.found = 1;
+ dataset->resp.nresults = nentries;
+- dataset->resp.result_len = buffilled - sizeof (*dataset);
+-
+- assert (buflen - buffilled >= req->key_len);
+- key_copy = memcpy (buffer + buffilled, key, req->key_len);
+- buffilled += req->key_len;
++ dataset->resp.result_len = total - sizeof (*dataset);
+
+ /* Now we can determine whether on refill we have to create a new
+ record or not. */
+@@ -398,7 +414,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ if (__glibc_likely (newp != NULL))
+ {
+ /* Adjust pointer into the memory block. */
+- key_copy = (char *) newp + (key_copy - buffer);
++ key_copy = (char *) newp + (key_copy - (char *) dataset);
+
+ dataset = memcpy (newp, dataset, total + req->key_len);
+ cacheable = true;
+@@ -410,14 +426,12 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ }
+
+ if (he == NULL && fd != -1)
+- {
+- /* We write the dataset before inserting it to the database
+- since while inserting this thread might block and so would
+- unnecessarily let the receiver wait. */
+- writeout:
++ /* We write the dataset before inserting it to the database since
++ while inserting this thread might block and so would
++ unnecessarily let the receiver wait. */
+ writeall (fd, &dataset->resp, dataset->head.recsize);
+- }
+
++ maybe_cache_add:
+ if (cacheable)
+ {
+ /* If necessary, we also propagate the data to disk. */
+@@ -441,7 +455,7 @@ addgetnetgrentX (struct database_dyn *db, int fd, request_header *req,
+ }
+
+ out:
+- *resultp = dataset;
++ scratch->dataset = dataset;
+
+ return timeout;
+ }
+@@ -462,6 +476,9 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ if (user != NULL)
+ key = (char *) rawmemchr (key, '\0') + 1;
+ const char *domain = *key++ ? key : NULL;
++ struct addgetnetgrentX_scratch scratch;
++
++ addgetnetgrentX_scratch_init (&scratch);
+
+ if (__glibc_unlikely (debug_level > 0))
+ {
+@@ -477,12 +494,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ group, group_len,
+ db, uid);
+ time_t timeout;
+- void *tofree;
+ if (result != NULL)
+- {
+- timeout = result->head.timeout;
+- tofree = NULL;
+- }
++ timeout = result->head.timeout;
+ else
+ {
+ request_header req_get =
+@@ -491,7 +504,10 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ .key_len = group_len
+ };
+ timeout = addgetnetgrentX (db, -1, &req_get, group, uid, NULL, NULL,
+- &result, &tofree);
++ &scratch);
++ result = scratch.dataset;
++ if (timeout < 0)
++ goto out;
+ }
+
+ struct indataset
+@@ -502,24 +518,26 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ = (struct indataset *) mempool_alloc (db,
+ sizeof (*dataset) + req->key_len,
+ 1);
+- struct indataset dataset_mem;
+ bool cacheable = true;
+ if (__glibc_unlikely (dataset == NULL))
+ {
+ cacheable = false;
+- dataset = &dataset_mem;
++	  /* The alloca is safe because nscd_run_worker verifies that
++ key_len is not larger than MAXKEYLEN. */
++ dataset = alloca (sizeof (*dataset) + req->key_len);
+ }
+
+ datahead_init_pos (&dataset->head, sizeof (*dataset) + req->key_len,
+ sizeof (innetgroup_response_header),
+- he == NULL ? 0 : dh->nreloads + 1, result->head.ttl);
++ he == NULL ? 0 : dh->nreloads + 1,
++ result == NULL ? db->negtimeout : result->head.ttl);
+ /* Set the notfound status and timeout based on the result from
+ getnetgrent. */
+- dataset->head.notfound = result->head.notfound;
++ dataset->head.notfound = result == NULL || result->head.notfound;
+ dataset->head.timeout = timeout;
+
+ dataset->resp.version = NSCD_VERSION;
+- dataset->resp.found = result->resp.found;
++ dataset->resp.found = result != NULL && result->resp.found;
+ /* Until we find a matching entry the result is 0. */
+ dataset->resp.result = 0;
+
+@@ -567,7 +585,9 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ goto out;
+ }
+
+- if (he == NULL)
++ /* addgetnetgrentX may have already sent a notfound response. Do
++ not send another one. */
++ if (he == NULL && dataset->resp.found)
+ {
+ /* We write the dataset before inserting it to the database
+ since while inserting this thread might block and so would
+@@ -601,7 +621,7 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
+ }
+
+ out:
+- free (tofree);
++ addgetnetgrentX_scratch_free (&scratch);
+ return timeout;
+ }
+
+@@ -611,11 +631,12 @@ addgetnetgrentX_ignore (struct database_dyn *db, int fd, request_header *req,
+ const char *key, uid_t uid, struct hashentry *he,
+ struct datahead *dh)
+ {
+- struct dataset *ignore;
+- void *tofree;
+- time_t timeout = addgetnetgrentX (db, fd, req, key, uid, he, dh,
+- &ignore, &tofree);
+- free (tofree);
++ struct addgetnetgrentX_scratch scratch;
++ addgetnetgrentX_scratch_init (&scratch);
++ time_t timeout = addgetnetgrentX (db, fd, req, key, uid, he, dh, &scratch);
++ addgetnetgrentX_scratch_free (&scratch);
++ if (timeout < 0)
++ timeout = 0;
+ return timeout;
+ }
+
+@@ -659,5 +680,9 @@ readdinnetgr (struct database_dyn *db, struct hashentry *he,
+ .key_len = he->len
+ };
+
+- return addinnetgrX (db, -1, &req, db->data + he->key, he->owner, he, dh);
++ time_t timeout = addinnetgrX (db, -1, &req, db->data + he->key, he->owner,
++ he, dh);
++ if (timeout < 0)
++ timeout = 0;
++ return timeout;
+ }
diff --git a/nscd/nscd.h b/nscd/nscd.h
index 368091aef8..f15321585b 100644
--- a/nscd/nscd.h
@@ -7041,6 +7972,19 @@ index 0000000000..9f5aebd99f
+}
+
+#include <support/test-driver.c>
+diff --git a/rt/aio_misc.c b/rt/aio_misc.c
+index b4304d0a6f..5f9e52bcba 100644
+--- a/rt/aio_misc.c
++++ b/rt/aio_misc.c
+@@ -698,7 +698,7 @@ libc_freeres_fn (free_res)
+ {
+ size_t row;
+
+- for (row = 0; row < pool_max_size; ++row)
++ for (row = 0; row < pool_size; ++row)
+ free (pool[row]);
+
+ free (pool);
diff --git a/scripts/dso-ordering-test.py b/scripts/dso-ordering-test.py
index 2dd6bfda18..b87cf2f809 100644
--- a/scripts/dso-ordering-test.py
@@ -7594,7 +8538,7 @@ index bf7f0b81c4..c1d1c43e50 100644
if (netname[i - 1] == '.')
netname[i - 1] = '\0';
diff --git a/support/Makefile b/support/Makefile
-index 9b50eac117..2b661a7eb8 100644
+index 9b50eac117..75b96c35f5 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -32,6 +32,8 @@ libsupport-routines = \
@@ -7606,6 +8550,31 @@ index 9b50eac117..2b661a7eb8 100644
ignore_stderr \
next_to_fault \
oom_error \
+@@ -237,6 +239,24 @@ CFLAGS-support_paths.c = \
+ CFLAGS-timespec.c += -fexcess-precision=standard
+ CFLAGS-timespec-time64.c += -fexcess-precision=standard
+
++# Ensure that general support files use 64-bit time_t
++CFLAGS-delayed_exit.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-shell-container.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_can_chroot.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_copy_file.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_copy_file_range.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_descriptor_supports_holes.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_descriptors.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_process_state.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_stat_nanoseconds.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_subprocess.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-support_test_main.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-test-container.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++CFLAGS-xmkdirp.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++# This is required to get an mkstemp which can create large files on some
++# 32-bit platforms.
++CFLAGS-temp_file.c += -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
++
+ ifeq (,$(CXX))
+ LINKS_DSO_PROGRAM = links-dso-program-c
+ else
diff --git a/support/dtotimespec-time64.c b/support/dtotimespec-time64.c
new file mode 100644
index 0000000000..b3d5e351e3
@@ -7696,10 +8665,19 @@ index 0000000000..cde5b4d74c
+ }
+}
diff --git a/support/shell-container.c b/support/shell-container.c
-index 1c73666f0a..6698061b9b 100644
+index 1c73666f0a..019a6c47d1 100644
--- a/support/shell-container.c
+++ b/support/shell-container.c
-@@ -39,6 +39,7 @@
+@@ -16,8 +16,6 @@
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+-#define _FILE_OFFSET_BITS 64
+-
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+@@ -39,6 +37,7 @@
#include <error.h>
#include <support/support.h>
@@ -7707,7 +8685,7 @@ index 1c73666f0a..6698061b9b 100644
/* Design considerations
-@@ -171,6 +172,32 @@ kill_func (char **argv)
+@@ -171,6 +170,32 @@ kill_func (char **argv)
return 0;
}
@@ -7740,7 +8718,7 @@ index 1c73666f0a..6698061b9b 100644
/* This is a list of all the built-in commands we understand. */
static struct {
const char *name;
-@@ -181,6 +208,7 @@ static struct {
+@@ -181,6 +206,7 @@ static struct {
{ "cp", copy_func },
{ "exit", exit_func },
{ "kill", kill_func },
@@ -7748,6 +8726,66 @@ index 1c73666f0a..6698061b9b 100644
{ NULL, NULL }
};
+diff --git a/support/support_can_chroot.c b/support/support_can_chroot.c
+index ca0e5f7ef4..43979f7c3f 100644
+--- a/support/support_can_chroot.c
++++ b/support/support_can_chroot.c
+@@ -29,14 +29,14 @@ static void
+ callback (void *closure)
+ {
+ int *result = closure;
+- struct stat64 before;
++ struct stat before;
+ xstat ("/dev", &before);
+ if (chroot ("/dev") != 0)
+ {
+ *result = errno;
+ return;
+ }
+- struct stat64 after;
++ struct stat after;
+ xstat ("/", &after);
+ TEST_VERIFY (before.st_dev == after.st_dev);
+ TEST_VERIFY (before.st_ino == after.st_ino);
+diff --git a/support/support_copy_file.c b/support/support_copy_file.c
+index 9a936b37c7..52ed90fae0 100644
+--- a/support/support_copy_file.c
++++ b/support/support_copy_file.c
+@@ -24,7 +24,7 @@
+ void
+ support_copy_file (const char *from, const char *to)
+ {
+- struct stat64 st;
++ struct stat st;
+ xstat (from, &st);
+ int fd_from = xopen (from, O_RDONLY, 0);
+ mode_t mode = st.st_mode & 0777;
+diff --git a/support/support_descriptor_supports_holes.c b/support/support_descriptor_supports_holes.c
+index d9bcade1cf..83f02f7cf6 100644
+--- a/support/support_descriptor_supports_holes.c
++++ b/support/support_descriptor_supports_holes.c
+@@ -40,7 +40,7 @@ support_descriptor_supports_holes (int fd)
+ block_headroom = 32,
+ };
+
+- struct stat64 st;
++ struct stat st;
+ xfstat (fd, &st);
+ if (!S_ISREG (st.st_mode))
+ FAIL_EXIT1 ("descriptor %d does not refer to a regular file", fd);
+diff --git a/support/test-container.c b/support/test-container.c
+index b6a1158ae1..2033985a67 100644
+--- a/support/test-container.c
++++ b/support/test-container.c
+@@ -16,8 +16,6 @@
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+-#define _FILE_OFFSET_BITS 64
+-
+ #include <array_length.h>
+ #include <stdio.h>
+ #include <stdlib.h>
diff --git a/support/timespec.h b/support/timespec.h
index 4d2ac2737d..1bba3a6837 100644
--- a/support/timespec.h
@@ -7770,6 +8808,61 @@ index 4d2ac2737d..1bba3a6837 100644
#endif
/* Check that the timespec on the left represents a time before the
+diff --git a/sysdeps/aarch64/configure b/sysdeps/aarch64/configure
+old mode 100644
+new mode 100755
+index bf972122b1..19d2b46cbf
+--- a/sysdeps/aarch64/configure
++++ b/sysdeps/aarch64/configure
+@@ -303,13 +303,14 @@ aarch64-variant-pcs = $libc_cv_aarch64_variant_pcs"
+ # Check if asm support armv8.2-a+sve
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for SVE support in assembler" >&5
+ $as_echo_n "checking for SVE support in assembler... " >&6; }
+-if ${libc_cv_asm_sve+:} false; then :
++if ${libc_cv_aarch64_sve_asm+:} false; then :
+ $as_echo_n "(cached) " >&6
+ else
+ cat > conftest.s <<\EOF
+- ptrue p0.b
++ .arch armv8.2-a+sve
++ ptrue p0.b
+ EOF
+-if { ac_try='${CC-cc} -c -march=armv8.2-a+sve conftest.s 1>&5'
++if { ac_try='${CC-cc} -c conftest.s 1>&5'
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+ (eval $ac_try) 2>&5
+ ac_status=$?
+@@ -321,8 +322,8 @@ else
+ fi
+ rm -f conftest*
+ fi
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_asm_sve" >&5
+-$as_echo "$libc_cv_asm_sve" >&6; }
++{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_aarch64_sve_asm" >&5
++$as_echo "$libc_cv_aarch64_sve_asm" >&6; }
+ if test $libc_cv_aarch64_sve_asm = yes; then
+ $as_echo "#define HAVE_AARCH64_SVE_ASM 1" >>confdefs.h
+
+diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
+index 51253d9802..bb5adb1782 100644
+--- a/sysdeps/aarch64/configure.ac
++++ b/sysdeps/aarch64/configure.ac
+@@ -88,11 +88,12 @@ EOF
+ LIBC_CONFIG_VAR([aarch64-variant-pcs], [$libc_cv_aarch64_variant_pcs])
+
+ # Check if asm support armv8.2-a+sve
+-AC_CACHE_CHECK(for SVE support in assembler, libc_cv_asm_sve, [dnl
++AC_CACHE_CHECK([for SVE support in assembler], [libc_cv_aarch64_sve_asm], [dnl
+ cat > conftest.s <<\EOF
+- ptrue p0.b
++ .arch armv8.2-a+sve
++ ptrue p0.b
+ EOF
+-if AC_TRY_COMMAND(${CC-cc} -c -march=armv8.2-a+sve conftest.s 1>&AS_MESSAGE_LOG_FD); then
++if AC_TRY_COMMAND(${CC-cc} -c conftest.s 1>&AS_MESSAGE_LOG_FD); then
+ libc_cv_aarch64_sve_asm=yes
+ else
+ libc_cv_aarch64_sve_asm=no
diff --git a/sysdeps/aarch64/dl-trampoline.S b/sysdeps/aarch64/dl-trampoline.S
index 909b208578..d66f0b9c45 100644
--- a/sysdeps/aarch64/dl-trampoline.S
@@ -7796,18 +8889,3008 @@ index 909b208578..d66f0b9c45 100644
ldp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
ldp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
ldp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
-diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
-index 050a3032de..c2627fced7 100644
---- a/sysdeps/generic/ldsodefs.h
-+++ b/sysdeps/generic/ldsodefs.h
-@@ -105,6 +105,9 @@ typedef struct link_map *lookup_t;
- DT_PREINIT_ARRAY. */
- typedef void (*dl_init_t) (int, char **, char **);
+diff --git a/sysdeps/aarch64/memchr.S b/sysdeps/aarch64/memchr.S
+index 2053a977b6..79aa910da4 100644
+--- a/sysdeps/aarch64/memchr.S
++++ b/sysdeps/aarch64/memchr.S
+@@ -30,7 +30,6 @@
+ # define MEMCHR __memchr
+ #endif
-+/* Type of a constructor function, in DT_FINI, DT_FINI_ARRAY. */
-+typedef void (*fini_t) (void);
-+
- /* On some architectures a pointer to a function is not just a pointer
+-/* Arguments and results. */
+ #define srcin x0
+ #define chrin w1
+ #define cntin x2
+@@ -73,42 +72,44 @@ ENTRY (MEMCHR)
+
+ rbit synd, synd
+ clz synd, synd
+- add result, srcin, synd, lsr 2
+ cmp cntin, synd, lsr 2
++ add result, srcin, synd, lsr 2
+ csel result, result, xzr, hi
+ ret
+
++ .p2align 3
+ L(start_loop):
+ sub tmp, src, srcin
+- add tmp, tmp, 16
++ add tmp, tmp, 17
+ subs cntrem, cntin, tmp
+- b.ls L(nomatch)
++ b.lo L(nomatch)
+
+ /* Make sure that it won't overread by a 16-byte chunk */
+- add tmp, cntrem, 15
+- tbnz tmp, 4, L(loop32_2)
+-
++ tbz cntrem, 4, L(loop32_2)
++ sub src, src, 16
+ .p2align 4
+ L(loop32):
+- ldr qdata, [src, 16]!
++ ldr qdata, [src, 32]!
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbnz synd, L(end)
+
+ L(loop32_2):
+- ldr qdata, [src, 16]!
+- subs cntrem, cntrem, 32
++ ldr qdata, [src, 16]
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+- b.ls L(end)
++ subs cntrem, cntrem, 32
++ b.lo L(end_2)
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbz synd, L(loop32)
++L(end_2):
++ add src, src, 16
+ L(end):
+ shrn vend.8b, vhas_chr.8h, 4 /* 128->64 */
++ sub cntrem, src, srcin
+ fmov synd, dend
+- add tmp, srcin, cntin
+- sub cntrem, tmp, src
++ sub cntrem, cntin, cntrem
+ #ifndef __AARCH64EB__
+ rbit synd, synd
+ #endif
+diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
+index 98d4e2c0e2..7b396b202f 100644
+--- a/sysdeps/aarch64/memcpy.S
++++ b/sysdeps/aarch64/memcpy.S
+@@ -1,4 +1,5 @@
+-/* Copyright (C) 2012-2022 Free Software Foundation, Inc.
++/* Generic optimized memcpy using SIMD.
++ Copyright (C) 2012-2022 Free Software Foundation, Inc.
+
+ This file is part of the GNU C Library.
+
+@@ -20,7 +21,7 @@
+
+ /* Assumptions:
+ *
+- * ARMv8-a, AArch64, unaligned accesses.
++ * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses.
+ *
+ */
+
+@@ -36,21 +37,18 @@
+ #define B_l x8
+ #define B_lw w8
+ #define B_h x9
+-#define C_l x10
+ #define C_lw w10
+-#define C_h x11
+-#define D_l x12
+-#define D_h x13
+-#define E_l x14
+-#define E_h x15
+-#define F_l x16
+-#define F_h x17
+-#define G_l count
+-#define G_h dst
+-#define H_l src
+-#define H_h srcend
+ #define tmp1 x14
+
++#define A_q q0
++#define B_q q1
++#define C_q q2
++#define D_q q3
++#define E_q q4
++#define F_q q5
++#define G_q q6
++#define H_q q7
++
+ #ifndef MEMMOVE
+ # define MEMMOVE memmove
+ #endif
+@@ -69,10 +67,9 @@
+ Large copies use a software pipelined loop processing 64 bytes per
+ iteration. The destination pointer is 16-byte aligned to minimize
+ unaligned accesses. The loop tail is handled by always copying 64 bytes
+- from the end.
+-*/
++ from the end. */
+
+-ENTRY_ALIGN (MEMCPY, 6)
++ENTRY (MEMCPY)
+ PTR_ARG (0)
+ PTR_ARG (1)
+ SIZE_ARG (2)
+@@ -87,10 +84,10 @@ ENTRY_ALIGN (MEMCPY, 6)
+ /* Small copies: 0..32 bytes. */
+ cmp count, 16
+ b.lo L(copy16)
+- ldp A_l, A_h, [src]
+- ldp D_l, D_h, [srcend, -16]
+- stp A_l, A_h, [dstin]
+- stp D_l, D_h, [dstend, -16]
++ ldr A_q, [src]
++ ldr B_q, [srcend, -16]
++ str A_q, [dstin]
++ str B_q, [dstend, -16]
+ ret
+
+ /* Copy 8-15 bytes. */
+@@ -102,7 +99,6 @@ L(copy16):
+ str A_h, [dstend, -8]
+ ret
+
+- .p2align 3
+ /* Copy 4-7 bytes. */
+ L(copy8):
+ tbz count, 2, L(copy4)
+@@ -128,87 +124,69 @@ L(copy0):
+ .p2align 4
+ /* Medium copies: 33..128 bytes. */
+ L(copy32_128):
+- ldp A_l, A_h, [src]
+- ldp B_l, B_h, [src, 16]
+- ldp C_l, C_h, [srcend, -32]
+- ldp D_l, D_h, [srcend, -16]
++ ldp A_q, B_q, [src]
++ ldp C_q, D_q, [srcend, -32]
+ cmp count, 64
+ b.hi L(copy128)
+- stp A_l, A_h, [dstin]
+- stp B_l, B_h, [dstin, 16]
+- stp C_l, C_h, [dstend, -32]
+- stp D_l, D_h, [dstend, -16]
++ stp A_q, B_q, [dstin]
++ stp C_q, D_q, [dstend, -32]
+ ret
+
+ .p2align 4
+ /* Copy 65..128 bytes. */
+ L(copy128):
+- ldp E_l, E_h, [src, 32]
+- ldp F_l, F_h, [src, 48]
++ ldp E_q, F_q, [src, 32]
+ cmp count, 96
+ b.ls L(copy96)
+- ldp G_l, G_h, [srcend, -64]
+- ldp H_l, H_h, [srcend, -48]
+- stp G_l, G_h, [dstend, -64]
+- stp H_l, H_h, [dstend, -48]
++ ldp G_q, H_q, [srcend, -64]
++ stp G_q, H_q, [dstend, -64]
+ L(copy96):
+- stp A_l, A_h, [dstin]
+- stp B_l, B_h, [dstin, 16]
+- stp E_l, E_h, [dstin, 32]
+- stp F_l, F_h, [dstin, 48]
+- stp C_l, C_h, [dstend, -32]
+- stp D_l, D_h, [dstend, -16]
++ stp A_q, B_q, [dstin]
++ stp E_q, F_q, [dstin, 32]
++ stp C_q, D_q, [dstend, -32]
+ ret
+
+- .p2align 4
++ /* Align loop64 below to 16 bytes. */
++ nop
++
+ /* Copy more than 128 bytes. */
+ L(copy_long):
+- /* Copy 16 bytes and then align dst to 16-byte alignment. */
+- ldp D_l, D_h, [src]
+- and tmp1, dstin, 15
+- bic dst, dstin, 15
+- sub src, src, tmp1
++ /* Copy 16 bytes and then align src to 16-byte alignment. */
++ ldr D_q, [src]
++ and tmp1, src, 15
++ bic src, src, 15
++ sub dst, dstin, tmp1
+ add count, count, tmp1 /* Count is now 16 too large. */
+- ldp A_l, A_h, [src, 16]
+- stp D_l, D_h, [dstin]
+- ldp B_l, B_h, [src, 32]
+- ldp C_l, C_h, [src, 48]
+- ldp D_l, D_h, [src, 64]!
++ ldp A_q, B_q, [src, 16]
++ str D_q, [dstin]
++ ldp C_q, D_q, [src, 48]
+ subs count, count, 128 + 16 /* Test and readjust count. */
+ b.ls L(copy64_from_end)
+-
+ L(loop64):
+- stp A_l, A_h, [dst, 16]
+- ldp A_l, A_h, [src, 16]
+- stp B_l, B_h, [dst, 32]
+- ldp B_l, B_h, [src, 32]
+- stp C_l, C_h, [dst, 48]
+- ldp C_l, C_h, [src, 48]
+- stp D_l, D_h, [dst, 64]!
+- ldp D_l, D_h, [src, 64]!
++ stp A_q, B_q, [dst, 16]
++ ldp A_q, B_q, [src, 80]
++ stp C_q, D_q, [dst, 48]
++ ldp C_q, D_q, [src, 112]
++ add src, src, 64
++ add dst, dst, 64
+ subs count, count, 64
+ b.hi L(loop64)
+
+ /* Write the last iteration and copy 64 bytes from the end. */
+ L(copy64_from_end):
+- ldp E_l, E_h, [srcend, -64]
+- stp A_l, A_h, [dst, 16]
+- ldp A_l, A_h, [srcend, -48]
+- stp B_l, B_h, [dst, 32]
+- ldp B_l, B_h, [srcend, -32]
+- stp C_l, C_h, [dst, 48]
+- ldp C_l, C_h, [srcend, -16]
+- stp D_l, D_h, [dst, 64]
+- stp E_l, E_h, [dstend, -64]
+- stp A_l, A_h, [dstend, -48]
+- stp B_l, B_h, [dstend, -32]
+- stp C_l, C_h, [dstend, -16]
++ ldp E_q, F_q, [srcend, -64]
++ stp A_q, B_q, [dst, 16]
++ ldp A_q, B_q, [srcend, -32]
++ stp C_q, D_q, [dst, 48]
++ stp E_q, F_q, [dstend, -64]
++ stp A_q, B_q, [dstend, -32]
+ ret
+
+ END (MEMCPY)
+ libc_hidden_builtin_def (MEMCPY)
+
+-ENTRY_ALIGN (MEMMOVE, 4)
++
++ENTRY (MEMMOVE)
+ PTR_ARG (0)
+ PTR_ARG (1)
+ SIZE_ARG (2)
+@@ -220,64 +198,56 @@ ENTRY_ALIGN (MEMMOVE, 4)
+ cmp count, 32
+ b.hi L(copy32_128)
+
+- /* Small copies: 0..32 bytes. */
++ /* Small moves: 0..32 bytes. */
+ cmp count, 16
+ b.lo L(copy16)
+- ldp A_l, A_h, [src]
+- ldp D_l, D_h, [srcend, -16]
+- stp A_l, A_h, [dstin]
+- stp D_l, D_h, [dstend, -16]
++ ldr A_q, [src]
++ ldr B_q, [srcend, -16]
++ str A_q, [dstin]
++ str B_q, [dstend, -16]
+ ret
+
+- .p2align 4
+ L(move_long):
+ /* Only use backward copy if there is an overlap. */
+ sub tmp1, dstin, src
+- cbz tmp1, L(copy0)
++ cbz tmp1, L(move0)
+ cmp tmp1, count
+ b.hs L(copy_long)
+
+ /* Large backwards copy for overlapping copies.
+- Copy 16 bytes and then align dst to 16-byte alignment. */
+- ldp D_l, D_h, [srcend, -16]
+- and tmp1, dstend, 15
+- sub srcend, srcend, tmp1
++ Copy 16 bytes and then align srcend to 16-byte alignment. */
++L(copy_long_backwards):
++ ldr D_q, [srcend, -16]
++ and tmp1, srcend, 15
++ bic srcend, srcend, 15
+ sub count, count, tmp1
+- ldp A_l, A_h, [srcend, -16]
+- stp D_l, D_h, [dstend, -16]
+- ldp B_l, B_h, [srcend, -32]
+- ldp C_l, C_h, [srcend, -48]
+- ldp D_l, D_h, [srcend, -64]!
++ ldp A_q, B_q, [srcend, -32]
++ str D_q, [dstend, -16]
++ ldp C_q, D_q, [srcend, -64]
+ sub dstend, dstend, tmp1
+ subs count, count, 128
+ b.ls L(copy64_from_start)
+
+ L(loop64_backwards):
+- stp A_l, A_h, [dstend, -16]
+- ldp A_l, A_h, [srcend, -16]
+- stp B_l, B_h, [dstend, -32]
+- ldp B_l, B_h, [srcend, -32]
+- stp C_l, C_h, [dstend, -48]
+- ldp C_l, C_h, [srcend, -48]
+- stp D_l, D_h, [dstend, -64]!
+- ldp D_l, D_h, [srcend, -64]!
++ str B_q, [dstend, -16]
++ str A_q, [dstend, -32]
++ ldp A_q, B_q, [srcend, -96]
++ str D_q, [dstend, -48]
++ str C_q, [dstend, -64]!
++ ldp C_q, D_q, [srcend, -128]
++ sub srcend, srcend, 64
+ subs count, count, 64
+ b.hi L(loop64_backwards)
+
+ /* Write the last iteration and copy 64 bytes from the start. */
+ L(copy64_from_start):
+- ldp G_l, G_h, [src, 48]
+- stp A_l, A_h, [dstend, -16]
+- ldp A_l, A_h, [src, 32]
+- stp B_l, B_h, [dstend, -32]
+- ldp B_l, B_h, [src, 16]
+- stp C_l, C_h, [dstend, -48]
+- ldp C_l, C_h, [src]
+- stp D_l, D_h, [dstend, -64]
+- stp G_l, G_h, [dstin, 48]
+- stp A_l, A_h, [dstin, 32]
+- stp B_l, B_h, [dstin, 16]
+- stp C_l, C_h, [dstin]
++ ldp E_q, F_q, [src, 32]
++ stp A_q, B_q, [dstend, -32]
++ ldp A_q, B_q, [src]
++ stp C_q, D_q, [dstend, -64]
++ stp E_q, F_q, [dstin, 32]
++ stp A_q, B_q, [dstin]
++L(move0):
+ ret
+
+ END (MEMMOVE)
+diff --git a/sysdeps/aarch64/memrchr.S b/sysdeps/aarch64/memrchr.S
+index 5179320720..428af51f70 100644
+--- a/sysdeps/aarch64/memrchr.S
++++ b/sysdeps/aarch64/memrchr.S
+@@ -26,7 +26,6 @@
+ * MTE compatible.
+ */
+
+-/* Arguments and results. */
+ #define srcin x0
+ #define chrin w1
+ #define cntin x2
+@@ -77,31 +76,34 @@ ENTRY (__memrchr)
+ csel result, result, xzr, hi
+ ret
+
++ nop
+ L(start_loop):
+- sub tmp, end, src
+- subs cntrem, cntin, tmp
++ subs cntrem, src, srcin
+ b.ls L(nomatch)
+
+ /* Make sure that it won't overread by a 16-byte chunk */
+- add tmp, cntrem, 15
+- tbnz tmp, 4, L(loop32_2)
++ sub cntrem, cntrem, 1
++ tbz cntrem, 4, L(loop32_2)
++ add src, src, 16
+
+- .p2align 4
++ .p2align 5
+ L(loop32):
+- ldr qdata, [src, -16]!
++ ldr qdata, [src, -32]!
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbnz synd, L(end)
+
+ L(loop32_2):
+- ldr qdata, [src, -16]!
++ ldr qdata, [src, -16]
+ subs cntrem, cntrem, 32
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+- b.ls L(end)
++ b.lo L(end_2)
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbz synd, L(loop32)
++L(end_2):
++ sub src, src, 16
+ L(end):
+ shrn vend.8b, vhas_chr.8h, 4 /* 128->64 */
+ fmov synd, dend
+diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S
+index 957996bd19..b76d1c3e5e 100644
+--- a/sysdeps/aarch64/memset.S
++++ b/sysdeps/aarch64/memset.S
+@@ -29,7 +29,7 @@
+ *
+ */
+
+-ENTRY_ALIGN (MEMSET, 6)
++ENTRY (MEMSET)
+
+ PTR_ARG (0)
+ SIZE_ARG (2)
+@@ -101,19 +101,19 @@ L(tail64):
+ ret
+
+ L(try_zva):
+-#ifdef ZVA_MACRO
+- zva_macro
+-#else
++#ifndef ZVA64_ONLY
+ .p2align 3
+ mrs tmp1, dczid_el0
+ tbnz tmp1w, 4, L(no_zva)
+ and tmp1w, tmp1w, 15
+ cmp tmp1w, 4 /* ZVA size is 64 bytes. */
+ b.ne L(zva_128)
+-
++ nop
++#endif
+ /* Write the first and last 64 byte aligned block using stp rather
+ than using DC ZVA. This is faster on some cores.
+ */
++ .p2align 4
+ L(zva_64):
+ str q0, [dst, 16]
+ stp q0, q0, [dst, 32]
+@@ -123,7 +123,6 @@ L(zva_64):
+ sub count, dstend, dst /* Count is now 128 too large. */
+ sub count, count, 128+64+64 /* Adjust count and bias for loop. */
+ add dst, dst, 128
+- nop
+ 1: dc zva, dst
+ add dst, dst, 64
+ subs count, count, 64
+@@ -134,6 +133,7 @@ L(zva_64):
+ stp q0, q0, [dstend, -32]
+ ret
+
++#ifndef ZVA64_ONLY
+ .p2align 3
+ L(zva_128):
+ cmp tmp1w, 5 /* ZVA size is 128 bytes. */
+diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
+index 16297192ee..e4720b7468 100644
+--- a/sysdeps/aarch64/multiarch/Makefile
++++ b/sysdeps/aarch64/multiarch/Makefile
+@@ -3,18 +3,19 @@ sysdep_routines += \
+ memchr_generic \
+ memchr_nosimd \
+ memcpy_a64fx \
+- memcpy_advsimd \
+- memcpy_falkor \
+ memcpy_generic \
++ memcpy_mops \
+ memcpy_sve \
+ memcpy_thunderx \
+ memcpy_thunderx2 \
++ memmove_mops \
+ memset_a64fx \
+ memset_emag \
+- memset_falkor \
+ memset_generic \
+ memset_kunpeng \
++ memset_mops \
++ memset_zva64 \
+ strlen_asimd \
+- strlen_mte \
++ strlen_generic \
+ # sysdep_routines
+ endif
+diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+index 4144615ab2..1c712ce913 100644
+--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
++++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+@@ -36,32 +36,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+ IFUNC_IMPL (i, name, memcpy,
+ IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_thunderx)
+ IFUNC_IMPL_ADD (array, i, memcpy, !bti, __memcpy_thunderx2)
+- IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_falkor)
+- IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_simd)
+ #if HAVE_AARCH64_SVE_ASM
+ IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_a64fx)
+ IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_sve)
+ #endif
++ IFUNC_IMPL_ADD (array, i, memcpy, mops, __memcpy_mops)
+ IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+ IFUNC_IMPL (i, name, memmove,
+ IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_thunderx)
+ IFUNC_IMPL_ADD (array, i, memmove, !bti, __memmove_thunderx2)
+- IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_falkor)
+- IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_simd)
+ #if HAVE_AARCH64_SVE_ASM
+ IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_a64fx)
+ IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_sve)
+ #endif
++ IFUNC_IMPL_ADD (array, i, memmove, mops, __memmove_mops)
+ IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+ IFUNC_IMPL (i, name, memset,
+- /* Enable this on non-falkor processors too so that other cores
+- can do a comparative analysis with __memset_generic. */
+- IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_falkor)
+- IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_emag)
++ IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_zva64)
++ IFUNC_IMPL_ADD (array, i, memset, 1, __memset_emag)
+ IFUNC_IMPL_ADD (array, i, memset, 1, __memset_kunpeng)
+ #if HAVE_AARCH64_SVE_ASM
+- IFUNC_IMPL_ADD (array, i, memset, sve, __memset_a64fx)
++ IFUNC_IMPL_ADD (array, i, memset, sve && zva_size == 256, __memset_a64fx)
+ #endif
++ IFUNC_IMPL_ADD (array, i, memset, mops, __memset_mops)
+ IFUNC_IMPL_ADD (array, i, memset, 1, __memset_generic))
+ IFUNC_IMPL (i, name, memchr,
+ IFUNC_IMPL_ADD (array, i, memchr, !mte, __memchr_nosimd)
+@@ -69,7 +66,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+
+ IFUNC_IMPL (i, name, strlen,
+ IFUNC_IMPL_ADD (array, i, strlen, !mte, __strlen_asimd)
+- IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_mte))
++ IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_generic))
+
+ return 0;
+ }
+diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
+index a4dcac0019..5b2cf5cb12 100644
+--- a/sysdeps/aarch64/multiarch/init-arch.h
++++ b/sysdeps/aarch64/multiarch/init-arch.h
+@@ -35,4 +35,8 @@
+ bool __attribute__((unused)) mte = \
+ MTE_ENABLED (); \
+ bool __attribute__((unused)) sve = \
+- GLRO(dl_aarch64_cpu_features).sve;
++ GLRO(dl_aarch64_cpu_features).sve; \
++ bool __attribute__((unused)) prefer_sve_ifuncs = \
++ GLRO(dl_aarch64_cpu_features).prefer_sve_ifuncs; \
++ bool __attribute__((unused)) mops = \
++ GLRO(dl_aarch64_cpu_features).mops;
+diff --git a/sysdeps/aarch64/multiarch/memchr_nosimd.S b/sysdeps/aarch64/multiarch/memchr_nosimd.S
+index ddf7533943..e39f39e6b3 100644
+--- a/sysdeps/aarch64/multiarch/memchr_nosimd.S
++++ b/sysdeps/aarch64/multiarch/memchr_nosimd.S
+@@ -26,10 +26,6 @@
+ * Use base integer registers.
+ */
+
+-#ifndef MEMCHR
+-# define MEMCHR __memchr_nosimd
+-#endif
+-
+ /* Arguments and results. */
+ #define srcin x0
+ #define chrin x1
+@@ -62,7 +58,7 @@
+ #define REP8_7f 0x7f7f7f7f7f7f7f7f
+
+
+-ENTRY_ALIGN (MEMCHR, 6)
++ENTRY (__memchr_nosimd)
+
+ PTR_ARG (0)
+ SIZE_ARG (2)
+@@ -219,5 +215,4 @@ L(none_chr):
+ mov result, 0
+ ret
+
+-END (MEMCHR)
+-libc_hidden_builtin_def (MEMCHR)
++END (__memchr_nosimd)
+diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
+index 0486213f08..3de66c14d4 100644
+--- a/sysdeps/aarch64/multiarch/memcpy.c
++++ b/sysdeps/aarch64/multiarch/memcpy.c
+@@ -29,26 +29,25 @@
+ extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+ extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+-extern __typeof (__redirect_memcpy) __memcpy_simd attribute_hidden;
+ extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+ extern __typeof (__redirect_memcpy) __memcpy_thunderx2 attribute_hidden;
+-extern __typeof (__redirect_memcpy) __memcpy_falkor attribute_hidden;
+ extern __typeof (__redirect_memcpy) __memcpy_a64fx attribute_hidden;
+ extern __typeof (__redirect_memcpy) __memcpy_sve attribute_hidden;
++extern __typeof (__redirect_memcpy) __memcpy_mops attribute_hidden;
+
+ static inline __typeof (__redirect_memcpy) *
+ select_memcpy_ifunc (void)
+ {
+ INIT_ARCH ();
+
+- if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr))
+- return __memcpy_simd;
++ if (mops)
++ return __memcpy_mops;
+
+ if (sve && HAVE_AARCH64_SVE_ASM)
+ {
+ if (IS_A64FX (midr))
+ return __memcpy_a64fx;
+- return __memcpy_sve;
++ return prefer_sve_ifuncs ? __memcpy_sve : __memcpy_generic;
+ }
+
+ if (IS_THUNDERX (midr))
+@@ -57,9 +56,6 @@ select_memcpy_ifunc (void)
+ if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
+ return __memcpy_thunderx2;
+
+- if (IS_FALKOR (midr) || IS_PHECDA (midr))
+- return __memcpy_falkor;
+-
+ return __memcpy_generic;
+ }
+
+diff --git a/sysdeps/aarch64/multiarch/memcpy_a64fx.S b/sysdeps/aarch64/multiarch/memcpy_a64fx.S
+index c4eab06176..c254dc8b9f 100644
+--- a/sysdeps/aarch64/multiarch/memcpy_a64fx.S
++++ b/sysdeps/aarch64/multiarch/memcpy_a64fx.S
+@@ -39,9 +39,6 @@
+ #define vlen8 x8
+
+ #if HAVE_AARCH64_SVE_ASM
+-# if IS_IN (libc)
+-# define MEMCPY __memcpy_a64fx
+-# define MEMMOVE __memmove_a64fx
+
+ .arch armv8.2-a+sve
+
+@@ -97,7 +94,7 @@
+ #undef BTI_C
+ #define BTI_C
+
+-ENTRY (MEMCPY)
++ENTRY (__memcpy_a64fx)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -234,11 +231,10 @@ L(last_bytes):
+ st1b z3.b, p0, [dstend, -1, mul vl]
+ ret
+
+-END (MEMCPY)
+-libc_hidden_builtin_def (MEMCPY)
++END (__memcpy_a64fx)
+
+
+-ENTRY_ALIGN (MEMMOVE, 4)
++ENTRY_ALIGN (__memmove_a64fx, 4)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -307,7 +303,5 @@ L(full_overlap):
+ mov dst, dstin
+ b L(last_bytes)
+
+-END (MEMMOVE)
+-libc_hidden_builtin_def (MEMMOVE)
+-# endif /* IS_IN (libc) */
++END (__memmove_a64fx)
+ #endif /* HAVE_AARCH64_SVE_ASM */
+diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
+deleted file mode 100644
+index fe9beaf5ea..0000000000
+--- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
++++ /dev/null
+@@ -1,248 +0,0 @@
+-/* Generic optimized memcpy using SIMD.
+- Copyright (C) 2020-2022 Free Software Foundation, Inc.
+-
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library. If not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-#include <sysdep.h>
+-
+-/* Assumptions:
+- *
+- * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses.
+- *
+- */
+-
+-#define dstin x0
+-#define src x1
+-#define count x2
+-#define dst x3
+-#define srcend x4
+-#define dstend x5
+-#define A_l x6
+-#define A_lw w6
+-#define A_h x7
+-#define B_l x8
+-#define B_lw w8
+-#define B_h x9
+-#define C_lw w10
+-#define tmp1 x14
+-
+-#define A_q q0
+-#define B_q q1
+-#define C_q q2
+-#define D_q q3
+-#define E_q q4
+-#define F_q q5
+-#define G_q q6
+-#define H_q q7
+-
+-
+-/* This implementation supports both memcpy and memmove and shares most code.
+- It uses unaligned accesses and branchless sequences to keep the code small,
+- simple and improve performance.
+-
+- Copies are split into 3 main cases: small copies of up to 32 bytes, medium
+- copies of up to 128 bytes, and large copies. The overhead of the overlap
+- check in memmove is negligible since it is only required for large copies.
+-
+- Large copies use a software pipelined loop processing 64 bytes per
+- iteration. The destination pointer is 16-byte aligned to minimize
+- unaligned accesses. The loop tail is handled by always copying 64 bytes
+- from the end. */
+-
+-ENTRY (__memcpy_simd)
+- PTR_ARG (0)
+- PTR_ARG (1)
+- SIZE_ARG (2)
+-
+- add srcend, src, count
+- add dstend, dstin, count
+- cmp count, 128
+- b.hi L(copy_long)
+- cmp count, 32
+- b.hi L(copy32_128)
+-
+- /* Small copies: 0..32 bytes. */
+- cmp count, 16
+- b.lo L(copy16)
+- ldr A_q, [src]
+- ldr B_q, [srcend, -16]
+- str A_q, [dstin]
+- str B_q, [dstend, -16]
+- ret
+-
+- /* Copy 8-15 bytes. */
+-L(copy16):
+- tbz count, 3, L(copy8)
+- ldr A_l, [src]
+- ldr A_h, [srcend, -8]
+- str A_l, [dstin]
+- str A_h, [dstend, -8]
+- ret
+-
+- /* Copy 4-7 bytes. */
+-L(copy8):
+- tbz count, 2, L(copy4)
+- ldr A_lw, [src]
+- ldr B_lw, [srcend, -4]
+- str A_lw, [dstin]
+- str B_lw, [dstend, -4]
+- ret
+-
+- /* Copy 0..3 bytes using a branchless sequence. */
+-L(copy4):
+- cbz count, L(copy0)
+- lsr tmp1, count, 1
+- ldrb A_lw, [src]
+- ldrb C_lw, [srcend, -1]
+- ldrb B_lw, [src, tmp1]
+- strb A_lw, [dstin]
+- strb B_lw, [dstin, tmp1]
+- strb C_lw, [dstend, -1]
+-L(copy0):
+- ret
+-
+- .p2align 4
+- /* Medium copies: 33..128 bytes. */
+-L(copy32_128):
+- ldp A_q, B_q, [src]
+- ldp C_q, D_q, [srcend, -32]
+- cmp count, 64
+- b.hi L(copy128)
+- stp A_q, B_q, [dstin]
+- stp C_q, D_q, [dstend, -32]
+- ret
+-
+- .p2align 4
+- /* Copy 65..128 bytes. */
+-L(copy128):
+- ldp E_q, F_q, [src, 32]
+- cmp count, 96
+- b.ls L(copy96)
+- ldp G_q, H_q, [srcend, -64]
+- stp G_q, H_q, [dstend, -64]
+-L(copy96):
+- stp A_q, B_q, [dstin]
+- stp E_q, F_q, [dstin, 32]
+- stp C_q, D_q, [dstend, -32]
+- ret
+-
+- /* Align loop64 below to 16 bytes. */
+- nop
+-
+- /* Copy more than 128 bytes. */
+-L(copy_long):
+- /* Copy 16 bytes and then align src to 16-byte alignment. */
+- ldr D_q, [src]
+- and tmp1, src, 15
+- bic src, src, 15
+- sub dst, dstin, tmp1
+- add count, count, tmp1 /* Count is now 16 too large. */
+- ldp A_q, B_q, [src, 16]
+- str D_q, [dstin]
+- ldp C_q, D_q, [src, 48]
+- subs count, count, 128 + 16 /* Test and readjust count. */
+- b.ls L(copy64_from_end)
+-L(loop64):
+- stp A_q, B_q, [dst, 16]
+- ldp A_q, B_q, [src, 80]
+- stp C_q, D_q, [dst, 48]
+- ldp C_q, D_q, [src, 112]
+- add src, src, 64
+- add dst, dst, 64
+- subs count, count, 64
+- b.hi L(loop64)
+-
+- /* Write the last iteration and copy 64 bytes from the end. */
+-L(copy64_from_end):
+- ldp E_q, F_q, [srcend, -64]
+- stp A_q, B_q, [dst, 16]
+- ldp A_q, B_q, [srcend, -32]
+- stp C_q, D_q, [dst, 48]
+- stp E_q, F_q, [dstend, -64]
+- stp A_q, B_q, [dstend, -32]
+- ret
+-
+-END (__memcpy_simd)
+-libc_hidden_builtin_def (__memcpy_simd)
+-
+-
+-ENTRY (__memmove_simd)
+- PTR_ARG (0)
+- PTR_ARG (1)
+- SIZE_ARG (2)
+-
+- add srcend, src, count
+- add dstend, dstin, count
+- cmp count, 128
+- b.hi L(move_long)
+- cmp count, 32
+- b.hi L(copy32_128)
+-
+- /* Small moves: 0..32 bytes. */
+- cmp count, 16
+- b.lo L(copy16)
+- ldr A_q, [src]
+- ldr B_q, [srcend, -16]
+- str A_q, [dstin]
+- str B_q, [dstend, -16]
+- ret
+-
+-L(move_long):
+- /* Only use backward copy if there is an overlap. */
+- sub tmp1, dstin, src
+- cbz tmp1, L(move0)
+- cmp tmp1, count
+- b.hs L(copy_long)
+-
+- /* Large backwards copy for overlapping copies.
+- Copy 16 bytes and then align srcend to 16-byte alignment. */
+-L(copy_long_backwards):
+- ldr D_q, [srcend, -16]
+- and tmp1, srcend, 15
+- bic srcend, srcend, 15
+- sub count, count, tmp1
+- ldp A_q, B_q, [srcend, -32]
+- str D_q, [dstend, -16]
+- ldp C_q, D_q, [srcend, -64]
+- sub dstend, dstend, tmp1
+- subs count, count, 128
+- b.ls L(copy64_from_start)
+-
+-L(loop64_backwards):
+- str B_q, [dstend, -16]
+- str A_q, [dstend, -32]
+- ldp A_q, B_q, [srcend, -96]
+- str D_q, [dstend, -48]
+- str C_q, [dstend, -64]!
+- ldp C_q, D_q, [srcend, -128]
+- sub srcend, srcend, 64
+- subs count, count, 64
+- b.hi L(loop64_backwards)
+-
+- /* Write the last iteration and copy 64 bytes from the start. */
+-L(copy64_from_start):
+- ldp E_q, F_q, [src, 32]
+- stp A_q, B_q, [dstend, -32]
+- ldp A_q, B_q, [src]
+- stp C_q, D_q, [dstend, -64]
+- stp E_q, F_q, [dstin, 32]
+- stp A_q, B_q, [dstin]
+-L(move0):
+- ret
+-
+-END (__memmove_simd)
+-libc_hidden_builtin_def (__memmove_simd)
+diff --git a/sysdeps/aarch64/multiarch/memcpy_falkor.S b/sysdeps/aarch64/multiarch/memcpy_falkor.S
+deleted file mode 100644
+index 117edd9cfc..0000000000
+--- a/sysdeps/aarch64/multiarch/memcpy_falkor.S
++++ /dev/null
+@@ -1,315 +0,0 @@
+-/* Optimized memcpy for Qualcomm Falkor processor.
+- Copyright (C) 2017-2022 Free Software Foundation, Inc.
+-
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library. If not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-#include <sysdep.h>
+-
+-/* Assumptions:
+-
+- ARMv8-a, AArch64, falkor, unaligned accesses. */
+-
+-#define dstin x0
+-#define src x1
+-#define count x2
+-#define dst x3
+-#define srcend x4
+-#define dstend x5
+-#define tmp1 x14
+-#define A_x x6
+-#define B_x x7
+-#define A_w w6
+-#define B_w w7
+-
+-#define A_q q0
+-#define B_q q1
+-#define C_q q2
+-#define D_q q3
+-#define E_q q4
+-#define F_q q5
+-#define G_q q6
+-#define H_q q7
+-#define Q_q q6
+-#define S_q q22
+-
+-/* Copies are split into 3 main cases:
+-
+- 1. Small copies of up to 32 bytes
+- 2. Medium copies of 33..128 bytes which are fully unrolled
+- 3. Large copies of more than 128 bytes.
+-
+- Large copies align the source to a quad word and use an unrolled loop
+- processing 64 bytes per iteration.
+-
+- FALKOR-SPECIFIC DESIGN:
+-
+- The smallest copies (32 bytes or less) focus on optimal pipeline usage,
+- which is why the redundant copies of 0-3 bytes have been replaced with
+- conditionals, since the former would unnecessarily break across multiple
+- issue groups. The medium copy group has been enlarged to 128 bytes since
+- bumping up the small copies up to 32 bytes allows us to do that without
+- cost and also allows us to reduce the size of the prep code before loop64.
+-
+- The copy loop uses only one register q0. This is to ensure that all loads
+- hit a single hardware prefetcher which can get correctly trained to prefetch
+- a single stream.
+-
+- The non-temporal stores help optimize cache utilization. */
+-
+-#if IS_IN (libc)
+-ENTRY_ALIGN (__memcpy_falkor, 6)
+-
+- PTR_ARG (0)
+- PTR_ARG (1)
+- SIZE_ARG (2)
+-
+- cmp count, 32
+- add srcend, src, count
+- add dstend, dstin, count
+- b.ls L(copy32)
+- cmp count, 128
+- b.hi L(copy_long)
+-
+- /* Medium copies: 33..128 bytes. */
+-L(copy128):
+- sub tmp1, count, 1
+- ldr A_q, [src]
+- ldr B_q, [src, 16]
+- ldr C_q, [srcend, -32]
+- ldr D_q, [srcend, -16]
+- tbz tmp1, 6, 1f
+- ldr E_q, [src, 32]
+- ldr F_q, [src, 48]
+- ldr G_q, [srcend, -64]
+- ldr H_q, [srcend, -48]
+- str G_q, [dstend, -64]
+- str H_q, [dstend, -48]
+- str E_q, [dstin, 32]
+- str F_q, [dstin, 48]
+-1:
+- str A_q, [dstin]
+- str B_q, [dstin, 16]
+- str C_q, [dstend, -32]
+- str D_q, [dstend, -16]
+- ret
+-
+- .p2align 4
+- /* Small copies: 0..32 bytes. */
+-L(copy32):
+- /* 16-32 */
+- cmp count, 16
+- b.lo 1f
+- ldr A_q, [src]
+- ldr B_q, [srcend, -16]
+- str A_q, [dstin]
+- str B_q, [dstend, -16]
+- ret
+- .p2align 4
+-1:
+- /* 8-15 */
+- tbz count, 3, 1f
+- ldr A_x, [src]
+- ldr B_x, [srcend, -8]
+- str A_x, [dstin]
+- str B_x, [dstend, -8]
+- ret
+- .p2align 4
+-1:
+- /* 4-7 */
+- tbz count, 2, 1f
+- ldr A_w, [src]
+- ldr B_w, [srcend, -4]
+- str A_w, [dstin]
+- str B_w, [dstend, -4]
+- ret
+- .p2align 4
+-1:
+- /* 2-3 */
+- tbz count, 1, 1f
+- ldrh A_w, [src]
+- ldrh B_w, [srcend, -2]
+- strh A_w, [dstin]
+- strh B_w, [dstend, -2]
+- ret
+- .p2align 4
+-1:
+- /* 0-1 */
+- tbz count, 0, 1f
+- ldrb A_w, [src]
+- strb A_w, [dstin]
+-1:
+- ret
+-
+- /* Align SRC to 16 bytes and copy; that way at least one of the
+- accesses is aligned throughout the copy sequence.
+-
+- The count is off by 0 to 15 bytes, but this is OK because we trim
+- off the last 64 bytes to copy off from the end. Due to this the
+- loop never runs out of bounds. */
+-
+- .p2align 4
+- nop /* Align loop64 below. */
+-L(copy_long):
+- ldr A_q, [src]
+- sub count, count, 64 + 16
+- and tmp1, src, 15
+- str A_q, [dstin]
+- bic src, src, 15
+- sub dst, dstin, tmp1
+- add count, count, tmp1
+-
+-L(loop64):
+- ldr A_q, [src, 16]!
+- str A_q, [dst, 16]
+- ldr A_q, [src, 16]!
+- subs count, count, 64
+- str A_q, [dst, 32]
+- ldr A_q, [src, 16]!
+- str A_q, [dst, 48]
+- ldr A_q, [src, 16]!
+- str A_q, [dst, 64]!
+- b.hi L(loop64)
+-
+- /* Write the last full set of 64 bytes. The remainder is at most 64
+- bytes, so it is safe to always copy 64 bytes from the end even if
+- there is just 1 byte left. */
+- ldr E_q, [srcend, -64]
+- str E_q, [dstend, -64]
+- ldr D_q, [srcend, -48]
+- str D_q, [dstend, -48]
+- ldr C_q, [srcend, -32]
+- str C_q, [dstend, -32]
+- ldr B_q, [srcend, -16]
+- str B_q, [dstend, -16]
+- ret
+-
+-END (__memcpy_falkor)
+-libc_hidden_builtin_def (__memcpy_falkor)
+-
+-
+-/* RATIONALE:
+-
+- The move has 4 distinct parts:
+- * Small moves of 32 bytes and under.
+- * Medium sized moves of 33-128 bytes (fully unrolled).
+- * Large moves where the source address is higher than the destination
+- (forward copies)
+- * Large moves where the destination address is higher than the source
+- (copy backward, or move).
+-
+- We use only two registers q6 and q22 for the moves and move 32 bytes at a
+- time to correctly train the hardware prefetcher for better throughput.
+-
+- For small and medium cases memcpy is used. */
+-
+-ENTRY_ALIGN (__memmove_falkor, 6)
+-
+- PTR_ARG (0)
+- PTR_ARG (1)
+- SIZE_ARG (2)
+-
+- cmp count, 32
+- add srcend, src, count
+- add dstend, dstin, count
+- b.ls L(copy32)
+- cmp count, 128
+- b.ls L(copy128)
+- sub tmp1, dstin, src
+- ccmp tmp1, count, 2, hi
+- b.lo L(move_long)
+-
+- /* CASE: Copy Forwards
+-
+- Align src to 16 byte alignment so that we don't cross cache line
+- boundaries on both loads and stores. There are at least 128 bytes
+- to copy, so copy 16 bytes unaligned and then align. The loop
+- copies 32 bytes per iteration and prefetches one iteration ahead. */
+-
+- ldr S_q, [src]
+- and tmp1, src, 15
+- bic src, src, 15
+- sub dst, dstin, tmp1
+- add count, count, tmp1 /* Count is now 16 too large. */
+- ldr Q_q, [src, 16]!
+- str S_q, [dstin]
+- ldr S_q, [src, 16]!
+- sub count, count, 32 + 32 + 16 /* Test and readjust count. */
+-
+- .p2align 4
+-1:
+- subs count, count, 32
+- str Q_q, [dst, 16]
+- ldr Q_q, [src, 16]!
+- str S_q, [dst, 32]!
+- ldr S_q, [src, 16]!
+- b.hi 1b
+-
+- /* Copy 32 bytes from the end before writing the data prefetched in the
+- last loop iteration. */
+-2:
+- ldr B_q, [srcend, -32]
+- ldr C_q, [srcend, -16]
+- str Q_q, [dst, 16]
+- str S_q, [dst, 32]
+- str B_q, [dstend, -32]
+- str C_q, [dstend, -16]
+- ret
+-
+- /* CASE: Copy Backwards
+-
+- Align srcend to 16 byte alignment so that we don't cross cache line
+- boundaries on both loads and stores. There are at least 128 bytes
+- to copy, so copy 16 bytes unaligned and then align. The loop
+- copies 32 bytes per iteration and prefetches one iteration ahead. */
+-
+- .p2align 4
+- nop
+- nop
+-L(move_long):
+- cbz tmp1, 3f /* Return early if src == dstin */
+- ldr S_q, [srcend, -16]
+- and tmp1, srcend, 15
+- sub srcend, srcend, tmp1
+- ldr Q_q, [srcend, -16]!
+- str S_q, [dstend, -16]
+- sub count, count, tmp1
+- ldr S_q, [srcend, -16]!
+- sub dstend, dstend, tmp1
+- sub count, count, 32 + 32
+-
+-1:
+- subs count, count, 32
+- str Q_q, [dstend, -16]
+- ldr Q_q, [srcend, -16]!
+- str S_q, [dstend, -32]!
+- ldr S_q, [srcend, -16]!
+- b.hi 1b
+-
+- /* Copy 32 bytes from the start before writing the data prefetched in the
+- last loop iteration. */
+-
+- ldr B_q, [src, 16]
+- ldr C_q, [src]
+- str Q_q, [dstend, -16]
+- str S_q, [dstend, -32]
+- str B_q, [dstin, 16]
+- str C_q, [dstin]
+-3: ret
+-
+-END (__memmove_falkor)
+-libc_hidden_builtin_def (__memmove_falkor)
+-#endif
+diff --git a/sysdeps/aarch64/multiarch/memcpy_mops.S b/sysdeps/aarch64/multiarch/memcpy_mops.S
+new file mode 100644
+index 0000000000..4685629664
+--- /dev/null
++++ b/sysdeps/aarch64/multiarch/memcpy_mops.S
+@@ -0,0 +1,39 @@
++/* Optimized memcpy for MOPS.
++ Copyright (C) 2023 Free Software Foundation, Inc.
++
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library. If not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <sysdep.h>
++
++/* Assumptions:
++ *
++ * AArch64, MOPS.
++ *
++ */
++
++ENTRY (__memcpy_mops)
++ PTR_ARG (0)
++ PTR_ARG (1)
++ SIZE_ARG (2)
++
++ mov x3, x0
++ .inst 0x19010443 /* cpyfp [x3]!, [x1]!, x2! */
++ .inst 0x19410443 /* cpyfm [x3]!, [x1]!, x2! */
++ .inst 0x19810443 /* cpyfe [x3]!, [x1]!, x2! */
++ ret
++
++END (__memcpy_mops)
+diff --git a/sysdeps/aarch64/multiarch/memcpy_sve.S b/sysdeps/aarch64/multiarch/memcpy_sve.S
+index a70907ec55..71d2f84f63 100644
+--- a/sysdeps/aarch64/multiarch/memcpy_sve.S
++++ b/sysdeps/aarch64/multiarch/memcpy_sve.S
+@@ -67,14 +67,15 @@ ENTRY (__memcpy_sve)
+
+ cmp count, 128
+ b.hi L(copy_long)
+- cmp count, 32
++ cntb vlen
++ cmp count, vlen, lsl 1
+ b.hi L(copy32_128)
+-
+ whilelo p0.b, xzr, count
+- cntb vlen
+- tbnz vlen, 4, L(vlen128)
+- ld1b z0.b, p0/z, [src]
+- st1b z0.b, p0, [dstin]
++ whilelo p1.b, vlen, count
++ ld1b z0.b, p0/z, [src, 0, mul vl]
++ ld1b z1.b, p1/z, [src, 1, mul vl]
++ st1b z0.b, p0, [dstin, 0, mul vl]
++ st1b z1.b, p1, [dstin, 1, mul vl]
+ ret
+
+ /* Medium copies: 33..128 bytes. */
+@@ -102,14 +103,6 @@ L(copy96):
+ stp C_q, D_q, [dstend, -32]
+ ret
+
+-L(vlen128):
+- whilelo p1.b, vlen, count
+- ld1b z0.b, p0/z, [src, 0, mul vl]
+- ld1b z1.b, p1/z, [src, 1, mul vl]
+- st1b z0.b, p0, [dstin, 0, mul vl]
+- st1b z1.b, p1, [dstin, 1, mul vl]
+- ret
+-
+ .p2align 4
+ /* Copy more than 128 bytes. */
+ L(copy_long):
+@@ -148,7 +141,6 @@ L(copy64_from_end):
+ ret
+
+ END (__memcpy_sve)
+-libc_hidden_builtin_def (__memcpy_sve)
+
+
+ ENTRY (__memmove_sve)
+@@ -158,14 +150,15 @@ ENTRY (__memmove_sve)
+
+ cmp count, 128
+ b.hi L(move_long)
+- cmp count, 32
++ cntb vlen
++ cmp count, vlen, lsl 1
+ b.hi L(copy32_128)
+-
+ whilelo p0.b, xzr, count
+- cntb vlen
+- tbnz vlen, 4, L(vlen128)
+- ld1b z0.b, p0/z, [src]
+- st1b z0.b, p0, [dstin]
++ whilelo p1.b, vlen, count
++ ld1b z0.b, p0/z, [src, 0, mul vl]
++ ld1b z1.b, p1/z, [src, 1, mul vl]
++ st1b z0.b, p0, [dstin, 0, mul vl]
++ st1b z1.b, p1, [dstin, 1, mul vl]
+ ret
+
+ .p2align 4
+@@ -214,5 +207,4 @@ L(return):
+ ret
+
+ END (__memmove_sve)
+-libc_hidden_builtin_def (__memmove_sve)
+ #endif
+diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+index 21e703dddd..2fb6be5c78 100644
+--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
++++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+@@ -65,21 +65,7 @@
+ Overlapping large forward memmoves use a loop that copies backwards.
+ */
+
+-#ifndef MEMMOVE
+-# define MEMMOVE memmove
+-#endif
+-#ifndef MEMCPY
+-# define MEMCPY memcpy
+-#endif
+-
+-#if IS_IN (libc)
+-
+-# undef MEMCPY
+-# define MEMCPY __memcpy_thunderx
+-# undef MEMMOVE
+-# define MEMMOVE __memmove_thunderx
+-
+-ENTRY_ALIGN (MEMMOVE, 6)
++ENTRY (__memmove_thunderx)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -91,9 +77,9 @@ ENTRY_ALIGN (MEMMOVE, 6)
+ b.lo L(move_long)
+
+ /* Common case falls through into memcpy. */
+-END (MEMMOVE)
+-libc_hidden_builtin_def (MEMMOVE)
+-ENTRY (MEMCPY)
++END (__memmove_thunderx)
++
++ENTRY (__memcpy_thunderx)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -316,7 +302,4 @@ L(move_long):
+ stp C_l, C_h, [dstin]
+ 3: ret
+
+-END (MEMCPY)
+-libc_hidden_builtin_def (MEMCPY)
+-
+-#endif
++END (__memcpy_thunderx)
+diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
+index 5e0a59ee5d..3fceb1036d 100644
+--- a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
++++ b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
+@@ -75,27 +75,12 @@
+ #define I_v v16
+ #define J_v v17
+
+-#ifndef MEMMOVE
+-# define MEMMOVE memmove
+-#endif
+-#ifndef MEMCPY
+-# define MEMCPY memcpy
+-#endif
+-
+-#if IS_IN (libc)
+-
+-#undef MEMCPY
+-#define MEMCPY __memcpy_thunderx2
+-#undef MEMMOVE
+-#define MEMMOVE __memmove_thunderx2
+-
+-
+ /* Overlapping large forward memmoves use a loop that copies backwards.
+ Otherwise memcpy is used. Small moves branch to memcopy16 directly.
+ The longer memcpy cases fall through to the memcpy head.
+ */
+
+-ENTRY_ALIGN (MEMMOVE, 6)
++ENTRY (__memmove_thunderx2)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -109,8 +94,7 @@ ENTRY_ALIGN (MEMMOVE, 6)
+ ccmp tmp1, count, 2, hi
+ b.lo L(move_long)
+
+-END (MEMMOVE)
+-libc_hidden_builtin_def (MEMMOVE)
++END (__memmove_thunderx2)
+
+
+ /* Copies are split into 3 main cases: small copies of up to 16 bytes,
+@@ -124,8 +108,7 @@ libc_hidden_builtin_def (MEMMOVE)
+
+ #define MEMCPY_PREFETCH_LDR 640
+
+- .p2align 4
+-ENTRY (MEMCPY)
++ENTRY (__memcpy_thunderx2)
+
+ PTR_ARG (0)
+ PTR_ARG (1)
+@@ -449,7 +432,7 @@ L(move_long):
+ 3: ret
+
+
+-END (MEMCPY)
++END (__memcpy_thunderx2)
+ .section .rodata
+ .p2align 4
+
+@@ -472,6 +455,3 @@ L(ext_table):
+ .word L(ext_size_13) -.
+ .word L(ext_size_14) -.
+ .word L(ext_size_15) -.
+-
+-libc_hidden_builtin_def (MEMCPY)
+-#endif
+diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
+index 261996ecc4..fdcf418820 100644
+--- a/sysdeps/aarch64/multiarch/memmove.c
++++ b/sysdeps/aarch64/multiarch/memmove.c
+@@ -29,26 +29,25 @@
+ extern __typeof (__redirect_memmove) __libc_memmove;
+
+ extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+-extern __typeof (__redirect_memmove) __memmove_simd attribute_hidden;
+ extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+ extern __typeof (__redirect_memmove) __memmove_thunderx2 attribute_hidden;
+-extern __typeof (__redirect_memmove) __memmove_falkor attribute_hidden;
+ extern __typeof (__redirect_memmove) __memmove_a64fx attribute_hidden;
+ extern __typeof (__redirect_memmove) __memmove_sve attribute_hidden;
++extern __typeof (__redirect_memmove) __memmove_mops attribute_hidden;
+
+ static inline __typeof (__redirect_memmove) *
+ select_memmove_ifunc (void)
+ {
+ INIT_ARCH ();
+
+- if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr))
+- return __memmove_simd;
++ if (mops)
++ return __memmove_mops;
+
+ if (sve && HAVE_AARCH64_SVE_ASM)
+ {
+ if (IS_A64FX (midr))
+ return __memmove_a64fx;
+- return __memmove_sve;
++ return prefer_sve_ifuncs ? __memmove_sve : __memmove_generic;
+ }
+
+ if (IS_THUNDERX (midr))
+@@ -57,9 +56,6 @@ select_memmove_ifunc (void)
+ if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
+ return __memmove_thunderx2;
+
+- if (IS_FALKOR (midr) || IS_PHECDA (midr))
+- return __memmove_falkor;
+-
+ return __memmove_generic;
+ }
+
+diff --git a/sysdeps/aarch64/multiarch/memmove_mops.S b/sysdeps/aarch64/multiarch/memmove_mops.S
+new file mode 100644
+index 0000000000..c5ea66be3a
+--- /dev/null
++++ b/sysdeps/aarch64/multiarch/memmove_mops.S
+@@ -0,0 +1,39 @@
++/* Optimized memmove for MOPS.
++ Copyright (C) 2023 Free Software Foundation, Inc.
++
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library. If not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <sysdep.h>
++
++/* Assumptions:
++ *
++ * AArch64, MOPS.
++ *
++ */
++
++ENTRY (__memmove_mops)
++ PTR_ARG (0)
++ PTR_ARG (1)
++ SIZE_ARG (2)
++
++ mov x3, x0
++ .inst 0x1d010443 /* cpyp [x3]!, [x1]!, x2! */
++ .inst 0x1d410443 /* cpym [x3]!, [x1]!, x2! */
++ .inst 0x1d810443 /* cpye [x3]!, [x1]!, x2! */
++ ret
++
++END (__memmove_mops)
+diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
+index c4008f346b..9ef9521fa6 100644
+--- a/sysdeps/aarch64/multiarch/memset.c
++++ b/sysdeps/aarch64/multiarch/memset.c
+@@ -28,28 +28,40 @@
+
+ extern __typeof (__redirect_memset) __libc_memset;
+
+-extern __typeof (__redirect_memset) __memset_falkor attribute_hidden;
++extern __typeof (__redirect_memset) __memset_zva64 attribute_hidden;
+ extern __typeof (__redirect_memset) __memset_emag attribute_hidden;
+ extern __typeof (__redirect_memset) __memset_kunpeng attribute_hidden;
+-# if HAVE_AARCH64_SVE_ASM
+ extern __typeof (__redirect_memset) __memset_a64fx attribute_hidden;
+-# endif
+ extern __typeof (__redirect_memset) __memset_generic attribute_hidden;
++extern __typeof (__redirect_memset) __memset_mops attribute_hidden;
+
+-libc_ifunc (__libc_memset,
+- IS_KUNPENG920 (midr)
+- ?__memset_kunpeng
+- : ((IS_FALKOR (midr) || IS_PHECDA (midr)) && zva_size == 64
+- ? __memset_falkor
+- : (IS_EMAG (midr) && zva_size == 64
+- ? __memset_emag
+-# if HAVE_AARCH64_SVE_ASM
+- : (IS_A64FX (midr) && sve
+- ? __memset_a64fx
+- : __memset_generic))));
+-# else
+- : __memset_generic)));
+-# endif
++static inline __typeof (__redirect_memset) *
++select_memset_ifunc (void)
++{
++ INIT_ARCH ();
++
++ if (mops)
++ return __memset_mops;
++
++ if (sve && HAVE_AARCH64_SVE_ASM)
++ {
++ if (IS_A64FX (midr) && zva_size == 256)
++ return __memset_a64fx;
++ }
++
++ if (IS_KUNPENG920 (midr))
++ return __memset_kunpeng;
++
++ if (IS_EMAG (midr))
++ return __memset_emag;
++
++ if (zva_size == 64)
++ return __memset_zva64;
++
++ return __memset_generic;
++}
++
++libc_ifunc (__libc_memset, select_memset_ifunc ());
+
+ # undef memset
+ strong_alias (__libc_memset, memset);
+diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
+index dc87190724..4a4d4ed504 100644
+--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
++++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
+@@ -33,8 +33,6 @@
+ #define vector_length x9
+
+ #if HAVE_AARCH64_SVE_ASM
+-# if IS_IN (libc)
+-# define MEMSET __memset_a64fx
+
+ .arch armv8.2-a+sve
+
+@@ -49,7 +47,7 @@
+ #undef BTI_C
+ #define BTI_C
+
+-ENTRY (MEMSET)
++ENTRY (__memset_a64fx)
+ PTR_ARG (0)
+ SIZE_ARG (2)
+
+@@ -166,8 +164,6 @@ L(L2):
+ add count, count, CACHE_LINE_SIZE
+ b L(last)
+
+-END (MEMSET)
+-libc_hidden_builtin_def (MEMSET)
++END (__memset_a64fx)
+
+-#endif /* IS_IN (libc) */
+ #endif /* HAVE_AARCH64_SVE_ASM */
+diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
+deleted file mode 100644
+index 32d20d739e..0000000000
+--- a/sysdeps/aarch64/multiarch/memset_base64.S
++++ /dev/null
+@@ -1,186 +0,0 @@
+-/* Copyright (C) 2018-2022 Free Software Foundation, Inc.
+-
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library. If not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-#include <sysdep.h>
+-#include "memset-reg.h"
+-
+-#ifndef MEMSET
+-# define MEMSET __memset_base64
+-#endif
+-
+-/* To disable DC ZVA, set this threshold to 0. */
+-#ifndef DC_ZVA_THRESHOLD
+-# define DC_ZVA_THRESHOLD 512
+-#endif
+-
+-/* Assumptions:
+- *
+- * ARMv8-a, AArch64, unaligned accesses
+- *
+- */
+-
+-ENTRY_ALIGN (MEMSET, 6)
+-
+- PTR_ARG (0)
+- SIZE_ARG (2)
+-
+- bfi valw, valw, 8, 8
+- bfi valw, valw, 16, 16
+- bfi val, val, 32, 32
+-
+- add dstend, dstin, count
+-
+- cmp count, 96
+- b.hi L(set_long)
+- cmp count, 16
+- b.hs L(set_medium)
+-
+- /* Set 0..15 bytes. */
+- tbz count, 3, 1f
+- str val, [dstin]
+- str val, [dstend, -8]
+- ret
+-
+- .p2align 3
+-1: tbz count, 2, 2f
+- str valw, [dstin]
+- str valw, [dstend, -4]
+- ret
+-2: cbz count, 3f
+- strb valw, [dstin]
+- tbz count, 1, 3f
+- strh valw, [dstend, -2]
+-3: ret
+-
+- .p2align 3
+- /* Set 16..96 bytes. */
+-L(set_medium):
+- stp val, val, [dstin]
+- tbnz count, 6, L(set96)
+- stp val, val, [dstend, -16]
+- tbz count, 5, 1f
+- stp val, val, [dstin, 16]
+- stp val, val, [dstend, -32]
+-1: ret
+-
+- .p2align 4
+- /* Set 64..96 bytes. Write 64 bytes from the start and
+- 32 bytes from the end. */
+-L(set96):
+- stp val, val, [dstin, 16]
+- stp val, val, [dstin, 32]
+- stp val, val, [dstin, 48]
+- stp val, val, [dstend, -32]
+- stp val, val, [dstend, -16]
+- ret
+-
+- .p2align 4
+-L(set_long):
+- stp val, val, [dstin]
+- bic dst, dstin, 15
+-#if DC_ZVA_THRESHOLD
+- cmp count, DC_ZVA_THRESHOLD
+- ccmp val, 0, 0, cs
+- b.eq L(zva_64)
+-#endif
+- /* Small-size or non-zero memset does not use DC ZVA. */
+- sub count, dstend, dst
+-
+- /*
+- * Adjust count and bias for loop. By substracting extra 1 from count,
+- * it is easy to use tbz instruction to check whether loop tailing
+- * count is less than 33 bytes, so as to bypass 2 unneccesary stps.
+- */
+- sub count, count, 64+16+1
+-
+-#if DC_ZVA_THRESHOLD
+- /* Align loop on 16-byte boundary, this might be friendly to i-cache. */
+- nop
+-#endif
+-
+-1: stp val, val, [dst, 16]
+- stp val, val, [dst, 32]
+- stp val, val, [dst, 48]
+- stp val, val, [dst, 64]!
+- subs count, count, 64
+- b.hs 1b
+-
+- tbz count, 5, 1f /* Remaining count is less than 33 bytes? */
+- stp val, val, [dst, 16]
+- stp val, val, [dst, 32]
+-1: stp val, val, [dstend, -32]
+- stp val, val, [dstend, -16]
+- ret
+-
+-#if DC_ZVA_THRESHOLD
+- .p2align 3
+-L(zva_64):
+- stp val, val, [dst, 16]
+- stp val, val, [dst, 32]
+- stp val, val, [dst, 48]
+- bic dst, dst, 63
+-
+- /*
+- * Previous memory writes might cross cache line boundary, and cause
+- * cache line partially dirty. Zeroing this kind of cache line using
+- * DC ZVA will incur extra cost, for it requires loading untouched
+- * part of the line from memory before zeoring.
+- *
+- * So, write the first 64 byte aligned block using stp to force
+- * fully dirty cache line.
+- */
+- stp val, val, [dst, 64]
+- stp val, val, [dst, 80]
+- stp val, val, [dst, 96]
+- stp val, val, [dst, 112]
+-
+- sub count, dstend, dst
+- /*
+- * Adjust count and bias for loop. By substracting extra 1 from count,
+- * it is easy to use tbz instruction to check whether loop tailing
+- * count is less than 33 bytes, so as to bypass 2 unneccesary stps.
+- */
+- sub count, count, 128+64+64+1
+- add dst, dst, 128
+- nop
+-
+- /* DC ZVA sets 64 bytes each time. */
+-1: dc zva, dst
+- add dst, dst, 64
+- subs count, count, 64
+- b.hs 1b
+-
+- /*
+- * Write the last 64 byte aligned block using stp to force fully
+- * dirty cache line.
+- */
+- stp val, val, [dst, 0]
+- stp val, val, [dst, 16]
+- stp val, val, [dst, 32]
+- stp val, val, [dst, 48]
+-
+- tbz count, 5, 1f /* Remaining count is less than 33 bytes? */
+- stp val, val, [dst, 64]
+- stp val, val, [dst, 80]
+-1: stp val, val, [dstend, -32]
+- stp val, val, [dstend, -16]
+- ret
+-#endif
+-
+-END (MEMSET)
+-libc_hidden_builtin_def (MEMSET)
+diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
+index 922c1ed57d..7ecf61dc59 100644
+--- a/sysdeps/aarch64/multiarch/memset_emag.S
++++ b/sysdeps/aarch64/multiarch/memset_emag.S
+@@ -18,19 +18,95 @@
+ <https://www.gnu.org/licenses/>. */
+
+ #include <sysdep.h>
++#include "memset-reg.h"
+
+-#if IS_IN (libc)
+-# define MEMSET __memset_emag
+-
+-/*
+- * Using DC ZVA to zero memory does not produce better performance if
+- * memory size is not very large, especially when there are multiple
+- * processes/threads contending memory/cache. Here we set threshold to
+- * zero to disable using DC ZVA, which is good for multi-process/thread
+- * workloads.
++/* Assumptions:
++ *
++ * ARMv8-a, AArch64, unaligned accesses
++ *
+ */
+
+-# define DC_ZVA_THRESHOLD 0
++ENTRY (__memset_emag)
++
++ PTR_ARG (0)
++ SIZE_ARG (2)
++
++ bfi valw, valw, 8, 8
++ bfi valw, valw, 16, 16
++ bfi val, val, 32, 32
++
++ add dstend, dstin, count
++
++ cmp count, 96
++ b.hi L(set_long)
++ cmp count, 16
++ b.hs L(set_medium)
++
++ /* Set 0..15 bytes. */
++ tbz count, 3, 1f
++ str val, [dstin]
++ str val, [dstend, -8]
++ ret
++
++ .p2align 3
++1: tbz count, 2, 2f
++ str valw, [dstin]
++ str valw, [dstend, -4]
++ ret
++2: cbz count, 3f
++ strb valw, [dstin]
++ tbz count, 1, 3f
++ strh valw, [dstend, -2]
++3: ret
++
++ .p2align 3
++ /* Set 16..96 bytes. */
++L(set_medium):
++ stp val, val, [dstin]
++ tbnz count, 6, L(set96)
++ stp val, val, [dstend, -16]
++ tbz count, 5, 1f
++ stp val, val, [dstin, 16]
++ stp val, val, [dstend, -32]
++1: ret
++
++ .p2align 4
++ /* Set 64..96 bytes. Write 64 bytes from the start and
++ 32 bytes from the end. */
++L(set96):
++ stp val, val, [dstin, 16]
++ stp val, val, [dstin, 32]
++ stp val, val, [dstin, 48]
++ stp val, val, [dstend, -32]
++ stp val, val, [dstend, -16]
++ ret
++
++ .p2align 4
++L(set_long):
++ stp val, val, [dstin]
++ bic dst, dstin, 15
++ /* Small-size or non-zero memset does not use DC ZVA. */
++ sub count, dstend, dst
++
++ /*
++ * Adjust count and bias for loop. By subtracting extra 1 from count,
++ * it is easy to use tbz instruction to check whether loop tailing
++ * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
++ */
++ sub count, count, 64+16+1
++
++1: stp val, val, [dst, 16]
++ stp val, val, [dst, 32]
++ stp val, val, [dst, 48]
++ stp val, val, [dst, 64]!
++ subs count, count, 64
++ b.hs 1b
++
++ tbz count, 5, 1f /* Remaining count is less than 33 bytes? */
++ stp val, val, [dst, 16]
++ stp val, val, [dst, 32]
++1: stp val, val, [dstend, -32]
++ stp val, val, [dstend, -16]
++ ret
+
+-# include "./memset_base64.S"
+-#endif
++END (__memset_emag)
+diff --git a/sysdeps/aarch64/multiarch/memset_falkor.S b/sysdeps/aarch64/multiarch/memset_falkor.S
+deleted file mode 100644
+index 657f4c60b4..0000000000
+--- a/sysdeps/aarch64/multiarch/memset_falkor.S
++++ /dev/null
+@@ -1,54 +0,0 @@
+-/* Memset for falkor.
+- Copyright (C) 2017-2022 Free Software Foundation, Inc.
+-
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library. If not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-#include <sysdep.h>
+-#include <memset-reg.h>
+-
+-/* Reading dczid_el0 is expensive on falkor so move it into the ifunc
+- resolver and assume ZVA size of 64 bytes. The IFUNC resolver takes care to
+- use this function only when ZVA is enabled. */
+-
+-#if IS_IN (libc)
+-.macro zva_macro
+- .p2align 4
+- /* Write the first and last 64 byte aligned block using stp rather
+- than using DC ZVA. This is faster on some cores. */
+- str q0, [dst, 16]
+- stp q0, q0, [dst, 32]
+- bic dst, dst, 63
+- stp q0, q0, [dst, 64]
+- stp q0, q0, [dst, 96]
+- sub count, dstend, dst /* Count is now 128 too large. */
+- sub count, count, 128+64+64 /* Adjust count and bias for loop. */
+- add dst, dst, 128
+-1: dc zva, dst
+- add dst, dst, 64
+- subs count, count, 64
+- b.hi 1b
+- stp q0, q0, [dst, 0]
+- stp q0, q0, [dst, 32]
+- stp q0, q0, [dstend, -64]
+- stp q0, q0, [dstend, -32]
+- ret
+-.endm
+-
+-# define ZVA_MACRO zva_macro
+-# define MEMSET __memset_falkor
+-# include <sysdeps/aarch64/memset.S>
+-#endif
+diff --git a/sysdeps/aarch64/multiarch/memset_generic.S b/sysdeps/aarch64/multiarch/memset_generic.S
+index c879be93d5..6efcb5f00d 100644
+--- a/sysdeps/aarch64/multiarch/memset_generic.S
++++ b/sysdeps/aarch64/multiarch/memset_generic.S
+@@ -21,9 +21,15 @@
+
+ #if IS_IN (libc)
+ # define MEMSET __memset_generic
++
++/* Do not hide the generic version of memset, we use it internally. */
++# undef libc_hidden_builtin_def
++# define libc_hidden_builtin_def(name)
++
+ /* Add a hidden definition for use within libc.so. */
+ # ifdef SHARED
+ .globl __GI_memset; __GI_memset = __memset_generic
+ # endif
+-# include <sysdeps/aarch64/memset.S>
+ #endif
++
++#include <../memset.S>
+diff --git a/sysdeps/aarch64/multiarch/memset_kunpeng.S b/sysdeps/aarch64/multiarch/memset_kunpeng.S
+index a6d2c8c3bb..8f2deddb74 100644
+--- a/sysdeps/aarch64/multiarch/memset_kunpeng.S
++++ b/sysdeps/aarch64/multiarch/memset_kunpeng.S
+@@ -20,16 +20,13 @@
+ #include <sysdep.h>
+ #include <sysdeps/aarch64/memset-reg.h>
+
+-#if IS_IN (libc)
+-# define MEMSET __memset_kunpeng
+-
+ /* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses
+ *
+ */
+
+-ENTRY_ALIGN (MEMSET, 6)
++ENTRY (__memset_kunpeng)
+
+ PTR_ARG (0)
+ SIZE_ARG (2)
+@@ -108,6 +105,4 @@ L(set_long):
+ stp q0, q0, [dstend, -32]
+ ret
+
+-END (MEMSET)
+-libc_hidden_builtin_def (MEMSET)
+-#endif
++END (__memset_kunpeng)
+diff --git a/sysdeps/aarch64/multiarch/memset_mops.S b/sysdeps/aarch64/multiarch/memset_mops.S
+new file mode 100644
+index 0000000000..ca820b8636
+--- /dev/null
++++ b/sysdeps/aarch64/multiarch/memset_mops.S
+@@ -0,0 +1,38 @@
++/* Optimized memset for MOPS.
++ Copyright (C) 2023 Free Software Foundation, Inc.
++
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library. If not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <sysdep.h>
++
++/* Assumptions:
++ *
++ * AArch64, MOPS.
++ *
++ */
++
++ENTRY (__memset_mops)
++ PTR_ARG (0)
++ SIZE_ARG (2)
++
++ mov x3, x0
++ .inst 0x19c10443 /* setp [x3]!, x2!, x1 */
++ .inst 0x19c14443 /* setm [x3]!, x2!, x1 */
++ .inst 0x19c18443 /* sete [x3]!, x2!, x1 */
++ ret
++
++END (__memset_mops)
+diff --git a/sysdeps/aarch64/multiarch/memset_zva64.S b/sysdeps/aarch64/multiarch/memset_zva64.S
+new file mode 100644
+index 0000000000..13f45fd3d8
+--- /dev/null
++++ b/sysdeps/aarch64/multiarch/memset_zva64.S
+@@ -0,0 +1,27 @@
++/* Optimized memset for zva size = 64.
++ Copyright (C) 2023 Free Software Foundation, Inc.
++
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library. If not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <sysdep.h>
++
++#define ZVA64_ONLY 1
++#define MEMSET __memset_zva64
++#undef libc_hidden_builtin_def
++#define libc_hidden_builtin_def(X)
++
++#include "../memset.S"
+diff --git a/sysdeps/aarch64/multiarch/rtld-memset.S b/sysdeps/aarch64/multiarch/rtld-memset.S
+deleted file mode 100644
+index 7968d25e48..0000000000
+--- a/sysdeps/aarch64/multiarch/rtld-memset.S
++++ /dev/null
+@@ -1,25 +0,0 @@
+-/* Memset for aarch64, for the dynamic linker.
+- Copyright (C) 2017-2022 Free Software Foundation, Inc.
+-
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library. If not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-#include <sysdep.h>
+-
+-#if IS_IN (rtld)
+-# define MEMSET memset
+-# include <sysdeps/aarch64/memset.S>
+-#endif
+diff --git a/sysdeps/aarch64/multiarch/strlen.c b/sysdeps/aarch64/multiarch/strlen.c
+index 6d27c126b0..a951967fcd 100644
+--- a/sysdeps/aarch64/multiarch/strlen.c
++++ b/sysdeps/aarch64/multiarch/strlen.c
+@@ -28,10 +28,10 @@
+
+ extern __typeof (__redirect_strlen) __strlen;
+
+-extern __typeof (__redirect_strlen) __strlen_mte attribute_hidden;
++extern __typeof (__redirect_strlen) __strlen_generic attribute_hidden;
+ extern __typeof (__redirect_strlen) __strlen_asimd attribute_hidden;
+
+-libc_ifunc (__strlen, (mte ? __strlen_mte : __strlen_asimd));
++libc_ifunc (__strlen, (mte ? __strlen_generic : __strlen_asimd));
+
+ # undef strlen
+ strong_alias (__strlen, strlen);
+diff --git a/sysdeps/aarch64/multiarch/strlen_asimd.S b/sysdeps/aarch64/multiarch/strlen_asimd.S
+index 6faeb91361..dcd4589d10 100644
+--- a/sysdeps/aarch64/multiarch/strlen_asimd.S
++++ b/sysdeps/aarch64/multiarch/strlen_asimd.S
+@@ -48,6 +48,7 @@
+ #define tmp x2
+ #define tmpw w2
+ #define synd x3
++#define syndw w3
+ #define shift x4
+
+ /* For the first 32 bytes, NUL detection works on the principle that
+@@ -87,7 +88,6 @@
+
+ ENTRY (__strlen_asimd)
+ PTR_ARG (0)
+-
+ and tmp1, srcin, MIN_PAGE_SIZE - 1
+ cmp tmp1, MIN_PAGE_SIZE - 32
+ b.hi L(page_cross)
+@@ -123,7 +123,6 @@ ENTRY (__strlen_asimd)
+ add len, len, tmp1, lsr 3
+ ret
+
+- .p2align 3
+ /* Look for a NUL byte at offset 16..31 in the string. */
+ L(bytes16_31):
+ ldp data1, data2, [srcin, 16]
+@@ -151,6 +150,7 @@ L(bytes16_31):
+ add len, len, tmp1, lsr 3
+ ret
+
++ nop
+ L(loop_entry):
+ bic src, srcin, 31
+
+@@ -166,18 +166,12 @@ L(loop):
+ /* Low 32 bits of synd are non-zero if a NUL was found in datav1. */
+ cmeq maskv.16b, datav1.16b, 0
+ sub len, src, srcin
+- tst synd, 0xffffffff
+- b.ne 1f
++ cbnz syndw, 1f
+ cmeq maskv.16b, datav2.16b, 0
+ add len, len, 16
+ 1:
+ /* Generate a bitmask and compute correct byte offset. */
+-#ifdef __AARCH64EB__
+- bic maskv.8h, 0xf0
+-#else
+- bic maskv.8h, 0x0f, lsl 8
+-#endif
+- umaxp maskv.16b, maskv.16b, maskv.16b
++ shrn maskv.8b, maskv.8h, 4
+ fmov synd, maskd
+ #ifndef __AARCH64EB__
+ rbit synd, synd
+@@ -186,8 +180,6 @@ L(loop):
+ add len, len, tmp, lsr 2
+ ret
+
+- .p2align 4
+-
+ L(page_cross):
+ bic src, srcin, 31
+ mov tmpw, 0x0c03
+@@ -211,4 +203,3 @@ L(page_cross):
+ ret
+
+ END (__strlen_asimd)
+-libc_hidden_builtin_def (__strlen_asimd)
+diff --git a/sysdeps/aarch64/multiarch/strlen_generic.S b/sysdeps/aarch64/multiarch/strlen_generic.S
+new file mode 100644
+index 0000000000..014e376ec1
+--- /dev/null
++++ b/sysdeps/aarch64/multiarch/strlen_generic.S
+@@ -0,0 +1,39 @@
++/* A Generic Optimized strlen implementation for AARCH64.
++ Copyright (C) 2018-2022 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++/* The actual strlen code is in ../strlen.S. If we are building libc this file
++ defines __strlen_generic. Otherwise the include of ../strlen.S will define
++ the normal __strlen entry points. */
++
++#include <sysdep.h>
++
++#if IS_IN (libc)
++
++# define STRLEN __strlen_generic
++
++/* Do not hide the generic version of strlen, we use it internally. */
++# undef libc_hidden_builtin_def
++# define libc_hidden_builtin_def(name)
++
++# ifdef SHARED
++/* It doesn't make sense to send libc-internal strlen calls through a PLT. */
++ .globl __GI_strlen; __GI_strlen = __strlen_generic
++# endif
++#endif
++
++#include "../strlen.S"
+diff --git a/sysdeps/aarch64/multiarch/strlen_mte.S b/sysdeps/aarch64/multiarch/strlen_mte.S
+deleted file mode 100644
+index bf03ac53eb..0000000000
+--- a/sysdeps/aarch64/multiarch/strlen_mte.S
++++ /dev/null
+@@ -1,39 +0,0 @@
+-/* A Generic Optimized strlen implementation for AARCH64.
+- Copyright (C) 2018-2022 Free Software Foundation, Inc.
+- This file is part of the GNU C Library.
+-
+- The GNU C Library is free software; you can redistribute it and/or
+- modify it under the terms of the GNU Lesser General Public
+- License as published by the Free Software Foundation; either
+- version 2.1 of the License, or (at your option) any later version.
+-
+- The GNU C Library is distributed in the hope that it will be useful,
+- but WITHOUT ANY WARRANTY; without even the implied warranty of
+- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+- Lesser General Public License for more details.
+-
+- You should have received a copy of the GNU Lesser General Public
+- License along with the GNU C Library; if not, see
+- <https://www.gnu.org/licenses/>. */
+-
+-/* The actual strlen code is in ../strlen.S. If we are building libc this file
+- defines __strlen_mte. Otherwise the include of ../strlen.S will define
+- the normal __strlen entry points. */
+-
+-#include <sysdep.h>
+-
+-#if IS_IN (libc)
+-
+-# define STRLEN __strlen_mte
+-
+-/* Do not hide the generic version of strlen, we use it internally. */
+-# undef libc_hidden_builtin_def
+-# define libc_hidden_builtin_def(name)
+-
+-# ifdef SHARED
+-/* It doesn't make sense to send libc-internal strlen calls through a PLT. */
+- .globl __GI_strlen; __GI_strlen = __strlen_mte
+-# endif
+-#endif
+-
+-#include "../strlen.S"
+diff --git a/sysdeps/aarch64/rawmemchr.S b/sysdeps/aarch64/rawmemchr.S
+index 55d9e34d4f..f90ce2bf86 100644
+--- a/sysdeps/aarch64/rawmemchr.S
++++ b/sysdeps/aarch64/rawmemchr.S
+@@ -31,7 +31,7 @@ ENTRY (__rawmemchr)
+
+ L(do_strlen):
+ mov x15, x30
+- cfi_return_column (x15)
++ cfi_register (x30, x15)
+ mov x14, x0
+ bl __strlen
+ add x0, x14, x0
+diff --git a/sysdeps/aarch64/strchr.S b/sysdeps/aarch64/strchr.S
+index 003bf4a478..4781d45bd9 100644
+--- a/sysdeps/aarch64/strchr.S
++++ b/sysdeps/aarch64/strchr.S
+@@ -32,8 +32,7 @@
+
+ #define src x2
+ #define tmp1 x1
+-#define wtmp2 w3
+-#define tmp3 x3
++#define tmp2 x3
+
+ #define vrepchr v0
+ #define vdata v1
+@@ -41,39 +40,30 @@
+ #define vhas_nul v2
+ #define vhas_chr v3
+ #define vrepmask v4
+-#define vrepmask2 v5
+-#define vend v6
+-#define dend d6
++#define vend v5
++#define dend d5
+
+ /* Core algorithm.
+
+ For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+- per byte. For even bytes, bits 0-1 are set if the relevant byte matched the
+- requested character, bits 2-3 are set if the byte is NUL (or matched), and
+- bits 4-7 are not used and must be zero if none of bits 0-3 are set). Odd
+- bytes set bits 4-7 so that adjacent bytes can be merged. Since the bits
+- in the syndrome reflect the order in which things occur in the original
+- string, counting trailing zeros identifies exactly which byte matched. */
++ per byte. Bits 0-1 are set if the relevant byte matched the requested
++ character, bits 2-3 are set if the byte is NUL or matched. Count trailing
++ zeroes gives the position of the matching byte if it is a multiple of 4.
++ If it is not a multiple of 4, there was no match. */
+
+ ENTRY (strchr)
+ PTR_ARG (0)
+ bic src, srcin, 15
+ dup vrepchr.16b, chrin
+ ld1 {vdata.16b}, [src]
+- mov wtmp2, 0x3003
+- dup vrepmask.8h, wtmp2
++ movi vrepmask.16b, 0x33
+ cmeq vhas_nul.16b, vdata.16b, 0
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+- mov wtmp2, 0xf00f
+- dup vrepmask2.8h, wtmp2
+-
+ bit vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+- and vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+- lsl tmp3, srcin, 2
+- addp vend.16b, vhas_nul.16b, vhas_nul.16b /* 128->64 */
+-
++ lsl tmp2, srcin, 2
++ shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
+ fmov tmp1, dend
+- lsr tmp1, tmp1, tmp3
++ lsr tmp1, tmp1, tmp2
+ cbz tmp1, L(loop)
+
+ rbit tmp1, tmp1
+@@ -87,28 +77,34 @@ ENTRY (strchr)
+
+ .p2align 4
+ L(loop):
+- ldr qdata, [src, 16]!
++ ldr qdata, [src, 16]
++ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
++ cmhs vhas_nul.16b, vhas_chr.16b, vdata.16b
++ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
++ fmov tmp1, dend
++ cbnz tmp1, L(end)
++ ldr qdata, [src, 32]!
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+ cmhs vhas_nul.16b, vhas_chr.16b, vdata.16b
+ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
+ fmov tmp1, dend
+ cbz tmp1, L(loop)
++ sub src, src, 16
++L(end):
+
+ #ifdef __AARCH64EB__
+ bif vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+- and vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+- addp vend.16b, vhas_nul.16b, vhas_nul.16b /* 128->64 */
++ shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
+ fmov tmp1, dend
+ #else
+ bit vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+- and vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+- addp vend.16b, vhas_nul.16b, vhas_nul.16b /* 128->64 */
++ shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
+ fmov tmp1, dend
+ rbit tmp1, tmp1
+ #endif
++ add src, src, 16
+ clz tmp1, tmp1
+- /* Tmp1 is an even multiple of 2 if the target character was
+- found first. Otherwise we've found the end of string. */
++ /* Tmp1 is a multiple of 4 if the target character was found. */
+ tst tmp1, 2
+ add result, src, tmp1, lsr 2
+ csel result, result, xzr, eq
+diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
+index ee154ab74b..94465fc088 100644
+--- a/sysdeps/aarch64/strchrnul.S
++++ b/sysdeps/aarch64/strchrnul.S
+@@ -70,14 +70,22 @@ ENTRY (__strchrnul)
+
+ .p2align 4
+ L(loop):
+- ldr qdata, [src, 16]!
++ ldr qdata, [src, 16]
++ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
++ cmhs vhas_chr.16b, vhas_chr.16b, vdata.16b
++ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b
++ fmov tmp1, dend
++ cbnz tmp1, L(end)
++ ldr qdata, [src, 32]!
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+ cmhs vhas_chr.16b, vhas_chr.16b, vdata.16b
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b
+ fmov tmp1, dend
+ cbz tmp1, L(loop)
+-
++ sub src, src, 16
++L(end):
+ shrn vend.8b, vhas_chr.8h, 4 /* 128->64 */
++ add src, src, 16
+ fmov tmp1, dend
+ #ifndef __AARCH64EB__
+ rbit tmp1, tmp1
+diff --git a/sysdeps/aarch64/strcpy.S b/sysdeps/aarch64/strcpy.S
+index 78d27b4aa6..6eeda12df6 100644
+--- a/sysdeps/aarch64/strcpy.S
++++ b/sysdeps/aarch64/strcpy.S
+@@ -30,7 +30,6 @@
+ * MTE compatible.
+ */
+
+-/* Arguments and results. */
+ #define dstin x0
+ #define srcin x1
+ #define result x0
+@@ -76,14 +75,14 @@ ENTRY (STRCPY)
+ ld1 {vdata.16b}, [src]
+ cmeq vhas_nul.16b, vdata.16b, 0
+ lsl shift, srcin, 2
+- shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
++ shrn vend.8b, vhas_nul.8h, 4
+ fmov synd, dend
+ lsr synd, synd, shift
+ cbnz synd, L(tail)
+
+ ldr dataq, [src, 16]!
+ cmeq vhas_nul.16b, vdata.16b, 0
+- shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
++ shrn vend.8b, vhas_nul.8h, 4
+ fmov synd, dend
+ cbz synd, L(start_loop)
+
+@@ -102,13 +101,10 @@ ENTRY (STRCPY)
+ IFSTPCPY (add result, dstin, len)
+ ret
+
+- .p2align 4,,8
+ L(tail):
+ rbit synd, synd
+ clz len, synd
+ lsr len, len, 2
+-
+- .p2align 4
+ L(less16):
+ tbz len, 3, L(less8)
+ sub tmp, len, 7
+@@ -141,31 +137,37 @@ L(zerobyte):
+
+ .p2align 4
+ L(start_loop):
+- sub len, src, srcin
++ sub tmp, srcin, dstin
+ ldr dataq2, [srcin]
+- add dst, dstin, len
++ sub dst, src, tmp
+ str dataq2, [dstin]
+-
+- .p2align 5
+ L(loop):
+- str dataq, [dst], 16
+- ldr dataq, [src, 16]!
++ str dataq, [dst], 32
++ ldr dataq, [src, 16]
++ cmeq vhas_nul.16b, vdata.16b, 0
++ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
++ fmov synd, dend
++ cbnz synd, L(loopend)
++ str dataq, [dst, -16]
++ ldr dataq, [src, 32]!
+ cmeq vhas_nul.16b, vdata.16b, 0
+ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
+ fmov synd, dend
+ cbz synd, L(loop)
+-
++ add dst, dst, 16
++L(loopend):
+ shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
+ fmov synd, dend
++ sub dst, dst, 31
+ #ifndef __AARCH64EB__
+ rbit synd, synd
+ #endif
+ clz len, synd
+ lsr len, len, 2
+- sub tmp, len, 15
+- ldr dataq, [src, tmp]
+- str dataq, [dst, tmp]
+- IFSTPCPY (add result, dst, len)
++ add dst, dst, len
++ ldr dataq, [dst, tmp]
++ str dataq, [dst]
++ IFSTPCPY (add result, dst, 15)
+ ret
+
+ END (STRCPY)
+diff --git a/sysdeps/aarch64/strlen.S b/sysdeps/aarch64/strlen.S
+index 3a5d088407..10b9ec0769 100644
+--- a/sysdeps/aarch64/strlen.S
++++ b/sysdeps/aarch64/strlen.S
+@@ -43,12 +43,9 @@
+ #define dend d2
+
+ /* Core algorithm:
+-
+- For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
+- per byte. We take 4 bits of every comparison byte with shift right and narrow
+- by 4 instruction. Since the bits in the nibble mask reflect the order in
+- which things occur in the original string, counting trailing zeros identifies
+- exactly which byte matched. */
++ Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
++ four bits per byte using the shrn instruction. A count trailing zeros then
++ identifies the first zero byte. */
+
+ ENTRY (STRLEN)
+ PTR_ARG (0)
+@@ -68,18 +65,25 @@ ENTRY (STRLEN)
+
+ .p2align 5
+ L(loop):
+- ldr data, [src, 16]!
++ ldr data, [src, 16]
++ cmeq vhas_nul.16b, vdata.16b, 0
++ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
++ fmov synd, dend
++ cbnz synd, L(loop_end)
++ ldr data, [src, 32]!
+ cmeq vhas_nul.16b, vdata.16b, 0
+ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
+ fmov synd, dend
+ cbz synd, L(loop)
+-
++ sub src, src, 16
++L(loop_end):
+ shrn vend.8b, vhas_nul.8h, 4 /* 128->64 */
+ sub result, src, srcin
+ fmov synd, dend
+ #ifndef __AARCH64EB__
+ rbit synd, synd
+ #endif
++ add result, result, 16
+ clz tmp, synd
+ add result, result, tmp, lsr 2
+ ret
+diff --git a/sysdeps/aarch64/strnlen.S b/sysdeps/aarch64/strnlen.S
+index 282bddc9aa..a44a49a920 100644
+--- a/sysdeps/aarch64/strnlen.S
++++ b/sysdeps/aarch64/strnlen.S
+@@ -44,19 +44,16 @@
+
+ /*
+ Core algorithm:
+-
+- For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
+- per byte. We take 4 bits of every comparison byte with shift right and narrow
+- by 4 instruction. Since the bits in the nibble mask reflect the order in
+- which things occur in the original string, counting trailing zeros identifies
+- exactly which byte matched. */
++ Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
++ four bits per byte using the shrn instruction. A count trailing zeros then
++ identifies the first zero byte. */
+
+ ENTRY (__strnlen)
+ PTR_ARG (0)
+ SIZE_ARG (1)
+ bic src, srcin, 15
+ cbz cntin, L(nomatch)
+- ld1 {vdata.16b}, [src], 16
++ ld1 {vdata.16b}, [src]
+ cmeq vhas_chr.16b, vdata.16b, 0
+ lsl shift, srcin, 2
+ shrn vend.8b, vhas_chr.8h, 4 /* 128->64 */
+@@ -71,36 +68,40 @@ L(finish):
+ csel result, cntin, result, ls
+ ret
+
++L(nomatch):
++ mov result, cntin
++ ret
++
+ L(start_loop):
+ sub tmp, src, srcin
++ add tmp, tmp, 17
+ subs cntrem, cntin, tmp
+- b.ls L(nomatch)
++ b.lo L(nomatch)
+
+ /* Make sure that it won't overread by a 16-byte chunk */
+- add tmp, cntrem, 15
+- tbnz tmp, 4, L(loop32_2)
+-
++ tbz cntrem, 4, L(loop32_2)
++ sub src, src, 16
+ .p2align 5
+ L(loop32):
+- ldr qdata, [src], 16
++ ldr qdata, [src, 32]!
+ cmeq vhas_chr.16b, vdata.16b, 0
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbnz synd, L(end)
+ L(loop32_2):
+- ldr qdata, [src], 16
++ ldr qdata, [src, 16]
+ subs cntrem, cntrem, 32
+ cmeq vhas_chr.16b, vdata.16b, 0
+- b.ls L(end)
++ b.lo L(end_2)
+ umaxp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+ fmov synd, dend
+ cbz synd, L(loop32)
+-
++L(end_2):
++ add src, src, 16
+ L(end):
+ shrn vend.8b, vhas_chr.8h, 4 /* 128->64 */
+- sub src, src, 16
+- mov synd, vend.d[0]
+ sub result, src, srcin
++ fmov synd, dend
+ #ifndef __AARCH64EB__
+ rbit synd, synd
+ #endif
+@@ -110,10 +111,6 @@ L(end):
+ csel result, cntin, result, ls
+ ret
+
+-L(nomatch):
+- mov result, cntin
+- ret
+-
+ END (__strnlen)
+ libc_hidden_def (__strnlen)
+ weak_alias (__strnlen, strnlen)
+diff --git a/sysdeps/aarch64/strrchr.S b/sysdeps/aarch64/strrchr.S
+index 596e77c43b..eda6fefb99 100644
+--- a/sysdeps/aarch64/strrchr.S
++++ b/sysdeps/aarch64/strrchr.S
+@@ -22,19 +22,16 @@
+
+ /* Assumptions:
+ *
+- * ARMv8-a, AArch64
+- * Neon Available.
++ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
+ */
+
+-/* Arguments and results. */
+ #define srcin x0
+ #define chrin w1
+ #define result x0
+
+ #define src x2
+ #define tmp x3
+-#define wtmp w3
+ #define synd x3
+ #define shift x4
+ #define src_match x4
+@@ -46,7 +43,6 @@
+ #define vhas_nul v2
+ #define vhas_chr v3
+ #define vrepmask v4
+-#define vrepmask2 v5
+ #define vend v5
+ #define dend d5
+
+@@ -58,59 +54,71 @@
+ the relevant byte matched the requested character; bits 2-3 are set
+ if the relevant byte matched the NUL end of string. */
+
+-ENTRY(strrchr)
++ENTRY (strrchr)
+ PTR_ARG (0)
+ bic src, srcin, 15
+ dup vrepchr.16b, chrin
+- mov wtmp, 0x3003
+- dup vrepmask.8h, wtmp
+- tst srcin, 15
+- beq L(loop1)
+-
+- ld1 {vdata.16b}, [src], 16
++ movi vrepmask.16b, 0x33
++ ld1 {vdata.16b}, [src]
+ cmeq vhas_nul.16b, vdata.16b, 0
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+- mov wtmp, 0xf00f
+- dup vrepmask2.8h, wtmp
+ bit vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+- and vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+- addp vend.16b, vhas_nul.16b, vhas_nul.16b
++ shrn vend.8b, vhas_nul.8h, 4
+ lsl shift, srcin, 2
+ fmov synd, dend
+ lsr synd, synd, shift
+ lsl synd, synd, shift
+ ands nul_match, synd, 0xcccccccccccccccc
+ bne L(tail)
+- cbnz synd, L(loop2)
++ cbnz synd, L(loop2_start)
+
+- .p2align 5
++ .p2align 4
+ L(loop1):
+- ld1 {vdata.16b}, [src], 16
++ ldr q1, [src, 16]
++ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
++ cmhs vhas_nul.16b, vhas_chr.16b, vdata.16b
++ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
++ fmov synd, dend
++ cbnz synd, L(loop1_end)
++ ldr q1, [src, 32]!
+ cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+ cmhs vhas_nul.16b, vhas_chr.16b, vdata.16b
+ umaxp vend.16b, vhas_nul.16b, vhas_nul.16b
+ fmov synd, dend
+ cbz synd, L(loop1)
+-
++ sub src, src, 16
++L(loop1_end):
++ add src, src, 16
+ cmeq vhas_nul.16b, vdata.16b, 0
++#ifdef __AARCH64EB__
++ bif vhas_nul.16b, vhas_chr.16b, vrepmask.16b
++ shrn vend.8b, vhas_nul.8h, 4
++ fmov synd, dend
++ rbit synd, synd
++#else
+ bit vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+- bic vhas_nul.8h, 0x0f, lsl 8
+- addp vend.16b, vhas_nul.16b, vhas_nul.16b
++ shrn vend.8b, vhas_nul.8h, 4
+ fmov synd, dend
++#endif
+ ands nul_match, synd, 0xcccccccccccccccc
+- beq L(loop2)
+-
++ beq L(loop2_start)
+ L(tail):
+ sub nul_match, nul_match, 1
+ and chr_match, synd, 0x3333333333333333
+ ands chr_match, chr_match, nul_match
+- sub result, src, 1
++ add result, src, 15
+ clz tmp, chr_match
+ sub result, result, tmp, lsr 2
+ csel result, result, xzr, ne
+ ret
+
+ .p2align 4
++ nop
++ nop
++L(loop2_start):
++ add src, src, 16
++ bic vrepmask.8h, 0xf0
++
+ L(loop2):
+ cmp synd, 0
+ csel src_match, src, src_match, ne
+diff --git a/sysdeps/arc/utmp-size.h b/sysdeps/arc/utmp-size.h
+new file mode 100644
+index 0000000000..a247fcd3da
+--- /dev/null
++++ b/sysdeps/arc/utmp-size.h
+@@ -0,0 +1,3 @@
++/* arc has less padding than other architectures with 64-bit time_t. */
++#define UTMP_SIZE 392
++#define LASTLOG_SIZE 296
+diff --git a/sysdeps/arm/bits/wordsize.h b/sysdeps/arm/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/arm/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
+index 6a422713bd..659c6f16da 100644
+--- a/sysdeps/arm/dl-machine.h
++++ b/sysdeps/arm/dl-machine.h
+@@ -137,7 +137,6 @@ _start:\n\
+ _dl_start_user:\n\
+ adr r6, .L_GET_GOT\n\
+ add sl, sl, r6\n\
+- ldr r4, [sl, r4]\n\
+ @ save the entry point in another register\n\
+ mov r6, r0\n\
+ @ get the original arg count\n\
+diff --git a/sysdeps/arm/utmp-size.h b/sysdeps/arm/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/arm/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/csky/bits/wordsize.h b/sysdeps/csky/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/csky/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/csky/utmp-size.h b/sysdeps/csky/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/csky/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
+index 050a3032de..c2627fced7 100644
+--- a/sysdeps/generic/ldsodefs.h
++++ b/sysdeps/generic/ldsodefs.h
+@@ -105,6 +105,9 @@ typedef struct link_map *lookup_t;
+ DT_PREINIT_ARRAY. */
+ typedef void (*dl_init_t) (int, char **, char **);
+
++/* Type of a constructor function, in DT_FINI, DT_FINI_ARRAY. */
++typedef void (*fini_t) (void);
++
+ /* On some architectures a pointer to a function is not just a pointer
to the actual code of the function but rather an architecture
specific descriptor. */
@@ -1048,9 +1051,16 @@ extern void _dl_init (struct link_map *main_map, int argc, char **argv,
@@ -7850,16 +11933,45 @@ index 0000000000..4713b30a8a
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
-+ License along with the GNU C Library; see the file COPYING.LIB. If
-+ not, see <https://www.gnu.org/licenses/>. */
-+
-+#ifndef _LIBC_LOCK_ARCH_H
-+#define _LIBC_LOCK_ARCH_H
++ License along with the GNU C Library; see the file COPYING.LIB. If
++ not, see <https://www.gnu.org/licenses/>. */
++
++#ifndef _LIBC_LOCK_ARCH_H
++#define _LIBC_LOCK_ARCH_H
++
++/* The default definition uses the natural alignment from the lock type. */
++#define __LIBC_LOCK_ALIGNMENT
++
++#endif
+diff --git a/sysdeps/generic/utmp-size.h b/sysdeps/generic/utmp-size.h
+new file mode 100644
+index 0000000000..89dbe878b0
+--- /dev/null
++++ b/sysdeps/generic/utmp-size.h
+@@ -0,0 +1,23 @@
++/* Expected sizes of utmp-related structures stored in files. 64-bit version.
++ Copyright (C) 2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
+
-+/* The default definition uses the natural alignment from the lock type. */
-+#define __LIBC_LOCK_ALIGNMENT
++/* Expected size, in bytes, of struct utmp and struct utmpx. */
++#define UTMP_SIZE 400
+
-+#endif
++/* Expected size, in bytes, of struct lastlog. */
++#define LASTLOG_SIZE 296
diff --git a/sysdeps/hppa/dl-machine.h b/sysdeps/hppa/dl-machine.h
index c865713be1..1d51948566 100644
--- a/sysdeps/hppa/dl-machine.h
@@ -7911,6 +12023,14 @@ index c865713be1..1d51948566 100644
" bl _dl_init,%r2\n" \
" ldo 4(%r23),%r23\n" /* delay slot */ \
\
+diff --git a/sysdeps/hppa/utmp-size.h b/sysdeps/hppa/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/hppa/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
diff --git a/sysdeps/ieee754/ldbl-128/e_j1l.c b/sysdeps/ieee754/ldbl-128/e_j1l.c
index 54c457681a..9a9c5c6f00 100644
--- a/sysdeps/ieee754/ldbl-128/e_j1l.c
@@ -7999,6 +12119,59 @@ index d85154e73a..d8c0de1faf 100644
return res;
}
else
+diff --git a/sysdeps/m68k/bits/wordsize.h b/sysdeps/m68k/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/m68k/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/m68k/utmp-size.h b/sysdeps/m68k/utmp-size.h
+new file mode 100644
+index 0000000000..5946685819
+--- /dev/null
++++ b/sysdeps/m68k/utmp-size.h
+@@ -0,0 +1,3 @@
++/* m68k has 2-byte alignment. */
++#define UTMP_SIZE 382
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/mach/getsysstats.c b/sysdeps/mach/getsysstats.c
+index 37ea5e6a7a..80ea7e17cb 100644
+--- a/sysdeps/mach/getsysstats.c
++++ b/sysdeps/mach/getsysstats.c
+@@ -62,12 +62,6 @@ __get_nprocs (void)
+ libc_hidden_def (__get_nprocs)
+ weak_alias (__get_nprocs, get_nprocs)
+
+-int
+-__get_nprocs_sched (void)
+-{
+- return __get_nprocs ();
+-}
+-
+ /* Return the number of physical pages on the system. */
+ long int
+ __get_phys_pages (void)
diff --git a/sysdeps/mach/hurd/bits/socket.h b/sysdeps/mach/hurd/bits/socket.h
index 5b35ea81ec..70fce4fb27 100644
--- a/sysdeps/mach/hurd/bits/socket.h
@@ -8062,6 +12235,138 @@ index 5b35ea81ec..70fce4fb27 100644
return __cmsg;
}
#endif /* Use `extern inline'. */
+diff --git a/sysdeps/microblaze/bits/wordsize.h b/sysdeps/microblaze/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/microblaze/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/microblaze/utmp-size.h b/sysdeps/microblaze/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/microblaze/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/mips/bits/wordsize.h b/sysdeps/mips/bits/wordsize.h
+index e521dc589c..c6a4a4270b 100644
+--- a/sysdeps/mips/bits/wordsize.h
++++ b/sysdeps/mips/bits/wordsize.h
+@@ -19,11 +19,7 @@
+
+ #define __WORDSIZE _MIPS_SZPTR
+
+-#if _MIPS_SIM == _ABI64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+-#else
+-# define __WORDSIZE_TIME64_COMPAT32 0
+-#endif
++#define __WORDSIZE_TIME64_COMPAT32 1
+
+ #if __WORDSIZE == 32
+ #define __WORDSIZE32_SIZE_ULONG 0
+diff --git a/sysdeps/mips/utmp-size.h b/sysdeps/mips/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/mips/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/nios2/bits/wordsize.h b/sysdeps/nios2/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/nios2/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/nios2/utmp-size.h b/sysdeps/nios2/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/nios2/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
+index 53fba774a5..662bc0158d 100644
+--- a/sysdeps/nptl/dl-tls_init_tp.c
++++ b/sysdeps/nptl/dl-tls_init_tp.c
+@@ -45,8 +45,6 @@ rtld_mutex_dummy (pthread_mutex_t *lock)
+ #endif
+
+ const unsigned int __rseq_flags;
+-const unsigned int __rseq_size attribute_relro;
+-const ptrdiff_t __rseq_offset attribute_relro;
+
+ void
+ __tls_pre_init_tp (void)
+@@ -106,12 +104,7 @@ __tls_init_tp (void)
+ do_rseq = TUNABLE_GET (rseq, int, NULL);
+ #endif
+ if (rseq_register_current_thread (pd, do_rseq))
+- {
+- /* We need a writable view of the variables. They are in
+- .data.relro and are not yet write-protected. */
+- extern unsigned int size __asm__ ("__rseq_size");
+- size = sizeof (pd->rseq_area);
+- }
++ _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED;
+
+ #ifdef RSEQ_SIG
+ /* This should be a compile-time constant, but the current
+@@ -119,8 +112,7 @@ __tls_init_tp (void)
+ all targets support __thread_pointer, so set __rseq_offset only
+ if thre rseq registration may have happened because RSEQ_SIG is
+ defined. */
+- extern ptrdiff_t offset __asm__ ("__rseq_offset");
+- offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
++ _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
+ #endif
+ }
+
diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
index 5af476c48b..63b3f3d75c 100644
--- a/sysdeps/nptl/libc-lock.h
@@ -8104,6 +12409,15 @@ index d3a6837fd2..425f514c5c 100644
typedef struct { pthread_mutex_t mutex; } __rtld_lock_recursive_t;
typedef pthread_rwlock_t __libc_rwlock_t;
+diff --git a/sysdeps/or1k/utmp-size.h b/sysdeps/or1k/utmp-size.h
+new file mode 100644
+index 0000000000..6b3653aa4d
+--- /dev/null
++++ b/sysdeps/or1k/utmp-size.h
+@@ -0,0 +1,3 @@
++/* or1k has less padding than other architectures with 64-bit time_t. */
++#define UTMP_SIZE 392
++#define LASTLOG_SIZE 296
diff --git a/sysdeps/posix/getaddrinfo.c b/sysdeps/posix/getaddrinfo.c
index bcff909b2f..f975dcd2bc 100644
--- a/sysdeps/posix/getaddrinfo.c
@@ -8255,8 +12569,255 @@ index 2a82e53baf..d941024963 100644
#else
register unsigned long thread_pointer __asm__ ("r2");
asm ("bcl 20,31,1f\n1:\t"
+diff --git a/sysdeps/powerpc/powerpc32/bits/wordsize.h b/sysdeps/powerpc/powerpc32/bits/wordsize.h
+index 04ca9debf0..6993fb6b29 100644
+--- a/sysdeps/powerpc/powerpc32/bits/wordsize.h
++++ b/sysdeps/powerpc/powerpc32/bits/wordsize.h
+@@ -2,10 +2,9 @@
+
+ #if defined __powerpc64__
+ # define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ #else
+ # define __WORDSIZE 32
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ # define __WORDSIZE32_SIZE_ULONG 0
+ # define __WORDSIZE32_PTRDIFF_LONG 0
+ #endif
++#define __WORDSIZE_TIME64_COMPAT32 1
+diff --git a/sysdeps/powerpc/powerpc64/bits/wordsize.h b/sysdeps/powerpc/powerpc64/bits/wordsize.h
+index 04ca9debf0..6993fb6b29 100644
+--- a/sysdeps/powerpc/powerpc64/bits/wordsize.h
++++ b/sysdeps/powerpc/powerpc64/bits/wordsize.h
+@@ -2,10 +2,9 @@
+
+ #if defined __powerpc64__
+ # define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ #else
+ # define __WORDSIZE 32
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ # define __WORDSIZE32_SIZE_ULONG 0
+ # define __WORDSIZE32_PTRDIFF_LONG 0
+ #endif
++#define __WORDSIZE_TIME64_COMPAT32 1
+diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h
+index bb0ccd0811..3868bcc2f7 100644
+--- a/sysdeps/powerpc/powerpc64/dl-machine.h
++++ b/sysdeps/powerpc/powerpc64/dl-machine.h
+@@ -79,6 +79,7 @@ elf_host_tolerates_class (const Elf64_Ehdr *ehdr)
+ static inline Elf64_Addr
+ elf_machine_load_address (void) __attribute__ ((const));
+
++#ifndef __PCREL__
+ static inline Elf64_Addr
+ elf_machine_load_address (void)
+ {
+@@ -106,6 +107,24 @@ elf_machine_dynamic (void)
+ /* Then subtract off the load address offset. */
+ return runtime_dynamic - elf_machine_load_address() ;
+ }
++#else /* __PCREL__ */
++/* In PCREL mode, r2 may have been clobbered. Rely on relative
++ relocations instead. */
++
++static inline ElfW(Addr)
++elf_machine_load_address (void)
++{
++ extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
++ return (ElfW(Addr)) &__ehdr_start;
++}
++
++static inline ElfW(Addr)
++elf_machine_dynamic (void)
++{
++ extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
++ return (ElfW(Addr)) _DYNAMIC - elf_machine_load_address ();
++}
++#endif /* __PCREL__ */
+
+ /* The PLT uses Elf64_Rela relocs. */
+ #define elf_machine_relplt elf_machine_rela
+diff --git a/sysdeps/powerpc/utmp-size.h b/sysdeps/powerpc/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/powerpc/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/riscv/utmp-size.h b/sysdeps/riscv/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/riscv/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S
+index c518539bfa..5db0c707a1 100644
+--- a/sysdeps/s390/wcsncmp-vx.S
++++ b/sysdeps/s390/wcsncmp-vx.S
+@@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13)
+ sllg %r4,%r4,2 /* Convert character-count to byte-count. */
+ locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */
+
+- /* Check first character without vector load. */
+- lghi %r5,4 /* current_len = 4 bytes. */
+- /* Check s1/2[0]. */
+- lt %r0,0(%r2)
+- l %r1,0(%r3)
+- je .Lend_cmp_one_char
+- crjne %r0,%r1,.Lend_cmp_one_char
+-
++ lghi %r5,0 /* current_len = 0 bytes. */
+ .Lloop:
+ vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */
+ vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */
+@@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13)
+ srl %r4,2 /* And convert it to character-index. */
+ vlgvf %r0,%v16,0(%r4) /* Load character-values. */
+ vlgvf %r1,%v17,0(%r4)
+-.Lend_cmp_one_char:
+ cr %r0,%r1
+ je .Lend_equal
+ lghi %r2,1
+diff --git a/sysdeps/sh/bits/wordsize.h b/sysdeps/sh/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/sh/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/sh/utmp-size.h b/sysdeps/sh/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/sh/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
+diff --git a/sysdeps/sparc/sparc32/bits/wordsize.h b/sysdeps/sparc/sparc32/bits/wordsize.h
+index 2f66f10d72..a2e79e0fa9 100644
+--- a/sysdeps/sparc/sparc32/bits/wordsize.h
++++ b/sysdeps/sparc/sparc32/bits/wordsize.h
+@@ -1,11 +1,6 @@
+ /* Determine the wordsize from the preprocessor defines. */
+
+-#if defined __arch64__ || defined __sparcv9
+-# define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+-#else
+-# define __WORDSIZE 32
+-# define __WORDSIZE_TIME64_COMPAT32 0
+-# define __WORDSIZE32_SIZE_ULONG 0
+-# define __WORDSIZE32_PTRDIFF_LONG 0
+-#endif
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
+diff --git a/sysdeps/sparc/sparc32/memset.S b/sysdeps/sparc/sparc32/memset.S
+index b1b67cb2d1..5154263317 100644
+--- a/sysdeps/sparc/sparc32/memset.S
++++ b/sysdeps/sparc/sparc32/memset.S
+@@ -55,7 +55,7 @@ ENTRY(memset)
+
+ andcc %o0, 3, %o2
+ bne 3f
+-4: andcc %o0, 4, %g0
++5: andcc %o0, 4, %g0
+
+ be 2f
+ mov %g3, %g2
+@@ -139,7 +139,7 @@ ENTRY(memset)
+ stb %g3, [%o0 + 0x02]
+ 2: sub %o2, 4, %o2
+ add %o1, %o2, %o1
+- b 4b
++ b 5b
+ sub %o0, %o2, %o0
+ END(memset)
+ libc_hidden_builtin_def (memset)
+diff --git a/sysdeps/sparc/sparc64/bits/wordsize.h b/sysdeps/sparc/sparc64/bits/wordsize.h
+index 2f66f10d72..ea103e5970 100644
+--- a/sysdeps/sparc/sparc64/bits/wordsize.h
++++ b/sysdeps/sparc/sparc64/bits/wordsize.h
+@@ -2,10 +2,9 @@
+
+ #if defined __arch64__ || defined __sparcv9
+ # define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ #else
+ # define __WORDSIZE 32
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ # define __WORDSIZE32_SIZE_ULONG 0
+ # define __WORDSIZE32_PTRDIFF_LONG 0
+ #endif
++#define __WORDSIZE_TIME64_COMPAT32 1
+diff --git a/sysdeps/sparc/sparc64/memmove.S b/sysdeps/sparc/sparc64/memmove.S
+index 8d46f2cd4e..7746684160 100644
+--- a/sysdeps/sparc/sparc64/memmove.S
++++ b/sysdeps/sparc/sparc64/memmove.S
+@@ -38,7 +38,7 @@ ENTRY(memmove)
+ /*
+ * normal, copy forwards
+ */
+-2: ble %XCC, .Ldbytecp
++2: bleu %XCC, .Ldbytecp
+ andcc %o1, 3, %o5 /* is src word aligned */
+ bz,pn %icc, .Laldst
+ cmp %o5, 2 /* is src half-word aligned */
+diff --git a/sysdeps/sparc/sysdep.h b/sysdeps/sparc/sysdep.h
+index 95068071cc..baab6817a6 100644
+--- a/sysdeps/sparc/sysdep.h
++++ b/sysdeps/sparc/sysdep.h
+@@ -76,6 +76,15 @@ C_LABEL(name) \
+ cfi_endproc; \
+ .size name, . - name
+
++#define ENTRY_NOCFI(name) \
++ .align 4; \
++ .global C_SYMBOL_NAME(name); \
++ .type name, @function; \
++C_LABEL(name)
++
++#define END_NOCFI(name) \
++ .size name, . - name
++
+ #undef LOC
+ #define LOC(name) .L##name
+
+diff --git a/sysdeps/sparc/utmp-size.h b/sysdeps/sparc/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/sparc/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
-index a139a16532..d5d9af4de2 100644
+index a139a16532..a039048c5d 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -265,6 +265,14 @@ $(objpfx)tst-mount-consts.out: ../sysdeps/unix/sysv/linux/tst-mount-consts.py
@@ -8283,6 +12844,154 @@ index a139a16532..d5d9af4de2 100644
endif
# Don't compile the ctype glue code, since there is no old non-GNU C library.
+@@ -392,6 +402,7 @@ endif
+
+ ifeq ($(subdir),elf)
+ sysdep-rtld-routines += dl-brk dl-sbrk dl-getcwd dl-openat64 dl-opendir
++dl-routines += dl-rseq-symbols
+
+ libof-lddlibc4 = lddlibc4
+
+diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
+index 616239bb84..b7ffea84e5 100644
+--- a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
++++ b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
+@@ -78,3 +78,24 @@
+ #define HWCAP2_AFP (1 << 20)
+ #define HWCAP2_RPRES (1 << 21)
+ #define HWCAP2_MTE3 (1 << 22)
++#define HWCAP2_SME (1 << 23)
++#define HWCAP2_SME_I16I64 (1 << 24)
++#define HWCAP2_SME_F64F64 (1 << 25)
++#define HWCAP2_SME_I8I32 (1 << 26)
++#define HWCAP2_SME_F16F32 (1 << 27)
++#define HWCAP2_SME_B16F32 (1 << 28)
++#define HWCAP2_SME_F32F32 (1 << 29)
++#define HWCAP2_SME_FA64 (1 << 30)
++#define HWCAP2_WFXT (1UL << 31)
++#define HWCAP2_EBF16 (1UL << 32)
++#define HWCAP2_SVE_EBF16 (1UL << 33)
++#define HWCAP2_CSSC (1UL << 34)
++#define HWCAP2_RPRFM (1UL << 35)
++#define HWCAP2_SVE2P1 (1UL << 36)
++#define HWCAP2_SME2 (1UL << 37)
++#define HWCAP2_SME2P1 (1UL << 38)
++#define HWCAP2_SME_I16I32 (1UL << 39)
++#define HWCAP2_SME_BI32I32 (1UL << 40)
++#define HWCAP2_SME_B16B16 (1UL << 41)
++#define HWCAP2_SME_F16F16 (1UL << 42)
++#define HWCAP2_MOPS (1UL << 43)
+diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+index d14c0f4e1f..2543128352 100644
+--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
++++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+@@ -20,6 +20,7 @@
+ #include <sys/auxv.h>
+ #include <elf/dl-hwcaps.h>
+ #include <sys/prctl.h>
++#include <sys/utsname.h>
+
+ #define DCZID_DZP_MASK (1 << 4)
+ #define DCZID_BS_MASK (0xf)
+@@ -38,11 +39,9 @@ struct cpu_list
+ };
+
+ static struct cpu_list cpu_list[] = {
+- {"falkor", 0x510FC000},
+ {"thunderxt88", 0x430F0A10},
+ {"thunderx2t99", 0x431F0AF0},
+ {"thunderx2t99p1", 0x420F5160},
+- {"phecda", 0x680F0000},
+ {"ares", 0x411FD0C0},
+ {"emag", 0x503F0001},
+ {"kunpeng920", 0x481FD010},
+@@ -61,6 +60,46 @@ get_midr_from_mcpu (const char *mcpu)
+ }
+ #endif
+
++#if __LINUX_KERNEL_VERSION < 0x060200
++
++/* Return true if we prefer using SVE in string ifuncs. Old kernels disable
++ SVE after every system call which results in unnecessary traps if memcpy
++ uses SVE. This is true for kernels between 4.15.0 and before 6.2.0, except
++ for 5.14.0 which was patched. For these versions return false to avoid using
++ SVE ifuncs.
++ Parse the kernel version into a 24-bit kernel.major.minor value without
++ calling any library functions. If uname() is not supported or if the version
++ format is not recognized, assume the kernel is modern and return true. */
++
++static inline bool
++prefer_sve_ifuncs (void)
++{
++ struct utsname buf;
++ const char *p = &buf.release[0];
++ int kernel = 0;
++ int val;
++
++ if (__uname (&buf) < 0)
++ return true;
++
++ for (int shift = 16; shift >= 0; shift -= 8)
++ {
++ for (val = 0; *p >= '0' && *p <= '9'; p++)
++ val = val * 10 + *p - '0';
++ kernel |= (val & 255) << shift;
++ if (*p++ != '.')
++ break;
++ }
++
++ if (kernel >= 0x060200 || kernel == 0x050e00)
++ return true;
++ if (kernel >= 0x040f00)
++ return false;
++ return true;
++}
++
++#endif
++
+ static inline void
+ init_cpu_features (struct cpu_features *cpu_features)
+ {
+@@ -126,4 +165,14 @@ init_cpu_features (struct cpu_features *cpu_features)
+
+ /* Check if SVE is supported. */
+ cpu_features->sve = GLRO (dl_hwcap) & HWCAP_SVE;
++
++ cpu_features->prefer_sve_ifuncs = cpu_features->sve;
++
++#if __LINUX_KERNEL_VERSION < 0x060200
++ if (cpu_features->sve)
++ cpu_features->prefer_sve_ifuncs = prefer_sve_ifuncs ();
++#endif
++
++ /* Check if MOPS is supported. */
++ cpu_features->mops = GLRO (dl_hwcap2) & HWCAP2_MOPS;
+ }
+diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+index 391165a99c..d51597b923 100644
+--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
++++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+@@ -47,11 +47,6 @@
+ #define IS_THUNDERX2(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0xaf)
+
+-#define IS_FALKOR(midr) (MIDR_IMPLEMENTOR(midr) == 'Q' \
+- && MIDR_PARTNUM(midr) == 0xc00)
+-
+-#define IS_PHECDA(midr) (MIDR_IMPLEMENTOR(midr) == 'h' \
+- && MIDR_PARTNUM(midr) == 0x000)
+ #define IS_NEOVERSE_N1(midr) (MIDR_IMPLEMENTOR(midr) == 'A' \
+ && MIDR_PARTNUM(midr) == 0xd0c)
+ #define IS_NEOVERSE_N2(midr) (MIDR_IMPLEMENTOR(midr) == 'A' \
+@@ -76,6 +71,8 @@ struct cpu_features
+ /* Currently, the GLIBC memory tagging tunable only defines 8 bits. */
+ uint8_t mte_state;
+ bool sve;
++ bool prefer_sve_ifuncs;
++ bool mops;
+ };
+
+ #endif /* _CPU_FEATURES_AARCH64_H */
diff --git a/sysdeps/unix/sysv/linux/alpha/brk_call.h b/sysdeps/unix/sysv/linux/alpha/brk_call.h
index b8088cf13f..0b851b6c86 100644
--- a/sysdeps/unix/sysv/linux/alpha/brk_call.h
@@ -8685,6 +13394,18 @@ index 25bd6cb638..fb11a3fba4 100644
-
#endif /* _BITS_STRUCT_STAT_H */
+diff --git a/sysdeps/unix/sysv/linux/bits/uio-ext.h b/sysdeps/unix/sysv/linux/bits/uio-ext.h
+index 5b0dba08c5..e49b66facd 100644
+--- a/sysdeps/unix/sysv/linux/bits/uio-ext.h
++++ b/sysdeps/unix/sysv/linux/bits/uio-ext.h
+@@ -47,6 +47,7 @@ extern ssize_t process_vm_writev (pid_t __pid, const struct iovec *__lvec,
+ #define RWF_SYNC 0x00000004 /* per-IO O_SYNC. */
+ #define RWF_NOWAIT 0x00000008 /* per-IO nonblocking mode. */
+ #define RWF_APPEND 0x00000010 /* per-IO O_APPEND. */
++#define RWF_NOAPPEND 0x00000020 /* per-IO negation of O_APPEND */
+
+ __END_DECLS
+
diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
index fe73fe3ba8..ca20043408 100644
--- a/sysdeps/unix/sysv/linux/check_pf.c
@@ -8917,6 +13638,76 @@ index 0000000000..f0ee455748
+#define _STATBUF_ST_NSEC
+
+#endif /* _BITS_STRUCT_STAT_H */
+diff --git a/sysdeps/unix/sysv/linux/dl-rseq-symbols.S b/sysdeps/unix/sysv/linux/dl-rseq-symbols.S
+new file mode 100644
+index 0000000000..b4bba06a99
+--- /dev/null
++++ b/sysdeps/unix/sysv/linux/dl-rseq-symbols.S
+@@ -0,0 +1,64 @@
++/* Define symbols used by rseq.
++ Copyright (C) 2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#include <sysdep.h>
++
++#if __WORDSIZE == 64
++#define RSEQ_OFFSET_SIZE 8
++#else
++#define RSEQ_OFFSET_SIZE 4
++#endif
++
++/* Some targets define a macro to denote the zero register. */
++#undef zero
++
++/* Define 2 symbols: '__rseq_size' is public const and '_rseq_size' (an
++ alias of '__rseq_size') is hidden and writable for internal use by the
++ dynamic linker which will initialize the value both symbols point to
++ before copy relocations take place. */
++
++ .globl __rseq_size
++ .type __rseq_size, %object
++ .size __rseq_size, 4
++ .hidden _rseq_size
++ .globl _rseq_size
++ .type _rseq_size, %object
++ .size _rseq_size, 4
++ .section .data.rel.ro
++ .balign 4
++__rseq_size:
++_rseq_size:
++ .zero 4
++
++/* Define 2 symbols: '__rseq_offset' is public const and '_rseq_offset' (an
++ alias of '__rseq_offset') is hidden and writable for internal use by the
++ dynamic linker which will initialize the value both symbols point to
++ before copy relocations take place. */
++
++ .globl __rseq_offset
++ .type __rseq_offset, %object
++ .size __rseq_offset, RSEQ_OFFSET_SIZE
++ .hidden _rseq_offset
++ .globl _rseq_offset
++ .type _rseq_offset, %object
++ .size _rseq_offset, RSEQ_OFFSET_SIZE
++ .section .data.rel.ro
++ .balign RSEQ_OFFSET_SIZE
++__rseq_offset:
++_rseq_offset:
++ .zero RSEQ_OFFSET_SIZE
diff --git a/sysdeps/unix/sysv/linux/generic/bits/struct_stat.h b/sysdeps/unix/sysv/linux/generic/bits/struct_stat.h
deleted file mode 100644
index fb11a3fba4..0000000000
@@ -9050,6 +13841,19 @@ index fb11a3fba4..0000000000
-#define _STATBUF_ST_NSEC
-
-#endif /* _BITS_STRUCT_STAT_H */
+diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
+index 064eaa08ae..4d01786120 100644
+--- a/sysdeps/unix/sysv/linux/getsysstats.c
++++ b/sysdeps/unix/sysv/linux/getsysstats.c
+@@ -29,7 +29,7 @@
+ #include <sys/sysinfo.h>
+ #include <sysdep.h>
+
+-int
++static int
+ __get_nprocs_sched (void)
+ {
+ enum
diff --git a/sysdeps/unix/sysv/linux/hppa/bits/struct_stat.h b/sysdeps/unix/sysv/linux/hppa/bits/struct_stat.h
new file mode 100644
index 0000000000..38b6e13e68
@@ -9195,6 +13999,33 @@ index 0000000000..38b6e13e68
+
+
+#endif /* _BITS_STRUCT_STAT_H */
+diff --git a/sysdeps/unix/sysv/linux/hppa/bits/wordsize.h b/sysdeps/unix/sysv/linux/hppa/bits/wordsize.h
+new file mode 100644
+index 0000000000..6ecbfe7c86
+--- /dev/null
++++ b/sysdeps/unix/sysv/linux/hppa/bits/wordsize.h
+@@ -0,0 +1,21 @@
++/* Copyright (C) 1999-2024 Free Software Foundation, Inc.
++ This file is part of the GNU C Library.
++
++ The GNU C Library is free software; you can redistribute it and/or
++ modify it under the terms of the GNU Lesser General Public
++ License as published by the Free Software Foundation; either
++ version 2.1 of the License, or (at your option) any later version.
++
++ The GNU C Library is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
++ Lesser General Public License for more details.
++
++ You should have received a copy of the GNU Lesser General Public
++ License along with the GNU C Library; if not, see
++ <https://www.gnu.org/licenses/>. */
++
++#define __WORDSIZE 32
++#define __WORDSIZE_TIME64_COMPAT32 1
++#define __WORDSIZE32_SIZE_ULONG 0
++#define __WORDSIZE32_PTRDIFF_LONG 0
diff --git a/sysdeps/unix/sysv/linux/hppa/kernel-features.h b/sysdeps/unix/sysv/linux/hppa/kernel-features.h
index 0cd21ef0fa..079612e4aa 100644
--- a/sysdeps/unix/sysv/linux/hppa/kernel-features.h
@@ -9563,6 +14394,22 @@ index d7cf158b33..0ca6e69ee9 100644
struct flock
{
short int l_type; /* Type of lock: F_RDLCK, F_WRLCK, or F_UNLCK. */
+diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h b/sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
+index 04ca9debf0..6993fb6b29 100644
+--- a/sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
++++ b/sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
+@@ -2,10 +2,9 @@
+
+ #if defined __powerpc64__
+ # define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ #else
+ # define __WORDSIZE 32
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ # define __WORDSIZE32_SIZE_ULONG 0
+ # define __WORDSIZE32_PTRDIFF_LONG 0
+ #endif
++#define __WORDSIZE_TIME64_COMPAT32 1
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
index bf4be80f8d..202520ee25 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
@@ -9587,6 +14434,69 @@ index d656aedcc2..4e65f337d4 100644
#define __NR_migrate_pages 238
#define __NR_mincore 232
#define __NR_mkdirat 34
+diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
+index 210f3ec566..f08a70dfc4 100644
+--- a/sysdeps/unix/sysv/linux/rseq-internal.h
++++ b/sysdeps/unix/sysv/linux/rseq-internal.h
+@@ -25,15 +25,34 @@
+ #include <stdio.h>
+ #include <sys/rseq.h>
+
++/* 32 is the initially required value for the area size. The
++ actually used rseq size may be less (20 bytes initially). */
++#define RSEQ_AREA_SIZE_INITIAL 32
++#define RSEQ_AREA_SIZE_INITIAL_USED 20
++
++/* The variables are in .data.relro but are not yet write-protected. */
++extern unsigned int _rseq_size attribute_hidden;
++extern ptrdiff_t _rseq_offset attribute_hidden;
++
+ #ifdef RSEQ_SIG
+ static inline bool
+ rseq_register_current_thread (struct pthread *self, bool do_rseq)
+ {
+ if (do_rseq)
+ {
++ unsigned int size;
++#if IS_IN (rtld)
++ /* Use the hidden symbol in ld.so. */
++ size = _rseq_size;
++#else
++ size = __rseq_size;
++#endif
++ if (size < RSEQ_AREA_SIZE_INITIAL)
++ /* The initial implementation used only 20 bytes out of 32,
++ but still expected size 32. */
++ size = RSEQ_AREA_SIZE_INITIAL;
+ int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
+- sizeof (self->rseq_area),
+- 0, RSEQ_SIG);
++ size, 0, RSEQ_SIG);
+ if (!INTERNAL_SYSCALL_ERROR_P (ret))
+ return true;
+ }
+diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
+index 5c3301004c..3a2f712386 100644
+--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
++++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
+@@ -33,17 +33,9 @@ vsyscall_sched_getcpu (void)
+ return r == -1 ? r : cpu;
+ }
+
+-#ifdef RSEQ_SIG
+ int
+ sched_getcpu (void)
+ {
+ int cpu_id = THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id);
+ return __glibc_likely (cpu_id >= 0) ? cpu_id : vsyscall_sched_getcpu ();
+ }
+-#else /* RSEQ_SIG */
+-int
+-sched_getcpu (void)
+-{
+- return vsyscall_sched_getcpu ();
+-}
+-#endif /* RSEQ_SIG */
diff --git a/sysdeps/unix/sysv/linux/semctl.c b/sysdeps/unix/sysv/linux/semctl.c
index 77a8130c18..3458b018bc 100644
--- a/sysdeps/unix/sysv/linux/semctl.c
@@ -9833,6 +14743,63 @@ index ea38935497..f00817a6f6 100644
}
#if __TIMESIZE != 64
libc_hidden_def (__shmctl64)
+diff --git a/sysdeps/unix/sysv/linux/sparc/bits/wordsize.h b/sysdeps/unix/sysv/linux/sparc/bits/wordsize.h
+index 7562875ee2..ea103e5970 100644
+--- a/sysdeps/unix/sysv/linux/sparc/bits/wordsize.h
++++ b/sysdeps/unix/sysv/linux/sparc/bits/wordsize.h
+@@ -2,10 +2,9 @@
+
+ #if defined __arch64__ || defined __sparcv9
+ # define __WORDSIZE 64
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ #else
+ # define __WORDSIZE 32
+ # define __WORDSIZE32_SIZE_ULONG 0
+ # define __WORDSIZE32_PTRDIFF_LONG 0
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ #endif
++#define __WORDSIZE_TIME64_COMPAT32 1
+diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/sigreturn_stub.S b/sysdeps/unix/sysv/linux/sparc/sparc32/sigreturn_stub.S
+index 2829e881eb..a1492ea59e 100644
+--- a/sysdeps/unix/sysv/linux/sparc/sparc32/sigreturn_stub.S
++++ b/sysdeps/unix/sysv/linux/sparc/sparc32/sigreturn_stub.S
+@@ -23,12 +23,15 @@
+
+ [1] https://lkml.org/lkml/2016/5/27/465 */
+
+-ENTRY (__rt_sigreturn_stub)
++ nop
++ nop
++
++ENTRY_NOCFI (__rt_sigreturn_stub)
+ mov __NR_rt_sigreturn, %g1
+ ta 0x10
+-END (__rt_sigreturn_stub)
++END_NOCFI (__rt_sigreturn_stub)
+
+-ENTRY (__sigreturn_stub)
++ENTRY_NOCFI (__sigreturn_stub)
+ mov __NR_sigreturn, %g1
+ ta 0x10
+-END (__sigreturn_stub)
++END_NOCFI (__sigreturn_stub)
+diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S b/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
+index ac6af95e36..23b8b93f56 100644
+--- a/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
++++ b/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
+@@ -23,7 +23,10 @@
+
+ [1] https://lkml.org/lkml/2016/5/27/465 */
+
+-ENTRY (__rt_sigreturn_stub)
++ nop
++ nop
++
++ENTRY_NOCFI (__rt_sigreturn_stub)
+ mov __NR_rt_sigreturn, %g1
+ ta 0x6d
+-END (__rt_sigreturn_stub)
++END_NOCFI (__rt_sigreturn_stub)
diff --git a/sysdeps/unix/sysv/linux/sys/mount.h b/sysdeps/unix/sysv/linux/sys/mount.h
index f965986ba8..19841d0738 100644
--- a/sysdeps/unix/sysv/linux/sys/mount.h
@@ -10197,6 +15164,70 @@ index 037af22290..5711d1c312 100644
TEST_VERIFY (fd > 0);
char *path = xasprintf ("/proc/%d/fd/%d", pid, remote_fd);
+diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
+index e1a2c02f78..a46b0d0562 100644
+--- a/sysdeps/unix/sysv/linux/tst-rseq-disable.c
++++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
+@@ -22,6 +22,7 @@
+ #include <support/xthread.h>
+ #include <sysdep.h>
+ #include <thread_pointer.h>
++#include <sys/rseq.h>
+ #include <unistd.h>
+
+ #ifdef RSEQ_SIG
+diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
+index fa6a89541f..613593f7f9 100644
+--- a/sysdeps/unix/sysv/linux/tst-rseq.c
++++ b/sysdeps/unix/sysv/linux/tst-rseq.c
+@@ -29,6 +29,7 @@
+ # include <stdlib.h>
+ # include <string.h>
+ # include <syscall.h>
++# include <sys/auxv.h>
+ # include <thread_pointer.h>
+ # include <tls.h>
+ # include "tst-rseq.h"
+@@ -42,7 +43,8 @@ do_rseq_main_test (void)
+ TEST_COMPARE (__rseq_flags, 0);
+ TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
+ == (char *) &pd->rseq_area);
+- TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
++ /* The current implementation only supports the initial size. */
++ TEST_COMPARE (__rseq_size, 20);
+ }
+
+ static void
+@@ -52,6 +54,12 @@ do_rseq_test (void)
+ {
+ FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+ }
++ printf ("info: __rseq_size: %u\n", __rseq_size);
++ printf ("info: __rseq_offset: %td\n", __rseq_offset);
++ printf ("info: __rseq_flags: %u\n", __rseq_flags);
++ printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
++ getauxval (AT_RSEQ_FEATURE_SIZE));
++ printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
+ do_rseq_main_test ();
+ }
+ #else /* RSEQ_SIG */
+diff --git a/sysdeps/x86/bits/wordsize.h b/sysdeps/x86/bits/wordsize.h
+index 70f652bca1..3f40aa76f9 100644
+--- a/sysdeps/x86/bits/wordsize.h
++++ b/sysdeps/x86/bits/wordsize.h
+@@ -8,10 +8,9 @@
+ #define __WORDSIZE32_PTRDIFF_LONG 0
+ #endif
+
++#define __WORDSIZE_TIME64_COMPAT32 1
++
+ #ifdef __x86_64__
+-# define __WORDSIZE_TIME64_COMPAT32 1
+ /* Both x86-64 and x32 use the 64-bit system call interface. */
+ # define __SYSCALL_WORDSIZE 64
+-#else
+-# define __WORDSIZE_TIME64_COMPAT32 0
+ #endif
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index e9f3382108..d95c1efa2c 100644
--- a/sysdeps/x86/dl-cacheinfo.h
@@ -10467,6 +15498,14 @@ index 3c4480aba7..06f6c9663e 100644
#define MOVBE_X86_ISA_LEVEL 3
/* ISA level >= 2 guaranteed includes. */
+diff --git a/sysdeps/x86/utmp-size.h b/sysdeps/x86/utmp-size.h
+new file mode 100644
+index 0000000000..8f21ebe1b6
+--- /dev/null
++++ b/sysdeps/x86/utmp-size.h
+@@ -0,0 +1,2 @@
++#define UTMP_SIZE 384
++#define LASTLOG_SIZE 292
diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
index 0db2cb4152..7619e743e1 100644
--- a/sysdeps/x86_64/dl-tlsdesc.S
@@ -10498,6 +15537,29 @@ index 0db2cb4152..7619e743e1 100644
movq -8(%rsp), %rdi
ret
.Lslow:
+diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c
+index 842ebaeb4c..d352866d9f 100644
+--- a/sysdeps/x86_64/ffsll.c
++++ b/sysdeps/x86_64/ffsll.c
+@@ -26,13 +26,13 @@ int
+ ffsll (long long int x)
+ {
+ long long int cnt;
+- long long int tmp;
+
+- asm ("bsfq %2,%0\n" /* Count low bits in X and store in %1. */
+- "cmoveq %1,%0\n" /* If number was zero, use -1 as result. */
+- : "=&r" (cnt), "=r" (tmp) : "rm" (x), "1" (-1));
++ asm ("mov $-1,%k0\n" /* Initialize cnt to -1. */
++ "bsf %1,%0\n" /* Count low bits in x and store in cnt. */
++ "inc %k0\n" /* Increment cnt by 1. */
++ : "=&r" (cnt) : "r" (x));
+
+- return cnt + 1;
++ return cnt;
+ }
+
+ #ifndef __ILP32__
diff --git a/sysdeps/x86_64/fpu/fraiseexcpt.c b/sysdeps/x86_64/fpu/fraiseexcpt.c
index 864f4777a2..23446ff4ac 100644
--- a/sysdeps/x86_64/fpu/fraiseexcpt.c
diff --git a/debian/patches/kfreebsd/submitted-auxv.diff b/debian/patches/kfreebsd/submitted-auxv.diff
index c2fc471d..81d4174d 100644
--- a/debian/patches/kfreebsd/submitted-auxv.diff
+++ b/debian/patches/kfreebsd/submitted-auxv.diff
@@ -36,7 +36,7 @@ https://sourceware.org/bugzilla/show_bug.cgi?id=15794
for (p = GLRO(dl_auxv); p->a_type != AT_NULL; p++)
--- /dev/null
+++ b/bits/auxv.h
-@@ -0,0 +1,90 @@
+@@ -0,0 +1,93 @@
+/* Copyright (C) 1995-2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
@@ -100,6 +100,9 @@ https://sourceware.org/bugzilla/show_bug.cgi?id=15794
+#define AT_HWCAP2 26 /* More machine-dependent hints about
+ processor capabilities. */
+
++#define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size. */
++#define AT_RSEQ_ALIGN 28 /* rseq allocation alignment. */
++
+#define AT_EXECFN 31 /* Filename of executable. */
+
+/* Pointer to the global system page used for system calls and other
@@ -129,7 +132,7 @@ https://sourceware.org/bugzilla/show_bug.cgi?id=15794
+#define AT_MINSIGSTKSZ 51 /* Stack needed for signal delivery */
--- a/elf/elf.h
+++ b/elf/elf.h
-@@ -1154,80 +1154,7 @@
+@@ -1154,83 +1154,7 @@
} a_un;
} Elf64_auxv_t;
@@ -179,6 +182,9 @@ https://sourceware.org/bugzilla/show_bug.cgi?id=15794
-#define AT_HWCAP2 26 /* More machine-dependent hints about
- processor capabilities. */
-
+-#define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size. */
+-#define AT_RSEQ_ALIGN 28 /* rseq allocation alignment. */
+-
-#define AT_EXECFN 31 /* Filename of executable. */
-
-/* Pointer to the global system page used for system calls and other
diff --git a/debian/patches/series b/debian/patches/series
index 3701a83f..350fd9d3 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -120,7 +120,3 @@ any/local-cross.patch
any/git-floatn-gcc-13-support.diff
any/local-disable-tst-bz29951.diff
any/local-qsort-memory-corruption.patch
-any/local-CVE-2024-2961-iso-2022-cn-ext.diff
-any/local-CVE-2024-33599-nscd.diff
-any/local-CVE-2024-33600-nscd.diff
-any/local-CVE-2024-33601-33602-nscd.diff