[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1055211: bookworm-pu: package gcc-12/12.2.0-14+deb12u1 (CVE-2023-4039)



Package: release.debian.org
Severity: normal
Tags: bookworm
User: release.debian.org@packages.debian.org
Usertags: pu
X-Debbugs-Cc: debian-gcc@lists.debian.org, jmm@debian.org, doko@debian.org
Control: affects -1 + src:gcc-12

[ Reason ]
A failure in the -fstack-protector feature in GCC-based toolchains that target
AArch64 allows an attacker to exploit an existing buffer overflow in
dynamically-sized local variables without this being detected.

The Security Team explicitly stated that they will not release a DSA for the
bug, see https://security-tracker.debian.org/tracker/CVE-2023-4039 

This issue has been fixed in version 12.3.0-9 for sid and trixie. It would now
be good to address the problem in bookworm too.

[ Impact ]
Without this change, arm64 users of gcc-12 in bookworm are not fully protected
by -fstack-protector as described in CVE-2023-4039.

[ Tests ]
In terms of manual testing, I have verified that the stack smashing attack
published at
https://github.com/metaredteam/external-disclosures/security/advisories/GHSA-x7ch-h5rf-w2mf
is detected and stopped when the PoC is compiled with gcc 12.2.0-14+deb12u1,
while it results in a Bus error with 12.2.0-14:

  $ gcc-12 --version | head -1
  gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
  $ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
  $ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
  *** stack smashing detected ***: terminated
  Aborted

  $ gcc-12 --version | head -1
  gcc-12 (Debian 12.2.0-14) 12.2.0
  $ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
  $ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
  Bus error

As an additional smoke test I have also successfully built a few packages, as
well as a kernel.

In terms of automated testing, the upstream test suite is extensive.
Additionally, upstream added the following tests specifically for
CVE-2023-4039:

 testsuite/gcc.target/aarch64/stack-check-prologue-17.c
 testsuite/gcc.target/aarch64/stack-check-prologue-18.c
 testsuite/gcc.target/aarch64/stack-check-prologue-19.c
 testsuite/gcc.target/aarch64/stack-check-prologue-20.c
 testsuite/gcc.target/aarch64/stack-protector-8.c
 testsuite/gcc.target/aarch64/stack-protector-9.c

[ Risks ]
There are obviously potential risks of regressions associated with compiler
changes. However the specific changes proposed here have been part of gcc-12
upstream since September, and by now have been tested quite a bit. One
regression has been found early on and fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

All changes are arm64-specific and don't affect any other architecture.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

[ Changes ]
All changes merged upstream back in September 2023, see
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/releases/gcc-12

- aarch64: Avoid a use of callee_offset
  12a8889de169f892d2e927584c00d20b8b7e456f

- aarch64: Explicitly handle frames with no saved registers
  03d5e89e7f3be53fd7142556e8e0a2774c653dca

- aarch64: Add bytes_below_saved_regs to frame info
  49c2eb7616756c323b7f6b18d8616ec945eb1263

- aarch64: Add bytes_below_hard_fp to frame info
  34081079ea4de0c98331843f574b5f6f94d7b234

- aarch64: Tweak aarch64_save/restore_callee_saves
  187861af7c51db9eddc6f954b589c121b210fc74

- aarch64: Only calculate chain_offset if there is a chain
  2b983f9064d808daf909bde1d4a13980934a7e6e

- aarch64: Rename locals_offset to bytes_above_locals
  0a0a824808d1dec51004fb5805c1a0ae2a35433f

- aarch64: Rename hard_fp_offset to bytes_above_hard_fp
  3fbf0789202b30a67b12e1fb785c7130f098d665

- aarch64: Tweak frame_size comment
  aac8b31379ac3bbd14fc6427dce23f56e54e8485

- aarch64: Measure reg_offset from the bottom of the frame
  8d5506a8aeb8dd7e8b209a3663b07688478f76b9

- aarch64: Simplify top of frame allocation
  b47766614df3b9df878262efb2ad73aaac108363

- aarch64: Minor initial adjustment tweak
  08f71b4bb28fb74d20e8d2927a557e8119ce9f4d

- aarch64: Tweak stack clash boundary condition
  f22315d5c19e8310e4dc880fd509678fd291fca8

- aarch64: Put LR save probe in first 16 bytes
  15e18831bf98fd25af098b970ebf0c9a6200a34b

- aarch64: Simplify probe of final frame allocation
  c4f0e121faa36342f1d21919e54a05ad841c4f86

- aarch64: Explicitly record probe registers in frame info
  6f0ab0a9f46a17b68349ff6035aa776bf65f0575

- aarch64: Remove below_hard_fp_saved_regs_size
  8254e1b9cd500e0c278465a3657543477e9d1250

- aarch64: Make stack smash canary protect saved registers
  75c37e031408262263442f5b4cdb83d3777b6422

- aarch64: Fix return register handling in untyped_call
  38d0605ac8bc90324170041676fc05e7e595769e

- aarch64: Fix loose ldpstp check [PR111411]
  74f99f1adc696f446115f36974a3f94f66294a53
diff -Nru gcc-12-12.2.0/debian/changelog gcc-12-12.2.0/debian/changelog
--- gcc-12-12.2.0/debian/changelog	2023-01-08 10:12:42.000000000 +0100
+++ gcc-12-12.2.0/debian/changelog	2023-10-09 11:45:48.000000000 +0200
@@ -1,3 +1,9 @@
+gcc-12 (12.2.0-14+deb12u1) bookworm; urgency=medium
+
+  * Fix -fstack-protector handling of overflows on AArch64 (CVE-2023-4039).
+
+ -- Emanuele Rocca <ema@debian.org>  Mon, 09 Oct 2023 11:45:48 +0200
+
 gcc-12 (12.2.0-14) unstable; urgency=medium
 
   * Update to git 20230108 from the gcc-12 branch.
diff -Nru gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff
--- gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff	1970-01-01 01:00:00.000000000 +0100
+++ gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff	2023-09-21 10:08:42.000000000 +0200
@@ -0,0 +1,1665 @@
+DP: Fix CVE-2023-4039.
+DP: 
+DP: The patch includes the following upstream commits:
+DP: 
+DP: aarch64: Use local frame vars in shrink-wrapping code
+DP: 62fbb215cc817e9f2c1ca80282a64f4ee30806bc
+DP: 
+DP: aarch64: Avoid a use of callee_offset
+DP: 12a8889de169f892d2e927584c00d20b8b7e456f
+DP: 
+DP: aarch64: Explicitly handle frames with no saved registers
+DP: 03d5e89e7f3be53fd7142556e8e0a2774c653dca
+DP: 
+DP: aarch64: Add bytes_below_saved_regs to frame info
+DP: 49c2eb7616756c323b7f6b18d8616ec945eb1263
+DP: 
+DP: aarch64: Add bytes_below_hard_fp to frame info
+DP: 34081079ea4de0c98331843f574b5f6f94d7b234
+DP: 
+DP: aarch64: Tweak aarch64_save/restore_callee_saves
+DP: 187861af7c51db9eddc6f954b589c121b210fc74
+DP: 
+DP: aarch64: Only calculate chain_offset if there is a chain
+DP: 2b983f9064d808daf909bde1d4a13980934a7e6e
+DP: 
+DP: aarch64: Rename locals_offset to bytes_above_locals
+DP: 0a0a824808d1dec51004fb5805c1a0ae2a35433f
+DP: 
+DP: aarch64: Rename hard_fp_offset to bytes_above_hard_fp
+DP: 3fbf0789202b30a67b12e1fb785c7130f098d665
+DP: 
+DP: aarch64: Tweak frame_size comment
+DP: aac8b31379ac3bbd14fc6427dce23f56e54e8485
+DP: 
+DP: aarch64: Measure reg_offset from the bottom of the frame
+DP: 8d5506a8aeb8dd7e8b209a3663b07688478f76b9
+DP: 
+DP: aarch64: Simplify top of frame allocation
+DP: b47766614df3b9df878262efb2ad73aaac108363
+DP: 
+DP: aarch64: Minor initial adjustment tweak
+DP: 08f71b4bb28fb74d20e8d2927a557e8119ce9f4d
+DP: 
+DP: aarch64: Tweak stack clash boundary condition
+DP: f22315d5c19e8310e4dc880fd509678fd291fca8
+DP: 
+DP: aarch64: Put LR save probe in first 16 bytes
+DP: 15e18831bf98fd25af098b970ebf0c9a6200a34b
+DP: 
+DP: aarch64: Simplify probe of final frame allocation
+DP: c4f0e121faa36342f1d21919e54a05ad841c4f86
+DP: 
+DP: aarch64: Explicitly record probe registers in frame info
+DP: 6f0ab0a9f46a17b68349ff6035aa776bf65f0575
+DP: 
+DP: aarch64: Remove below_hard_fp_saved_regs_size
+DP: 8254e1b9cd500e0c278465a3657543477e9d1250
+DP: 
+DP: aarch64: Make stack smash canary protect saved registers
+DP: 75c37e031408262263442f5b4cdb83d3777b6422
+DP: 
+DP: aarch64: Fix return register handling in untyped_call
+DP: 38d0605ac8bc90324170041676fc05e7e595769e
+DP: 
+DP: aarch64: Fix loose ldpstp check [PR111411]
+DP: 74f99f1adc696f446115f36974a3f94f66294a53
+
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.cc
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+@@ -8070,18 +8070,32 @@ aarch64_needs_frame_chain (void)
+   return aarch64_use_frame_pointer;
+ }
+ 
++/* Return true if the current function should save registers above
++   the locals area, rather than below it.  */
++
++static bool
++aarch64_save_regs_above_locals_p ()
++{
++  /* When using stack smash protection, make sure that the canary slot
++     comes between the locals and the saved registers.  Otherwise,
++     it would be possible for a carefully sized smash attack to change
++     the saved registers (particularly LR and FP) without reaching the
++     canary.  */
++  return crtl->stack_protect_guard;
++}
++
+ /* Mark the registers that need to be saved by the callee and calculate
+    the size of the callee-saved registers area and frame record (both FP
+    and LR may be omitted).  */
+ static void
+ aarch64_layout_frame (void)
+ {
+-  poly_int64 offset = 0;
+   int regno, last_fp_reg = INVALID_REGNUM;
+   machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
+   poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
+   bool frame_related_fp_reg_p = false;
+   aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 top_of_locals = -1;
+ 
+   frame.emit_frame_chain = aarch64_needs_frame_chain ();
+ 
+@@ -8148,11 +8162,18 @@ aarch64_layout_frame (void)
+ 	&& !crtl->abi->clobbers_full_reg_p (regno))
+       frame.reg_offset[regno] = SLOT_REQUIRED;
+ 
+-  /* With stack-clash, LR must be saved in non-leaf functions.  The saving of
+-     LR counts as an implicit probe which allows us to maintain the invariant
+-     described in the comment at expand_prologue.  */
+-  gcc_assert (crtl->is_leaf
+-	      || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
++  bool regs_at_top_p = aarch64_save_regs_above_locals_p ();
++
++  poly_int64 offset = crtl->outgoing_args_size;
++  gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++  if (regs_at_top_p)
++    {
++      offset += get_frame_size ();
++      offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++      top_of_locals = offset;
++    }
++  frame.bytes_below_saved_regs = offset;
++  frame.sve_save_and_probe = INVALID_REGNUM;
+ 
+   /* Now assign stack slots for the registers.  Start with the predicate
+      registers, since predicate LDR and STR have a relatively small
+@@ -8160,11 +8181,14 @@ aarch64_layout_frame (void)
+   for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+       {
++	if (frame.sve_save_and_probe == INVALID_REGNUM)
++	  frame.sve_save_and_probe = regno;
+ 	frame.reg_offset[regno] = offset;
+ 	offset += BYTES_PER_SVE_PRED;
+       }
+ 
+-  if (maybe_ne (offset, 0))
++  poly_int64 saved_prs_size = offset - frame.bytes_below_saved_regs;
++  if (maybe_ne (saved_prs_size, 0))
+     {
+       /* If we have any vector registers to save above the predicate registers,
+ 	 the offset of the vector register save slots need to be a multiple
+@@ -8182,10 +8206,10 @@ aarch64_layout_frame (void)
+ 	offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+       else
+ 	{
+-	  if (known_le (offset, vector_save_size))
+-	    offset = vector_save_size;
+-	  else if (known_le (offset, vector_save_size * 2))
+-	    offset = vector_save_size * 2;
++	  if (known_le (saved_prs_size, vector_save_size))
++	    offset = frame.bytes_below_saved_regs + vector_save_size;
++	  else if (known_le (saved_prs_size, vector_save_size * 2))
++	    offset = frame.bytes_below_saved_regs + vector_save_size * 2;
+ 	  else
+ 	    gcc_unreachable ();
+ 	}
+@@ -8196,34 +8220,53 @@ aarch64_layout_frame (void)
+     for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+       if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ 	{
++	  if (frame.sve_save_and_probe == INVALID_REGNUM)
++	    frame.sve_save_and_probe = regno;
+ 	  frame.reg_offset[regno] = offset;
+ 	  offset += vector_save_size;
+ 	}
+ 
+   /* OFFSET is now the offset of the hard frame pointer from the bottom
+      of the callee save area.  */
+-  bool saves_below_hard_fp_p = maybe_ne (offset, 0);
+-  frame.below_hard_fp_saved_regs_size = offset;
++  auto below_hard_fp_saved_regs_size = offset - frame.bytes_below_saved_regs;
++  bool saves_below_hard_fp_p = maybe_ne (below_hard_fp_saved_regs_size, 0);
++  gcc_assert (!saves_below_hard_fp_p
++	      || (frame.sve_save_and_probe != INVALID_REGNUM
++		  && known_eq (frame.reg_offset[frame.sve_save_and_probe],
++			       frame.bytes_below_saved_regs)));
++
++  frame.bytes_below_hard_fp = offset;
++  frame.hard_fp_save_and_probe = INVALID_REGNUM;
++
++  auto allocate_gpr_slot = [&](unsigned int regno)
++    {
++      if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++	frame.hard_fp_save_and_probe = regno;
++      frame.reg_offset[regno] = offset;
++      if (frame.wb_push_candidate1 == INVALID_REGNUM)
++	frame.wb_push_candidate1 = regno;
++      else if (frame.wb_push_candidate2 == INVALID_REGNUM)
++	frame.wb_push_candidate2 = regno;
++      offset += UNITS_PER_WORD;
++    };
++
+   if (frame.emit_frame_chain)
+     {
+       /* FP and LR are placed in the linkage record.  */
+-      frame.reg_offset[R29_REGNUM] = offset;
+-      frame.wb_push_candidate1 = R29_REGNUM;
+-      frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
+-      frame.wb_push_candidate2 = R30_REGNUM;
+-      offset += 2 * UNITS_PER_WORD;
++      allocate_gpr_slot (R29_REGNUM);
++      allocate_gpr_slot (R30_REGNUM);
+     }
++  else if (flag_stack_clash_protection
++	   && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
++    /* Put the LR save slot first, since it makes a good choice of probe
++       for stack clash purposes.  The idea is that the link register usually
++       has to be saved before a call anyway, and so we lose little by
++       stopping it from being individually shrink-wrapped.  */
++    allocate_gpr_slot (R30_REGNUM);
+ 
+   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+-      {
+-	frame.reg_offset[regno] = offset;
+-	if (frame.wb_push_candidate1 == INVALID_REGNUM)
+-	  frame.wb_push_candidate1 = regno;
+-	else if (frame.wb_push_candidate2 == INVALID_REGNUM)
+-	  frame.wb_push_candidate2 = regno;
+-	offset += UNITS_PER_WORD;
+-      }
++      allocate_gpr_slot (regno);
+ 
+   poly_int64 max_int_offset = offset;
+   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+@@ -8232,6 +8275,8 @@ aarch64_layout_frame (void)
+   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+       {
++	if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++	  frame.hard_fp_save_and_probe = regno;
+ 	/* If there is an alignment gap between integer and fp callee-saves,
+ 	   allocate the last fp register to it if possible.  */
+ 	if (regno == last_fp_reg
+@@ -8254,30 +8299,36 @@ aarch64_layout_frame (void)
+ 
+   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ 
+-  frame.saved_regs_size = offset;
+-
+-  poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
+-
+-  poly_int64 above_outgoing_args
+-    = aligned_upper_bound (varargs_and_saved_regs_size
+-			   + get_frame_size (),
+-			   STACK_BOUNDARY / BITS_PER_UNIT);
+-
+-  frame.hard_fp_offset
+-    = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+-
+-  /* Both these values are already aligned.  */
+-  gcc_assert (multiple_p (crtl->outgoing_args_size,
+-			  STACK_BOUNDARY / BITS_PER_UNIT));
+-  frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
+-
+-  frame.locals_offset = frame.saved_varargs_size;
++  auto saved_regs_size = offset - frame.bytes_below_saved_regs;
++  gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size)
++	      || (frame.hard_fp_save_and_probe != INVALID_REGNUM
++		  && known_eq (frame.reg_offset[frame.hard_fp_save_and_probe],
++			       frame.bytes_below_hard_fp)));
++
++  /* With stack-clash, a register must be saved in non-leaf functions.
++     The saving of the bottommost register counts as an implicit probe,
++     which allows us to maintain the invariant described in the comment
++     at expand_prologue.  */
++  gcc_assert (crtl->is_leaf || maybe_ne (saved_regs_size, 0));
++
++  if (!regs_at_top_p)
++    {
++      offset += get_frame_size ();
++      offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++      top_of_locals = offset;
++    }
++  offset += frame.saved_varargs_size;
++  gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++  frame.frame_size = offset;
++
++  frame.bytes_above_hard_fp = frame.frame_size - frame.bytes_below_hard_fp;
++  gcc_assert (known_ge (top_of_locals, 0));
++  frame.bytes_above_locals = frame.frame_size - top_of_locals;
+ 
+   frame.initial_adjust = 0;
+   frame.final_adjust = 0;
+   frame.callee_adjust = 0;
+   frame.sve_callee_adjust = 0;
+-  frame.callee_offset = 0;
+ 
+   frame.wb_pop_candidate1 = frame.wb_push_candidate1;
+   frame.wb_pop_candidate2 = frame.wb_push_candidate2;
+@@ -8288,7 +8339,7 @@ aarch64_layout_frame (void)
+   frame.is_scs_enabled
+     = (!crtl->calls_eh_return
+        && sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+-       && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
++       && known_ge (frame.reg_offset[LR_REGNUM], 0));
+ 
+   /* When shadow call stack is enabled, the scs_pop in the epilogue will
+      restore x30, and we don't need to pop x30 again in the traditional
+@@ -8308,75 +8359,76 @@ aarch64_layout_frame (void)
+      max_push_offset to 0, because no registers are popped at this time,
+      so callee_adjust cannot be adjusted.  */
+   HOST_WIDE_INT max_push_offset = 0;
+-  if (frame.wb_pop_candidate2 != INVALID_REGNUM)
+-    max_push_offset = 512;
+-  else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
+-    max_push_offset = 256;
++  if (frame.wb_pop_candidate1 != INVALID_REGNUM)
++    {
++      if (frame.wb_pop_candidate2 != INVALID_REGNUM)
++	max_push_offset = 512;
++      else
++	max_push_offset = 256;
++    }
+ 
+-  HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
++  HOST_WIDE_INT const_size, const_below_saved_regs, const_above_fp;
+   HOST_WIDE_INT const_saved_regs_size;
+-  if (frame.frame_size.is_constant (&const_size)
+-      && const_size < max_push_offset
+-      && known_eq (frame.hard_fp_offset, const_size))
++  if (known_eq (saved_regs_size, 0))
++    frame.initial_adjust = frame.frame_size;
++  else if (frame.frame_size.is_constant (&const_size)
++	   && const_size < max_push_offset
++	   && known_eq (frame.bytes_above_hard_fp, const_size))
+     {
+-      /* Simple, small frame with no outgoing arguments:
++      /* Simple, small frame with no data below the saved registers.
+ 
+ 	 stp reg1, reg2, [sp, -frame_size]!
+ 	 stp reg3, reg4, [sp, 16]  */
+       frame.callee_adjust = const_size;
+     }
+-  else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
+-	   && frame.saved_regs_size.is_constant (&const_saved_regs_size)
+-	   && const_outgoing_args_size + const_saved_regs_size < 512
+-	   /* We could handle this case even with outgoing args, provided
+-	      that the number of args left us with valid offsets for all
+-	      predicate and vector save slots.  It's such a rare case that
+-	      it hardly seems worth the effort though.  */
+-	   && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
++  else if (frame.bytes_below_saved_regs.is_constant (&const_below_saved_regs)
++	   && saved_regs_size.is_constant (&const_saved_regs_size)
++	   && const_below_saved_regs + const_saved_regs_size < 512
++	   /* We could handle this case even with data below the saved
++	      registers, provided that that data left us with valid offsets
++	      for all predicate and vector save slots.  It's such a rare
++	      case that it hardly seems worth the effort though.  */
++	   && (!saves_below_hard_fp_p || const_below_saved_regs == 0)
+ 	   && !(cfun->calls_alloca
+-		&& frame.hard_fp_offset.is_constant (&const_fp_offset)
+-		&& const_fp_offset < max_push_offset))
++		&& frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++		&& const_above_fp < max_push_offset))
+     {
+-      /* Frame with small outgoing arguments:
++      /* Frame with small area below the saved registers:
+ 
+ 	 sub sp, sp, frame_size
+-	 stp reg1, reg2, [sp, outgoing_args_size]
+-	 stp reg3, reg4, [sp, outgoing_args_size + 16]  */
++	 stp reg1, reg2, [sp, bytes_below_saved_regs]
++	 stp reg3, reg4, [sp, bytes_below_saved_regs + 16]  */
+       frame.initial_adjust = frame.frame_size;
+-      frame.callee_offset = const_outgoing_args_size;
+     }
+   else if (saves_below_hard_fp_p
+-	   && known_eq (frame.saved_regs_size,
+-			frame.below_hard_fp_saved_regs_size))
++	   && known_eq (saved_regs_size, below_hard_fp_saved_regs_size))
+     {
+       /* Frame in which all saves are SVE saves:
+ 
+-	 sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
++	 sub sp, sp, frame_size - bytes_below_saved_regs
+ 	 save SVE registers relative to SP
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.initial_adjust = (frame.hard_fp_offset
+-			      + frame.below_hard_fp_saved_regs_size);
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.initial_adjust = frame.frame_size - frame.bytes_below_saved_regs;
++      frame.final_adjust = frame.bytes_below_saved_regs;
+     }
+-  else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
+-	   && const_fp_offset < max_push_offset)
++  else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++	   && const_above_fp < max_push_offset)
+     {
+-      /* Frame with large outgoing arguments or SVE saves, but with
+-	 a small local area:
++      /* Frame with large area below the saved registers, or with SVE saves,
++	 but with a small area above:
+ 
+ 	 stp reg1, reg2, [sp, -hard_fp_offset]!
+ 	 stp reg3, reg4, [sp, 16]
+ 	 [sub sp, sp, below_hard_fp_saved_regs_size]
+ 	 [save SVE registers relative to SP]
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.callee_adjust = const_fp_offset;
+-      frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.callee_adjust = const_above_fp;
++      frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++      frame.final_adjust = frame.bytes_below_saved_regs;
+     }
+   else
+     {
+-      /* Frame with large local area and outgoing arguments or SVE saves,
+-	 using frame pointer:
++      /* General case:
+ 
+ 	 sub sp, sp, hard_fp_offset
+ 	 stp x29, x30, [sp, 0]
+@@ -8384,10 +8436,29 @@ aarch64_layout_frame (void)
+ 	 stp reg3, reg4, [sp, 16]
+ 	 [sub sp, sp, below_hard_fp_saved_regs_size]
+ 	 [save SVE registers relative to SP]
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.initial_adjust = frame.hard_fp_offset;
+-      frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.initial_adjust = frame.bytes_above_hard_fp;
++      frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++      frame.final_adjust = frame.bytes_below_saved_regs;
++    }
++
++  /* The frame is allocated in pieces, with each non-final piece
++     including a register save at offset 0 that acts as a probe for
++     the following piece.  In addition, the save of the bottommost register
++     acts as a probe for callees and allocas.  Roll back any probes that
++     aren't needed.
++
++     A probe isn't needed if it is associated with the final allocation
++     (including callees and allocas) that happens before the epilogue is
++     executed.  */
++  if (crtl->is_leaf
++      && !cfun->calls_alloca
++      && known_eq (frame.final_adjust, 0))
++    {
++      if (maybe_ne (frame.sve_callee_adjust, 0))
++	frame.sve_save_and_probe = INVALID_REGNUM;
++      else
++	frame.hard_fp_save_and_probe = INVALID_REGNUM;
+     }
+ 
+   /* Make sure the individual adjustments add up to the full frame size.  */
+@@ -8691,15 +8762,17 @@ aarch64_add_cfa_expression (rtx_insn *in
+ }
+ 
+ /* Emit code to save the callee-saved registers from register number START
+-   to LIMIT to the stack at the location starting at offset START_OFFSET,
+-   skipping any write-back candidates if SKIP_WB is true.  HARD_FP_VALID_P
+-   is true if the hard frame pointer has been set up.  */
++   to LIMIT to the stack.  The stack pointer is currently BYTES_BELOW_SP
++   bytes above the bottom of the static frame.  Skip any write-back
++   candidates if SKIP_WB is true.  HARD_FP_VALID_P is true if the hard
++   frame pointer has been set up.  */
+ 
+ static void
+-aarch64_save_callee_saves (poly_int64 start_offset,
++aarch64_save_callee_saves (poly_int64 bytes_below_sp,
+ 			   unsigned start, unsigned limit, bool skip_wb,
+ 			   bool hard_fp_valid_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   rtx_insn *insn;
+   unsigned regno;
+   unsigned regno2;
+@@ -8714,8 +8787,8 @@ aarch64_save_callee_saves (poly_int64 st
+       bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+ 
+       if (skip_wb
+-	  && (regno == cfun->machine->frame.wb_push_candidate1
+-	      || regno == cfun->machine->frame.wb_push_candidate2))
++	  && (regno == frame.wb_push_candidate1
++	      || regno == frame.wb_push_candidate2))
+ 	continue;
+ 
+       if (cfun->machine->reg_is_wrapped_separately[regno])
+@@ -8723,7 +8796,7 @@ aarch64_save_callee_saves (poly_int64 st
+ 
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       reg = gen_rtx_REG (mode, regno);
+-      offset = start_offset + cfun->machine->frame.reg_offset[regno];
++      offset = frame.reg_offset[regno] - bytes_below_sp;
+       rtx base_rtx = stack_pointer_rtx;
+       poly_int64 sp_offset = offset;
+ 
+@@ -8734,9 +8807,7 @@ aarch64_save_callee_saves (poly_int64 st
+       else if (GP_REGNUM_P (regno)
+ 	       && (!offset.is_constant (&const_offset) || const_offset >= 512))
+ 	{
+-	  gcc_assert (known_eq (start_offset, 0));
+-	  poly_int64 fp_offset
+-	    = cfun->machine->frame.below_hard_fp_saved_regs_size;
++	  poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
+ 	  if (hard_fp_valid_p)
+ 	    base_rtx = hard_frame_pointer_rtx;
+ 	  else
+@@ -8758,8 +8829,7 @@ aarch64_save_callee_saves (poly_int64 st
+ 	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ 	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+ 	  && known_eq (GET_MODE_SIZE (mode),
+-		       cfun->machine->frame.reg_offset[regno2]
+-		       - cfun->machine->frame.reg_offset[regno]))
++		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ 	{
+ 	  rtx reg2 = gen_rtx_REG (mode, regno2);
+ 	  rtx mem2;
+@@ -8801,14 +8871,16 @@ aarch64_save_callee_saves (poly_int64 st
+ }
+ 
+ /* Emit code to restore the callee registers from register number START
+-   up to and including LIMIT.  Restore from the stack offset START_OFFSET,
+-   skipping any write-back candidates if SKIP_WB is true.  Write the
+-   appropriate REG_CFA_RESTORE notes into CFI_OPS.  */
++   up to and including LIMIT.  The stack pointer is currently BYTES_BELOW_SP
++   bytes above the bottom of the static frame.  Skip any write-back
++   candidates if SKIP_WB is true.  Write the appropriate REG_CFA_RESTORE
++   notes into CFI_OPS.  */
+ 
+ static void
+-aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
++aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
+ 			      unsigned limit, bool skip_wb, rtx *cfi_ops)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   unsigned regno;
+   unsigned regno2;
+   poly_int64 offset;
+@@ -8825,13 +8897,13 @@ aarch64_restore_callee_saves (poly_int64
+       rtx reg, mem;
+ 
+       if (skip_wb
+-	  && (regno == cfun->machine->frame.wb_pop_candidate1
+-	      || regno == cfun->machine->frame.wb_pop_candidate2))
++	  && (regno == frame.wb_pop_candidate1
++	      || regno == frame.wb_pop_candidate2))
+ 	continue;
+ 
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       reg = gen_rtx_REG (mode, regno);
+-      offset = start_offset + cfun->machine->frame.reg_offset[regno];
++      offset = frame.reg_offset[regno] - bytes_below_sp;
+       rtx base_rtx = stack_pointer_rtx;
+       if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ 	aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+@@ -8842,8 +8914,7 @@ aarch64_restore_callee_saves (poly_int64
+ 	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ 	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+ 	  && known_eq (GET_MODE_SIZE (mode),
+-		       cfun->machine->frame.reg_offset[regno2]
+-		       - cfun->machine->frame.reg_offset[regno]))
++		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ 	{
+ 	  rtx reg2 = gen_rtx_REG (mode, regno2);
+ 	  rtx mem2;
+@@ -8948,6 +9019,7 @@ offset_12bit_unsigned_scaled_p (machine_
+ static sbitmap
+ aarch64_get_separate_components (void)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
+   bitmap_clear (components);
+ 
+@@ -8964,20 +9036,11 @@ aarch64_get_separate_components (void)
+ 	if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ 	  continue;
+ 
+-	poly_int64 offset = cfun->machine->frame.reg_offset[regno];
+-
+-	/* If the register is saved in the first SVE save slot, we use
+-	   it as a stack probe for -fstack-clash-protection.  */
+-	if (flag_stack_clash_protection
+-	    && maybe_ne (cfun->machine->frame.below_hard_fp_saved_regs_size, 0)
+-	    && known_eq (offset, 0))
+-	  continue;
++	poly_int64 offset = frame.reg_offset[regno];
+ 
+ 	/* Get the offset relative to the register we'll use.  */
+ 	if (frame_pointer_needed)
+-	  offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-	else
+-	  offset += crtl->outgoing_args_size;
++	  offset -= frame.bytes_below_hard_fp;
+ 
+ 	/* Check that we can access the stack slot of the register with one
+ 	   direct load with no adjustments needed.  */
+@@ -8994,11 +9057,11 @@ aarch64_get_separate_components (void)
+   /* If the spare predicate register used by big-endian SVE code
+      is call-preserved, it must be saved in the main prologue
+      before any saves that use it.  */
+-  if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
+-    bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
++  if (frame.spare_pred_reg != INVALID_REGNUM)
++    bitmap_clear_bit (components, frame.spare_pred_reg);
+ 
+-  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
++  unsigned reg1 = frame.wb_push_candidate1;
++  unsigned reg2 = frame.wb_push_candidate2;
+   /* If registers have been chosen to be stored/restored with
+      writeback don't interfere with them to avoid having to output explicit
+      stack adjustment instructions.  */
+@@ -9009,6 +9072,13 @@ aarch64_get_separate_components (void)
+ 
+   bitmap_clear_bit (components, LR_REGNUM);
+   bitmap_clear_bit (components, SP_REGNUM);
++  if (flag_stack_clash_protection)
++    {
++      if (frame.sve_save_and_probe != INVALID_REGNUM)
++	bitmap_clear_bit (components, frame.sve_save_and_probe);
++      if (frame.hard_fp_save_and_probe != INVALID_REGNUM)
++	bitmap_clear_bit (components, frame.hard_fp_save_and_probe);
++    }
+ 
+   return components;
+ }
+@@ -9107,6 +9177,7 @@ aarch64_get_next_set_bit (sbitmap bmp, u
+ static void
+ aarch64_process_components (sbitmap components, bool prologue_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+ 			     ? HARD_FRAME_POINTER_REGNUM
+ 			     : STACK_POINTER_REGNUM);
+@@ -9121,11 +9192,9 @@ aarch64_process_components (sbitmap comp
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       
+       rtx reg = gen_rtx_REG (mode, regno);
+-      poly_int64 offset = cfun->machine->frame.reg_offset[regno];
++      poly_int64 offset = frame.reg_offset[regno];
+       if (frame_pointer_needed)
+-	offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-      else
+-	offset += crtl->outgoing_args_size;
++	offset -= frame.bytes_below_hard_fp;
+ 
+       rtx addr = plus_constant (Pmode, ptr_reg, offset);
+       rtx mem = gen_frame_mem (mode, addr);
+@@ -9148,14 +9217,14 @@ aarch64_process_components (sbitmap comp
+ 	  break;
+ 	}
+ 
+-      poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
++      poly_int64 offset2 = frame.reg_offset[regno2];
+       /* The next register is not of the same class or its offset is not
+ 	 mergeable with the current one into a pair.  */
+       if (aarch64_sve_mode_p (mode)
+ 	  || !satisfies_constraint_Ump (mem)
+ 	  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+ 	  || (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
+-	  || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
++	  || maybe_ne ((offset2 - frame.reg_offset[regno]),
+ 		       GET_MODE_SIZE (mode)))
+ 	{
+ 	  insn = emit_insn (set);
+@@ -9177,9 +9246,7 @@ aarch64_process_components (sbitmap comp
+       /* REGNO2 can be saved/restored in a pair with REGNO.  */
+       rtx reg2 = gen_rtx_REG (mode, regno2);
+       if (frame_pointer_needed)
+-	offset2 -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-      else
+-	offset2 += crtl->outgoing_args_size;
++	offset2 -= frame.bytes_below_hard_fp;
+       rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
+       rtx mem2 = gen_frame_mem (mode, addr2);
+       rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
+@@ -9253,10 +9320,10 @@ aarch64_stack_clash_protection_alloca_pr
+    registers.  If POLY_SIZE is not large enough to require a probe this function
+    will only adjust the stack.  When allocating the stack space
+    FRAME_RELATED_P is then used to indicate if the allocation is frame related.
+-   FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
+-   arguments.  If we are then we ensure that any allocation larger than the ABI
+-   defined buffer needs a probe so that the invariant of having a 1KB buffer is
+-   maintained.
++   FINAL_ADJUSTMENT_P indicates whether we are allocating the area below
++   the saved registers.  If we are then we ensure that any allocation
++   larger than the ABI defined buffer needs a probe so that the
++   invariant of having a 1KB buffer is maintained.
+ 
+    We emit barriers after each stack adjustment to prevent optimizations from
+    breaking the invariant that we never drop the stack more than a page.  This
+@@ -9272,45 +9339,26 @@ aarch64_allocate_and_probe_stack_space (
+ 					bool frame_related_p,
+ 					bool final_adjustment_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   HOST_WIDE_INT guard_size
+     = 1 << param_stack_clash_protection_guard_size;
+   HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
++  HOST_WIDE_INT byte_sp_alignment = STACK_BOUNDARY / BITS_PER_UNIT;
++  gcc_assert (multiple_p (poly_size, byte_sp_alignment));
+   HOST_WIDE_INT min_probe_threshold
+     = (final_adjustment_p
+-       ? guard_used_by_caller
++       ? guard_used_by_caller + byte_sp_alignment
+        : guard_size - guard_used_by_caller);
+-  /* When doing the final adjustment for the outgoing arguments, take into
+-     account any unprobed space there is above the current SP.  There are
+-     two cases:
+-
+-     - When saving SVE registers below the hard frame pointer, we force
+-       the lowest save to take place in the prologue before doing the final
+-       adjustment (i.e. we don't allow the save to be shrink-wrapped).
+-       This acts as a probe at SP, so there is no unprobed space.
+-
+-     - When there are no SVE register saves, we use the store of the link
+-       register as a probe.  We can't assume that LR was saved at position 0
+-       though, so treat any space below it as unprobed.  */
+-  if (final_adjustment_p
+-      && known_eq (cfun->machine->frame.below_hard_fp_saved_regs_size, 0))
+-    {
+-      poly_int64 lr_offset = cfun->machine->frame.reg_offset[LR_REGNUM];
+-      if (known_ge (lr_offset, 0))
+-	min_probe_threshold -= lr_offset.to_constant ();
+-      else
+-	gcc_assert (!flag_stack_clash_protection || known_eq (poly_size, 0));
+-    }
+-
+-  poly_int64 frame_size = cfun->machine->frame.frame_size;
++  poly_int64 frame_size = frame.frame_size;
+ 
+   /* We should always have a positive probe threshold.  */
+   gcc_assert (min_probe_threshold > 0);
+ 
+   if (flag_stack_clash_protection && !final_adjustment_p)
+     {
+-      poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-      poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-      poly_int64 final_adjust = cfun->machine->frame.final_adjust;
++      poly_int64 initial_adjust = frame.initial_adjust;
++      poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++      poly_int64 final_adjust = frame.final_adjust;
+ 
+       if (known_eq (frame_size, 0))
+ 	{
+@@ -9464,7 +9512,7 @@ aarch64_allocate_and_probe_stack_space (
+   /* Handle any residuals.  Residuals of at least MIN_PROBE_THRESHOLD have to
+      be probed.  This maintains the requirement that each page is probed at
+      least once.  For initial probing we probe only if the allocation is
+-     more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
++     more than GUARD_SIZE - buffer, and below the saved registers we probe
+      if the amount is larger than buffer.  GUARD_SIZE - buffer + buffer ==
+      GUARD_SIZE.  This works that for any allocation that is large enough to
+      trigger a probe here, we'll have at least one, and if they're not large
+@@ -9474,16 +9522,12 @@ aarch64_allocate_and_probe_stack_space (
+      are still safe.  */
+   if (residual)
+     {
+-      HOST_WIDE_INT residual_probe_offset = guard_used_by_caller;
++      gcc_assert (guard_used_by_caller + byte_sp_alignment <= size);
++
+       /* If we're doing final adjustments, and we've done any full page
+ 	 allocations then any residual needs to be probed.  */
+       if (final_adjustment_p && rounded_size != 0)
+ 	min_probe_threshold = 0;
+-      /* If doing a small final adjustment, we always probe at offset 0.
+-	 This is done to avoid issues when LR is not at position 0 or when
+-	 the final adjustment is smaller than the probing offset.  */
+-      else if (final_adjustment_p && rounded_size == 0)
+-	residual_probe_offset = 0;
+ 
+       aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
+       if (residual >= min_probe_threshold)
+@@ -9494,8 +9538,8 @@ aarch64_allocate_and_probe_stack_space (
+ 		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
+ 		     "\n", residual);
+ 
+-	    emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+-					     residual_probe_offset));
++	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
++					   guard_used_by_caller));
+ 	  emit_insn (gen_blockage ());
+ 	}
+     }
+@@ -9533,20 +9577,24 @@ aarch64_epilogue_uses (int regno)
+ 	|  for register varargs         |
+ 	|                               |
+ 	+-------------------------------+
+-	|  local variables              | <-- frame_pointer_rtx
++	|  local variables (1)          | <-- frame_pointer_rtx
+ 	|                               |
+ 	+-------------------------------+
+-	|  padding                      | \
+-	+-------------------------------+  |
+-	|  callee-saved registers       |  | frame.saved_regs_size
+-	+-------------------------------+  |
+-	|  LR'                          |  |
+-	+-------------------------------+  |
+-	|  FP'                          |  |
+-	+-------------------------------+  |<- hard_frame_pointer_rtx (aligned)
+-	|  SVE vector registers         |  | \
+-	+-------------------------------+  |  | below_hard_fp_saved_regs_size
+-	|  SVE predicate registers      | /  /
++	|  padding (1)                  |
++	+-------------------------------+
++	|  callee-saved registers       |
++	+-------------------------------+
++	|  LR'                          |
++	+-------------------------------+
++	|  FP'                          |
++	+-------------------------------+ <-- hard_frame_pointer_rtx (aligned)
++	|  SVE vector registers         |
++	+-------------------------------+
++	|  SVE predicate registers      |
++	+-------------------------------+
++	|  local variables (2)          |
++	+-------------------------------+
++	|  padding (2)                  |
+ 	+-------------------------------+
+ 	|  dynamic allocation           |
+ 	+-------------------------------+
+@@ -9557,6 +9605,9 @@ aarch64_epilogue_uses (int regno)
+ 	+-------------------------------+
+ 	|                               | <-- stack_pointer_rtx (aligned)
+ 
++   The regions marked (1) and (2) are mutually exclusive.  (2) is used
++   when aarch64_save_regs_above_locals_p is true.
++
+    Dynamic stack allocations via alloca() decrease stack_pointer_rtx
+    but leave frame_pointer_rtx and hard_frame_pointer_rtx
+    unchanged.
+@@ -9571,8 +9622,8 @@ aarch64_epilogue_uses (int regno)
+    When probing is needed, we emit a probe at the start of the prologue
+    and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
+ 
+-   We have to track how much space has been allocated and the only stores
+-   to the stack we track as implicit probes are the FP/LR stores.
++   We can also use register saves as probes.  These are stored in
++   sve_save_and_probe and hard_fp_save_and_probe.
+ 
+    For outgoing arguments we probe if the size is larger than 1KB, such that
+    the ABI specified buffer is maintained for the next callee.
+@@ -9599,17 +9650,15 @@ aarch64_epilogue_uses (int regno)
+ void
+ aarch64_expand_prologue (void)
+ {
+-  poly_int64 frame_size = cfun->machine->frame.frame_size;
+-  poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-  HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+-  poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+-  poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+-  poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-  poly_int64 below_hard_fp_saved_regs_size
+-    = cfun->machine->frame.below_hard_fp_saved_regs_size;
+-  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
+-  bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
++  aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 frame_size = frame.frame_size;
++  poly_int64 initial_adjust = frame.initial_adjust;
++  HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++  poly_int64 final_adjust = frame.final_adjust;
++  poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++  unsigned reg1 = frame.wb_push_candidate1;
++  unsigned reg2 = frame.wb_push_candidate2;
++  bool emit_frame_chain = frame.emit_frame_chain;
+   rtx_insn *insn;
+ 
+   if (flag_stack_clash_protection && known_eq (callee_adjust, 0))
+@@ -9640,7 +9689,7 @@ aarch64_expand_prologue (void)
+     }
+ 
+   /* Push return address to shadow call stack.  */
+-  if (cfun->machine->frame.is_scs_enabled)
++  if (frame.is_scs_enabled)
+     emit_insn (gen_scs_push ());
+ 
+   if (flag_stack_usage_info)
+@@ -9677,21 +9726,21 @@ aarch64_expand_prologue (void)
+   if (callee_adjust != 0)
+     aarch64_push_regs (reg1, reg2, callee_adjust);
+ 
+-  /* The offset of the frame chain record (if any) from the current SP.  */
+-  poly_int64 chain_offset = (initial_adjust + callee_adjust
+-			     - cfun->machine->frame.hard_fp_offset);
+-  gcc_assert (known_ge (chain_offset, 0));
+-
+-  /* The offset of the bottom of the save area from the current SP.  */
+-  poly_int64 saved_regs_offset = chain_offset - below_hard_fp_saved_regs_size;
++  /* The offset of the current SP from the bottom of the static frame.  */
++  poly_int64 bytes_below_sp = frame_size - initial_adjust - callee_adjust;
+ 
+   if (emit_frame_chain)
+     {
++      /* The offset of the frame chain record (if any) from the current SP.  */
++      poly_int64 chain_offset = (initial_adjust + callee_adjust
++				 - frame.bytes_above_hard_fp);
++      gcc_assert (known_ge (chain_offset, 0));
++
+       if (callee_adjust == 0)
+ 	{
+ 	  reg1 = R29_REGNUM;
+ 	  reg2 = R30_REGNUM;
+-	  aarch64_save_callee_saves (saved_regs_offset, reg1, reg2,
++	  aarch64_save_callee_saves (bytes_below_sp, reg1, reg2,
+ 				     false, false);
+ 	}
+       else
+@@ -9716,8 +9765,7 @@ aarch64_expand_prologue (void)
+ 	     implicit.  */
+ 	  if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+ 	    {
+-	      rtx src = plus_constant (Pmode, stack_pointer_rtx,
+-				       callee_offset);
++	      rtx src = plus_constant (Pmode, stack_pointer_rtx, chain_offset);
+ 	      add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ 			    gen_rtx_SET (hard_frame_pointer_rtx, src));
+ 	    }
+@@ -9732,7 +9780,7 @@ aarch64_expand_prologue (void)
+       emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+     }
+ 
+-  aarch64_save_callee_saves (saved_regs_offset, R0_REGNUM, R30_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, R0_REGNUM, R30_REGNUM,
+ 			     callee_adjust != 0 || emit_frame_chain,
+ 			     emit_frame_chain);
+   if (maybe_ne (sve_callee_adjust, 0))
+@@ -9742,18 +9790,21 @@ aarch64_expand_prologue (void)
+       aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx,
+ 					      sve_callee_adjust,
+ 					      !frame_pointer_needed, false);
+-      saved_regs_offset += sve_callee_adjust;
++      bytes_below_sp -= sve_callee_adjust;
+     }
+-  aarch64_save_callee_saves (saved_regs_offset, P0_REGNUM, P15_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, P0_REGNUM, P15_REGNUM,
+ 			     false, emit_frame_chain);
+-  aarch64_save_callee_saves (saved_regs_offset, V0_REGNUM, V31_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, V0_REGNUM, V31_REGNUM,
+ 			     callee_adjust != 0 || emit_frame_chain,
+ 			     emit_frame_chain);
+ 
+   /* We may need to probe the final adjustment if it is larger than the guard
+      that is assumed by the called.  */
++  gcc_assert (known_eq (bytes_below_sp, final_adjust));
+   aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust,
+ 					  !frame_pointer_needed, true);
++  if (emit_frame_chain && maybe_ne (final_adjust, 0))
++    emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+ }
+ 
+ /* Return TRUE if we can use a simple_return insn.
+@@ -9782,16 +9833,15 @@ aarch64_use_return_insn_p (void)
+ void
+ aarch64_expand_epilogue (bool for_sibcall)
+ {
+-  poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-  HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+-  poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+-  poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+-  poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-  poly_int64 below_hard_fp_saved_regs_size
+-    = cfun->machine->frame.below_hard_fp_saved_regs_size;
+-  unsigned reg1 = cfun->machine->frame.wb_pop_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_pop_candidate2;
+-  unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
++  aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 initial_adjust = frame.initial_adjust;
++  HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++  poly_int64 final_adjust = frame.final_adjust;
++  poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++  poly_int64 bytes_below_hard_fp = frame.bytes_below_hard_fp;
++  unsigned reg1 = frame.wb_pop_candidate1;
++  unsigned reg2 = frame.wb_pop_candidate2;
++  unsigned int last_gpr = (frame.is_scs_enabled
+ 			   ? R29_REGNUM : R30_REGNUM);
+   rtx cfi_ops = NULL;
+   rtx_insn *insn;
+@@ -9825,7 +9875,7 @@ aarch64_expand_epilogue (bool for_sibcal
+   /* We need to add memory barrier to prevent read from deallocated stack.  */
+   bool need_barrier_p
+     = maybe_ne (get_frame_size ()
+-		+ cfun->machine->frame.saved_varargs_size, 0);
++		+ frame.saved_varargs_size, 0);
+ 
+   /* Emit a barrier to prevent loads from a deallocated stack.  */
+   if (maybe_gt (final_adjust, crtl->outgoing_args_size)
+@@ -9846,7 +9896,7 @@ aarch64_expand_epilogue (bool for_sibcal
+        is restored on the instruction doing the writeback.  */
+     aarch64_add_offset (Pmode, stack_pointer_rtx,
+ 			hard_frame_pointer_rtx,
+-			-callee_offset - below_hard_fp_saved_regs_size,
++			-bytes_below_hard_fp + final_adjust,
+ 			tmp1_rtx, tmp0_rtx, callee_adjust == 0);
+   else
+      /* The case where we need to re-use the register here is very rare, so
+@@ -9856,9 +9906,9 @@ aarch64_expand_epilogue (bool for_sibcal
+ 
+   /* Restore the vector registers before the predicate registers,
+      so that we can use P4 as a temporary for big-endian SVE frames.  */
+-  aarch64_restore_callee_saves (callee_offset, V0_REGNUM, V31_REGNUM,
++  aarch64_restore_callee_saves (final_adjust, V0_REGNUM, V31_REGNUM,
+ 				callee_adjust != 0, &cfi_ops);
+-  aarch64_restore_callee_saves (callee_offset, P0_REGNUM, P15_REGNUM,
++  aarch64_restore_callee_saves (final_adjust, P0_REGNUM, P15_REGNUM,
+ 				false, &cfi_ops);
+   if (maybe_ne (sve_callee_adjust, 0))
+     aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
+@@ -9866,7 +9916,7 @@ aarch64_expand_epilogue (bool for_sibcal
+   /* When shadow call stack is enabled, the scs_pop in the epilogue will
+      restore x30, we don't need to restore x30 again in the traditional
+      way.  */
+-  aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
++  aarch64_restore_callee_saves (final_adjust + sve_callee_adjust,
+ 				R0_REGNUM, last_gpr,
+ 				callee_adjust != 0, &cfi_ops);
+ 
+@@ -9906,7 +9956,7 @@ aarch64_expand_epilogue (bool for_sibcal
+     }
+ 
+   /* Pop return address from shadow call stack.  */
+-  if (cfun->machine->frame.is_scs_enabled)
++  if (frame.is_scs_enabled)
+     {
+       machine_mode mode = aarch64_reg_save_mode (R30_REGNUM);
+       rtx reg = gen_rtx_REG (mode, R30_REGNUM);
+@@ -12501,24 +12551,24 @@ aarch64_can_eliminate (const int from AT
+ poly_int64
+ aarch64_initial_elimination_offset (unsigned from, unsigned to)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
++
+   if (to == HARD_FRAME_POINTER_REGNUM)
+     {
+       if (from == ARG_POINTER_REGNUM)
+-	return cfun->machine->frame.hard_fp_offset;
++	return frame.bytes_above_hard_fp;
+ 
+       if (from == FRAME_POINTER_REGNUM)
+-	return cfun->machine->frame.hard_fp_offset
+-	       - cfun->machine->frame.locals_offset;
++	return frame.bytes_above_hard_fp - frame.bytes_above_locals;
+     }
+ 
+   if (to == STACK_POINTER_REGNUM)
+     {
+       if (from == FRAME_POINTER_REGNUM)
+-	  return cfun->machine->frame.frame_size
+-		 - cfun->machine->frame.locals_offset;
++	return frame.frame_size - frame.bytes_above_locals;
+     }
+ 
+-  return cfun->machine->frame.frame_size;
++  return frame.frame_size;
+ }
+ 
+ 
+@@ -25795,11 +25845,9 @@ aarch64_operands_ok_for_ldpstp (rtx *ope
+   gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+ 			GET_MODE_SIZE (GET_MODE (mem_2))));
+ 
+-  /* One of the memory accesses must be a mempair operand.
+-     If it is not the first one, they need to be swapped by the
+-     peephole.  */
+-  if (!aarch64_mem_pair_operand (mem_1, GET_MODE (mem_1))
+-       && !aarch64_mem_pair_operand (mem_2, GET_MODE (mem_2)))
++  /* The lower memory access must be a mem-pair operand.  */
++  rtx lower_mem = reversed ? mem_2 : mem_1;
++  if (!aarch64_mem_pair_operand (lower_mem, GET_MODE (lower_mem)))
+     return false;
+ 
+   if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.h
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+@@ -862,6 +862,9 @@ extern enum aarch64_processor aarch64_tu
+ #ifdef HAVE_POLY_INT_H
+ struct GTY (()) aarch64_frame
+ {
++  /* The offset from the bottom of the static frame (the bottom of the
++     outgoing arguments) of each register save slot, or -2 if no save is
++     needed.  */
+   poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
+ 
+   /* The number of extra stack bytes taken up by register varargs.
+@@ -870,25 +873,28 @@ struct GTY (()) aarch64_frame
+      STACK_BOUNDARY.  */
+   HOST_WIDE_INT saved_varargs_size;
+ 
+-  /* The size of the callee-save registers with a slot in REG_OFFSET.  */
+-  poly_int64 saved_regs_size;
++  /* The number of bytes between the bottom of the static frame (the bottom
++     of the outgoing arguments) and the bottom of the register save area.
++     This value is always a multiple of STACK_BOUNDARY.  */
++  poly_int64 bytes_below_saved_regs;
++
++  /* The number of bytes between the bottom of the static frame (the bottom
++     of the outgoing arguments) and the hard frame pointer.  This value is
++     always a multiple of STACK_BOUNDARY.  */
++  poly_int64 bytes_below_hard_fp;
+ 
+-  /* The size of the callee-save registers with a slot in REG_OFFSET that
+-     are saved below the hard frame pointer.  */
+-  poly_int64 below_hard_fp_saved_regs_size;
+-
+-  /* Offset from the base of the frame (incomming SP) to the
+-     top of the locals area.  This value is always a multiple of
++  /* The number of bytes between the top of the locals area and the top
++     of the frame (the incomming SP).  This value is always a multiple of
+      STACK_BOUNDARY.  */
+-  poly_int64 locals_offset;
++  poly_int64 bytes_above_locals;
+ 
+-  /* Offset from the base of the frame (incomming SP) to the
+-     hard_frame_pointer.  This value is always a multiple of
++  /* The number of bytes between the hard_frame_pointer and the top of
++     the frame (the incomming SP).  This value is always a multiple of
+      STACK_BOUNDARY.  */
+-  poly_int64 hard_fp_offset;
++  poly_int64 bytes_above_hard_fp;
+ 
+-  /* The size of the frame.  This value is the offset from base of the
+-     frame (incomming SP) to the stack_pointer.  This value is always
++  /* The size of the frame, i.e. the number of bytes between the bottom
++     of the outgoing arguments and the incoming SP.  This value is always
+      a multiple of STACK_BOUNDARY.  */
+   poly_int64 frame_size;
+ 
+@@ -899,10 +905,6 @@ struct GTY (()) aarch64_frame
+      It is zero when no push is used.  */
+   HOST_WIDE_INT callee_adjust;
+ 
+-  /* The offset from SP to the callee-save registers after initial_adjust.
+-     It may be non-zero if no push is used (ie. callee_adjust == 0).  */
+-  poly_int64 callee_offset;
+-
+   /* The size of the stack adjustment before saving or after restoring
+      SVE registers.  */
+   poly_int64 sve_callee_adjust;
+@@ -950,6 +952,14 @@ struct GTY (()) aarch64_frame
+      This is the register they should use.  */
+   unsigned spare_pred_reg;
+ 
++  /* An SVE register that is saved below the hard frame pointer and that acts
++     as a probe for later allocations, or INVALID_REGNUM if none.  */
++  unsigned sve_save_and_probe;
++
++  /* A register that is saved at the hard frame pointer and that acts
++     as a probe for later allocations, or INVALID_REGNUM if none.  */
++  unsigned hard_fp_save_and_probe;
++
+   bool laid_out;
+ 
+   /* True if shadow call stack should be enabled for the current function.  */
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+@@ -0,0 +1,55 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #4064
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++**	str	x26, \[sp, #?4128\]
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test3:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test3(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #4064
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++**	str	x26, \[sp, #?4128\]
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test3:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test3(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+@@ -0,0 +1,3 @@
++/* { dg-options "-O2 -fstack-protector-all -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++
++#include "stack-check-prologue-19.c"
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+@@ -0,0 +1,95 @@
++/* { dg-options " -O -fstack-protector-strong -mstack-protector-guard=sysreg -mstack-protector-guard-reg=tpidr2_el0 -mstack-protector-guard-offset=16" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void g(void *);
++__SVBool_t *h(void *);
++
++/*
++** test1:
++**	sub	sp, sp, #288
++**	stp	x29, x30, \[sp, #?272\]
++**	add	x29, sp, #?272
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?264\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	g
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	ldp	x29, x30, \[sp, #?272\]
++**	add	sp, sp, #?288
++**	ret
++**	bl	__stack_chk_fail
++*/
++int test1() {
++  int y[0x40];
++  g(y);
++  return 1;
++}
++
++/*
++** test2:
++**	stp	x29, x30, \[sp, #?-16\]!
++**	mov	x29, sp
++**	sub	sp, sp, #1040
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?1032\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	g
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	add	sp, sp, #?1040
++**	ldp	x29, x30, \[sp\], #?16
++**	ret
++**	bl	__stack_chk_fail
++*/
++int test2() {
++  int y[0x100];
++  g(y);
++  return 1;
++}
++
++#pragma GCC target "+sve"
++
++/*
++** test3:
++**	stp	x29, x30, \[sp, #?-16\]!
++**	mov	x29, sp
++**	addvl	sp, sp, #-18
++**	...
++**	str	p4, \[sp\]
++**	...
++**	sub	sp, sp, #272
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?264\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	h
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	add	sp, sp, #?272
++**	...
++**	ldr	p4, \[sp\]
++**	...
++**	addvl	sp, sp, #18
++**	ldp	x29, x30, \[sp\], #?16
++**	ret
++**	bl	__stack_chk_fail
++*/
++__SVBool_t test3() {
++  int y[0x40];
++  return *h(y);
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+@@ -0,0 +1,33 @@
++/* { dg-options "-O2 -mcpu=neoverse-v1 -fstack-protector-all" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++/*
++** main:
++**	...
++**	stp	x29, x30, \[sp, #?-[0-9]+\]!
++**	...
++**	sub	sp, sp, #[0-9]+
++**	...
++**	str	x[0-9]+, \[x29, #?-8\]
++**	...
++*/
++int f(const char *);
++void g(void *);
++int main(int argc, char* argv[])
++{
++  int a;
++  int b;
++  char c[2+f(argv[1])];
++  int d[0x100];
++  char y;
++
++  y=42; a=4; b=10;
++  c[0] = 'h'; c[1] = '\0';
++
++  c[f(argv[2])] = '\0';
++
++  __builtin_printf("%d %d\n%s\n", a, b, c);
++  g(d);
++
++  return 0;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+@@ -11,11 +11,10 @@
+ **	mov	x11, sp
+ **	...
+ **	sub	sp, sp, x13
+-**	str	p4, \[sp\]
+ **	cbz	w0, [^\n]*
++**	str	p4, \[sp\]
+ **	...
+ **	ptrue	p0\.b, all
+-**	ldr	p4, \[sp\]
+ **	addvl	sp, sp, #1
+ **	ldr	x24, \[sp\], 32
+ **	ret
+@@ -39,13 +38,12 @@ test_1 (int n)
+ **	mov	x11, sp
+ **	...
+ **	sub	sp, sp, x13
+-**	str	p4, \[sp\]
+ **	cbz	w0, [^\n]*
++**	str	p4, \[sp\]
+ **	str	p5, \[sp, #1, mul vl\]
+ **	str	p6, \[sp, #2, mul vl\]
+ **	...
+ **	ptrue	p0\.b, all
+-**	ldr	p4, \[sp\]
+ **	addvl	sp, sp, #1
+ **	ldr	x24, \[sp\], 32
+ **	ret
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+@@ -0,0 +1,57 @@
++/* { dg-do compile { target aarch64*-*-* } } */
++/* { dg-require-effective-target lp64 } */
++/* { dg-options "-O -fdisable-rtl-postreload -fpeephole2 -fno-schedule-fusion" } */
++
++extern int data[];
++
++void __RTL (startwith ("ira")) foo (void *ptr)
++{
++  (function "foo"
++    (param "ptr"
++      (DECL_RTL (reg/v:DI <0> [ ptr ]))
++      (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++    ) ;; param "ptr"
++    (insn-chain
++      (block 2
++	(edge-from entry (flags "FALLTHRU"))
++	(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++	(insn 4 (set (reg:DI <0>) (reg:DI x0)))
++	(insn 5 (set (reg:DI <1>)
++		     (plus:DI (reg:DI <0>) (const_int 768))))
++	(insn 6 (set (mem:SI (plus:DI (reg:DI <0>)
++				      (const_int 508)) [1 &data+508 S4 A4])
++		     (const_int 0)))
++	(insn 7 (set (mem:SI (plus:DI (reg:DI <1>)
++				      (const_int -256)) [1 &data+512 S4 A4])
++		     (const_int 0)))
++	(edge-to exit (flags "FALLTHRU"))
++      ) ;; block 2
++    ) ;; insn-chain
++  ) ;; function
++}
++
++void __RTL (startwith ("ira")) bar (void *ptr)
++{
++  (function "bar"
++    (param "ptr"
++      (DECL_RTL (reg/v:DI <0> [ ptr ]))
++      (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++    ) ;; param "ptr"
++    (insn-chain
++      (block 2
++	(edge-from entry (flags "FALLTHRU"))
++	(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++	(insn 4 (set (reg:DI <0>) (reg:DI x0)))
++	(insn 5 (set (reg:DI <1>)
++		     (plus:DI (reg:DI <0>) (const_int 768))))
++	(insn 6 (set (mem:SI (plus:DI (reg:DI <1>)
++				      (const_int -256)) [1 &data+512 S4 A4])
++		     (const_int 0)))
++	(insn 7 (set (mem:SI (plus:DI (reg:DI <0>)
++				      (const_int 508)) [1 &data+508 S4 A4])
++		     (const_int 0)))
++	(edge-to exit (flags "FALLTHRU"))
++      ) ;; block 2
++    ) ;; insn-chain
++  ) ;; function
++}
diff -Nru gcc-12-12.2.0/debian/rules.patch gcc-12-12.2.0/debian/rules.patch
--- gcc-12-12.2.0/debian/rules.patch	2022-12-26 16:53:20.000000000 +0100
+++ gcc-12-12.2.0/debian/rules.patch	2023-09-19 21:36:59.000000000 +0200
@@ -245,6 +245,8 @@
 debian_patches += libgomp-kfreebsd-testsuite
 debian_patches += go-testsuite
 
+debian_patches += CVE-2023-4039
+
 # don't remove, this is regularly overwritten, see PR sanitizer/63958.
 #debian_patches += libasan-sparc
 

Reply to: