[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1055211: marked as done (bookworm-pu: package gcc-12/12.2.0-14+deb12u1 (CVE-2023-4039))



Your message dated Sat, 17 May 2025 09:37:57 +0000
with message-id <E1uGDzR-005KGo-Hj@coccia.debian.org>
and subject line Close 1055211
has caused the Debian Bug report #1055211,
regarding bookworm-pu: package gcc-12/12.2.0-14+deb12u1 (CVE-2023-4039)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1055211: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055211
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: release.debian.org
Severity: normal
Tags: bookworm
User: release.debian.org@packages.debian.org
Usertags: pu
X-Debbugs-Cc: debian-gcc@lists.debian.org, jmm@debian.org, doko@debian.org
Control: affects -1 + src:gcc-12

[ Reason ]
A failure in the -fstack-protector feature in GCC-based toolchains that target
AArch64 allows an attacker to exploit an existing buffer overflow in
dynamically-sized local variables without this being detected.

The Security Team explicitly stated that they will not release a DSA for the
bug, see https://security-tracker.debian.org/tracker/CVE-2023-4039 

This issue has been fixed in version 12.3.0-9 for sid and trixie. It would now
be good to address the problem in bookworm too.

[ Impact ]
Without this change, arm64 users of gcc-12 in bookworm are not fully protected
by -fstack-protector as described in CVE-2023-4039.

[ Tests ]
In terms of manual testing, I have verified that the stack smashing attack
published at
https://github.com/metaredteam/external-disclosures/security/advisories/GHSA-x7ch-h5rf-w2mf
is detected and stopped when the PoC is compiled with gcc 12.2.0-14+deb12u1,
while it results in a Bus error with 12.2.0-14:

  $ gcc-12 --version | head -1
  gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
  $ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
  $ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
  *** stack smashing detected ***: terminated
  Aborted

  $ gcc-12 --version | head -1
  gcc-12 (Debian 12.2.0-14) 12.2.0
  $ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
  $ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
  Bus error

As an additional smoke test I have also successfully built a few packages, as
well as a kernel.

In terms of automated testing, the upstream test suite is extensive.
Additionally, upstream added the following tests specifically for
CVE-2023-4039:

 testsuite/gcc.target/aarch64/stack-check-prologue-17.c
 testsuite/gcc.target/aarch64/stack-check-prologue-18.c
 testsuite/gcc.target/aarch64/stack-check-prologue-19.c
 testsuite/gcc.target/aarch64/stack-check-prologue-20.c
 testsuite/gcc.target/aarch64/stack-protector-8.c
 testsuite/gcc.target/aarch64/stack-protector-9.c

[ Risks ]
There are obviously potential risks of regressions associated with compiler
changes. However the specific changes proposed here have been part of gcc-12
upstream since September, and by now have been tested quite a bit. One
regression has been found early on and fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

All changes are arm64-specific and don't affect any other architecture.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

[ Changes ]
All changes merged upstream back in September 2023, see
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/releases/gcc-12

- aarch64: Avoid a use of callee_offset
  12a8889de169f892d2e927584c00d20b8b7e456f

- aarch64: Explicitly handle frames with no saved registers
  03d5e89e7f3be53fd7142556e8e0a2774c653dca

- aarch64: Add bytes_below_saved_regs to frame info
  49c2eb7616756c323b7f6b18d8616ec945eb1263

- aarch64: Add bytes_below_hard_fp to frame info
  34081079ea4de0c98331843f574b5f6f94d7b234

- aarch64: Tweak aarch64_save/restore_callee_saves
  187861af7c51db9eddc6f954b589c121b210fc74

- aarch64: Only calculate chain_offset if there is a chain
  2b983f9064d808daf909bde1d4a13980934a7e6e

- aarch64: Rename locals_offset to bytes_above_locals
  0a0a824808d1dec51004fb5805c1a0ae2a35433f

- aarch64: Rename hard_fp_offset to bytes_above_hard_fp
  3fbf0789202b30a67b12e1fb785c7130f098d665

- aarch64: Tweak frame_size comment
  aac8b31379ac3bbd14fc6427dce23f56e54e8485

- aarch64: Measure reg_offset from the bottom of the frame
  8d5506a8aeb8dd7e8b209a3663b07688478f76b9

- aarch64: Simplify top of frame allocation
  b47766614df3b9df878262efb2ad73aaac108363

- aarch64: Minor initial adjustment tweak
  08f71b4bb28fb74d20e8d2927a557e8119ce9f4d

- aarch64: Tweak stack clash boundary condition
  f22315d5c19e8310e4dc880fd509678fd291fca8

- aarch64: Put LR save probe in first 16 bytes
  15e18831bf98fd25af098b970ebf0c9a6200a34b

- aarch64: Simplify probe of final frame allocation
  c4f0e121faa36342f1d21919e54a05ad841c4f86

- aarch64: Explicitly record probe registers in frame info
  6f0ab0a9f46a17b68349ff6035aa776bf65f0575

- aarch64: Remove below_hard_fp_saved_regs_size
  8254e1b9cd500e0c278465a3657543477e9d1250

- aarch64: Make stack smash canary protect saved registers
  75c37e031408262263442f5b4cdb83d3777b6422

- aarch64: Fix return register handling in untyped_call
  38d0605ac8bc90324170041676fc05e7e595769e

- aarch64: Fix loose ldpstp check [PR111411]
  74f99f1adc696f446115f36974a3f94f66294a53
diff -Nru gcc-12-12.2.0/debian/changelog gcc-12-12.2.0/debian/changelog
--- gcc-12-12.2.0/debian/changelog	2023-01-08 10:12:42.000000000 +0100
+++ gcc-12-12.2.0/debian/changelog	2023-10-09 11:45:48.000000000 +0200
@@ -1,3 +1,9 @@
+gcc-12 (12.2.0-14+deb12u1) bookworm; urgency=medium
+
+  * Fix -fstack-protector handling of overflows on AArch64 (CVE-2023-4039).
+
+ -- Emanuele Rocca <ema@debian.org>  Mon, 09 Oct 2023 11:45:48 +0200
+
 gcc-12 (12.2.0-14) unstable; urgency=medium
 
   * Update to git 20230108 from the gcc-12 branch.
diff -Nru gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff
--- gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff	1970-01-01 01:00:00.000000000 +0100
+++ gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff	2023-09-21 10:08:42.000000000 +0200
@@ -0,0 +1,1665 @@
+DP: Fix CVE-2023-4039.
+DP: 
+DP: The patch includes the following upstream commits:
+DP: 
+DP: aarch64: Use local frame vars in shrink-wrapping code
+DP: 62fbb215cc817e9f2c1ca80282a64f4ee30806bc
+DP: 
+DP: aarch64: Avoid a use of callee_offset
+DP: 12a8889de169f892d2e927584c00d20b8b7e456f
+DP: 
+DP: aarch64: Explicitly handle frames with no saved registers
+DP: 03d5e89e7f3be53fd7142556e8e0a2774c653dca
+DP: 
+DP: aarch64: Add bytes_below_saved_regs to frame info
+DP: 49c2eb7616756c323b7f6b18d8616ec945eb1263
+DP: 
+DP: aarch64: Add bytes_below_hard_fp to frame info
+DP: 34081079ea4de0c98331843f574b5f6f94d7b234
+DP: 
+DP: aarch64: Tweak aarch64_save/restore_callee_saves
+DP: 187861af7c51db9eddc6f954b589c121b210fc74
+DP: 
+DP: aarch64: Only calculate chain_offset if there is a chain
+DP: 2b983f9064d808daf909bde1d4a13980934a7e6e
+DP: 
+DP: aarch64: Rename locals_offset to bytes_above_locals
+DP: 0a0a824808d1dec51004fb5805c1a0ae2a35433f
+DP: 
+DP: aarch64: Rename hard_fp_offset to bytes_above_hard_fp
+DP: 3fbf0789202b30a67b12e1fb785c7130f098d665
+DP: 
+DP: aarch64: Tweak frame_size comment
+DP: aac8b31379ac3bbd14fc6427dce23f56e54e8485
+DP: 
+DP: aarch64: Measure reg_offset from the bottom of the frame
+DP: 8d5506a8aeb8dd7e8b209a3663b07688478f76b9
+DP: 
+DP: aarch64: Simplify top of frame allocation
+DP: b47766614df3b9df878262efb2ad73aaac108363
+DP: 
+DP: aarch64: Minor initial adjustment tweak
+DP: 08f71b4bb28fb74d20e8d2927a557e8119ce9f4d
+DP: 
+DP: aarch64: Tweak stack clash boundary condition
+DP: f22315d5c19e8310e4dc880fd509678fd291fca8
+DP: 
+DP: aarch64: Put LR save probe in first 16 bytes
+DP: 15e18831bf98fd25af098b970ebf0c9a6200a34b
+DP: 
+DP: aarch64: Simplify probe of final frame allocation
+DP: c4f0e121faa36342f1d21919e54a05ad841c4f86
+DP: 
+DP: aarch64: Explicitly record probe registers in frame info
+DP: 6f0ab0a9f46a17b68349ff6035aa776bf65f0575
+DP: 
+DP: aarch64: Remove below_hard_fp_saved_regs_size
+DP: 8254e1b9cd500e0c278465a3657543477e9d1250
+DP: 
+DP: aarch64: Make stack smash canary protect saved registers
+DP: 75c37e031408262263442f5b4cdb83d3777b6422
+DP: 
+DP: aarch64: Fix return register handling in untyped_call
+DP: 38d0605ac8bc90324170041676fc05e7e595769e
+DP: 
+DP: aarch64: Fix loose ldpstp check [PR111411]
+DP: 74f99f1adc696f446115f36974a3f94f66294a53
+
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.cc
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+@@ -8070,18 +8070,32 @@ aarch64_needs_frame_chain (void)
+   return aarch64_use_frame_pointer;
+ }
+ 
++/* Return true if the current function should save registers above
++   the locals area, rather than below it.  */
++
++static bool
++aarch64_save_regs_above_locals_p ()
++{
++  /* When using stack smash protection, make sure that the canary slot
++     comes between the locals and the saved registers.  Otherwise,
++     it would be possible for a carefully sized smash attack to change
++     the saved registers (particularly LR and FP) without reaching the
++     canary.  */
++  return crtl->stack_protect_guard;
++}
++
+ /* Mark the registers that need to be saved by the callee and calculate
+    the size of the callee-saved registers area and frame record (both FP
+    and LR may be omitted).  */
+ static void
+ aarch64_layout_frame (void)
+ {
+-  poly_int64 offset = 0;
+   int regno, last_fp_reg = INVALID_REGNUM;
+   machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
+   poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
+   bool frame_related_fp_reg_p = false;
+   aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 top_of_locals = -1;
+ 
+   frame.emit_frame_chain = aarch64_needs_frame_chain ();
+ 
+@@ -8148,11 +8162,18 @@ aarch64_layout_frame (void)
+ 	&& !crtl->abi->clobbers_full_reg_p (regno))
+       frame.reg_offset[regno] = SLOT_REQUIRED;
+ 
+-  /* With stack-clash, LR must be saved in non-leaf functions.  The saving of
+-     LR counts as an implicit probe which allows us to maintain the invariant
+-     described in the comment at expand_prologue.  */
+-  gcc_assert (crtl->is_leaf
+-	      || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
++  bool regs_at_top_p = aarch64_save_regs_above_locals_p ();
++
++  poly_int64 offset = crtl->outgoing_args_size;
++  gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++  if (regs_at_top_p)
++    {
++      offset += get_frame_size ();
++      offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++      top_of_locals = offset;
++    }
++  frame.bytes_below_saved_regs = offset;
++  frame.sve_save_and_probe = INVALID_REGNUM;
+ 
+   /* Now assign stack slots for the registers.  Start with the predicate
+      registers, since predicate LDR and STR have a relatively small
+@@ -8160,11 +8181,14 @@ aarch64_layout_frame (void)
+   for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+       {
++	if (frame.sve_save_and_probe == INVALID_REGNUM)
++	  frame.sve_save_and_probe = regno;
+ 	frame.reg_offset[regno] = offset;
+ 	offset += BYTES_PER_SVE_PRED;
+       }
+ 
+-  if (maybe_ne (offset, 0))
++  poly_int64 saved_prs_size = offset - frame.bytes_below_saved_regs;
++  if (maybe_ne (saved_prs_size, 0))
+     {
+       /* If we have any vector registers to save above the predicate registers,
+ 	 the offset of the vector register save slots need to be a multiple
+@@ -8182,10 +8206,10 @@ aarch64_layout_frame (void)
+ 	offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+       else
+ 	{
+-	  if (known_le (offset, vector_save_size))
+-	    offset = vector_save_size;
+-	  else if (known_le (offset, vector_save_size * 2))
+-	    offset = vector_save_size * 2;
++	  if (known_le (saved_prs_size, vector_save_size))
++	    offset = frame.bytes_below_saved_regs + vector_save_size;
++	  else if (known_le (saved_prs_size, vector_save_size * 2))
++	    offset = frame.bytes_below_saved_regs + vector_save_size * 2;
+ 	  else
+ 	    gcc_unreachable ();
+ 	}
+@@ -8196,34 +8220,53 @@ aarch64_layout_frame (void)
+     for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+       if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ 	{
++	  if (frame.sve_save_and_probe == INVALID_REGNUM)
++	    frame.sve_save_and_probe = regno;
+ 	  frame.reg_offset[regno] = offset;
+ 	  offset += vector_save_size;
+ 	}
+ 
+   /* OFFSET is now the offset of the hard frame pointer from the bottom
+      of the callee save area.  */
+-  bool saves_below_hard_fp_p = maybe_ne (offset, 0);
+-  frame.below_hard_fp_saved_regs_size = offset;
++  auto below_hard_fp_saved_regs_size = offset - frame.bytes_below_saved_regs;
++  bool saves_below_hard_fp_p = maybe_ne (below_hard_fp_saved_regs_size, 0);
++  gcc_assert (!saves_below_hard_fp_p
++	      || (frame.sve_save_and_probe != INVALID_REGNUM
++		  && known_eq (frame.reg_offset[frame.sve_save_and_probe],
++			       frame.bytes_below_saved_regs)));
++
++  frame.bytes_below_hard_fp = offset;
++  frame.hard_fp_save_and_probe = INVALID_REGNUM;
++
++  auto allocate_gpr_slot = [&](unsigned int regno)
++    {
++      if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++	frame.hard_fp_save_and_probe = regno;
++      frame.reg_offset[regno] = offset;
++      if (frame.wb_push_candidate1 == INVALID_REGNUM)
++	frame.wb_push_candidate1 = regno;
++      else if (frame.wb_push_candidate2 == INVALID_REGNUM)
++	frame.wb_push_candidate2 = regno;
++      offset += UNITS_PER_WORD;
++    };
++
+   if (frame.emit_frame_chain)
+     {
+       /* FP and LR are placed in the linkage record.  */
+-      frame.reg_offset[R29_REGNUM] = offset;
+-      frame.wb_push_candidate1 = R29_REGNUM;
+-      frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
+-      frame.wb_push_candidate2 = R30_REGNUM;
+-      offset += 2 * UNITS_PER_WORD;
++      allocate_gpr_slot (R29_REGNUM);
++      allocate_gpr_slot (R30_REGNUM);
+     }
++  else if (flag_stack_clash_protection
++	   && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
++    /* Put the LR save slot first, since it makes a good choice of probe
++       for stack clash purposes.  The idea is that the link register usually
++       has to be saved before a call anyway, and so we lose little by
++       stopping it from being individually shrink-wrapped.  */
++    allocate_gpr_slot (R30_REGNUM);
+ 
+   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+-      {
+-	frame.reg_offset[regno] = offset;
+-	if (frame.wb_push_candidate1 == INVALID_REGNUM)
+-	  frame.wb_push_candidate1 = regno;
+-	else if (frame.wb_push_candidate2 == INVALID_REGNUM)
+-	  frame.wb_push_candidate2 = regno;
+-	offset += UNITS_PER_WORD;
+-      }
++      allocate_gpr_slot (regno);
+ 
+   poly_int64 max_int_offset = offset;
+   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+@@ -8232,6 +8275,8 @@ aarch64_layout_frame (void)
+   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+       {
++	if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++	  frame.hard_fp_save_and_probe = regno;
+ 	/* If there is an alignment gap between integer and fp callee-saves,
+ 	   allocate the last fp register to it if possible.  */
+ 	if (regno == last_fp_reg
+@@ -8254,30 +8299,36 @@ aarch64_layout_frame (void)
+ 
+   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ 
+-  frame.saved_regs_size = offset;
+-
+-  poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
+-
+-  poly_int64 above_outgoing_args
+-    = aligned_upper_bound (varargs_and_saved_regs_size
+-			   + get_frame_size (),
+-			   STACK_BOUNDARY / BITS_PER_UNIT);
+-
+-  frame.hard_fp_offset
+-    = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+-
+-  /* Both these values are already aligned.  */
+-  gcc_assert (multiple_p (crtl->outgoing_args_size,
+-			  STACK_BOUNDARY / BITS_PER_UNIT));
+-  frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
+-
+-  frame.locals_offset = frame.saved_varargs_size;
++  auto saved_regs_size = offset - frame.bytes_below_saved_regs;
++  gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size)
++	      || (frame.hard_fp_save_and_probe != INVALID_REGNUM
++		  && known_eq (frame.reg_offset[frame.hard_fp_save_and_probe],
++			       frame.bytes_below_hard_fp)));
++
++  /* With stack-clash, a register must be saved in non-leaf functions.
++     The saving of the bottommost register counts as an implicit probe,
++     which allows us to maintain the invariant described in the comment
++     at expand_prologue.  */
++  gcc_assert (crtl->is_leaf || maybe_ne (saved_regs_size, 0));
++
++  if (!regs_at_top_p)
++    {
++      offset += get_frame_size ();
++      offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++      top_of_locals = offset;
++    }
++  offset += frame.saved_varargs_size;
++  gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++  frame.frame_size = offset;
++
++  frame.bytes_above_hard_fp = frame.frame_size - frame.bytes_below_hard_fp;
++  gcc_assert (known_ge (top_of_locals, 0));
++  frame.bytes_above_locals = frame.frame_size - top_of_locals;
+ 
+   frame.initial_adjust = 0;
+   frame.final_adjust = 0;
+   frame.callee_adjust = 0;
+   frame.sve_callee_adjust = 0;
+-  frame.callee_offset = 0;
+ 
+   frame.wb_pop_candidate1 = frame.wb_push_candidate1;
+   frame.wb_pop_candidate2 = frame.wb_push_candidate2;
+@@ -8288,7 +8339,7 @@ aarch64_layout_frame (void)
+   frame.is_scs_enabled
+     = (!crtl->calls_eh_return
+        && sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+-       && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
++       && known_ge (frame.reg_offset[LR_REGNUM], 0));
+ 
+   /* When shadow call stack is enabled, the scs_pop in the epilogue will
+      restore x30, and we don't need to pop x30 again in the traditional
+@@ -8308,75 +8359,76 @@ aarch64_layout_frame (void)
+      max_push_offset to 0, because no registers are popped at this time,
+      so callee_adjust cannot be adjusted.  */
+   HOST_WIDE_INT max_push_offset = 0;
+-  if (frame.wb_pop_candidate2 != INVALID_REGNUM)
+-    max_push_offset = 512;
+-  else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
+-    max_push_offset = 256;
++  if (frame.wb_pop_candidate1 != INVALID_REGNUM)
++    {
++      if (frame.wb_pop_candidate2 != INVALID_REGNUM)
++	max_push_offset = 512;
++      else
++	max_push_offset = 256;
++    }
+ 
+-  HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
++  HOST_WIDE_INT const_size, const_below_saved_regs, const_above_fp;
+   HOST_WIDE_INT const_saved_regs_size;
+-  if (frame.frame_size.is_constant (&const_size)
+-      && const_size < max_push_offset
+-      && known_eq (frame.hard_fp_offset, const_size))
++  if (known_eq (saved_regs_size, 0))
++    frame.initial_adjust = frame.frame_size;
++  else if (frame.frame_size.is_constant (&const_size)
++	   && const_size < max_push_offset
++	   && known_eq (frame.bytes_above_hard_fp, const_size))
+     {
+-      /* Simple, small frame with no outgoing arguments:
++      /* Simple, small frame with no data below the saved registers.
+ 
+ 	 stp reg1, reg2, [sp, -frame_size]!
+ 	 stp reg3, reg4, [sp, 16]  */
+       frame.callee_adjust = const_size;
+     }
+-  else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
+-	   && frame.saved_regs_size.is_constant (&const_saved_regs_size)
+-	   && const_outgoing_args_size + const_saved_regs_size < 512
+-	   /* We could handle this case even with outgoing args, provided
+-	      that the number of args left us with valid offsets for all
+-	      predicate and vector save slots.  It's such a rare case that
+-	      it hardly seems worth the effort though.  */
+-	   && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
++  else if (frame.bytes_below_saved_regs.is_constant (&const_below_saved_regs)
++	   && saved_regs_size.is_constant (&const_saved_regs_size)
++	   && const_below_saved_regs + const_saved_regs_size < 512
++	   /* We could handle this case even with data below the saved
++	      registers, provided that that data left us with valid offsets
++	      for all predicate and vector save slots.  It's such a rare
++	      case that it hardly seems worth the effort though.  */
++	   && (!saves_below_hard_fp_p || const_below_saved_regs == 0)
+ 	   && !(cfun->calls_alloca
+-		&& frame.hard_fp_offset.is_constant (&const_fp_offset)
+-		&& const_fp_offset < max_push_offset))
++		&& frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++		&& const_above_fp < max_push_offset))
+     {
+-      /* Frame with small outgoing arguments:
++      /* Frame with small area below the saved registers:
+ 
+ 	 sub sp, sp, frame_size
+-	 stp reg1, reg2, [sp, outgoing_args_size]
+-	 stp reg3, reg4, [sp, outgoing_args_size + 16]  */
++	 stp reg1, reg2, [sp, bytes_below_saved_regs]
++	 stp reg3, reg4, [sp, bytes_below_saved_regs + 16]  */
+       frame.initial_adjust = frame.frame_size;
+-      frame.callee_offset = const_outgoing_args_size;
+     }
+   else if (saves_below_hard_fp_p
+-	   && known_eq (frame.saved_regs_size,
+-			frame.below_hard_fp_saved_regs_size))
++	   && known_eq (saved_regs_size, below_hard_fp_saved_regs_size))
+     {
+       /* Frame in which all saves are SVE saves:
+ 
+-	 sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
++	 sub sp, sp, frame_size - bytes_below_saved_regs
+ 	 save SVE registers relative to SP
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.initial_adjust = (frame.hard_fp_offset
+-			      + frame.below_hard_fp_saved_regs_size);
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.initial_adjust = frame.frame_size - frame.bytes_below_saved_regs;
++      frame.final_adjust = frame.bytes_below_saved_regs;
+     }
+-  else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
+-	   && const_fp_offset < max_push_offset)
++  else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++	   && const_above_fp < max_push_offset)
+     {
+-      /* Frame with large outgoing arguments or SVE saves, but with
+-	 a small local area:
++      /* Frame with large area below the saved registers, or with SVE saves,
++	 but with a small area above:
+ 
+ 	 stp reg1, reg2, [sp, -hard_fp_offset]!
+ 	 stp reg3, reg4, [sp, 16]
+ 	 [sub sp, sp, below_hard_fp_saved_regs_size]
+ 	 [save SVE registers relative to SP]
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.callee_adjust = const_fp_offset;
+-      frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.callee_adjust = const_above_fp;
++      frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++      frame.final_adjust = frame.bytes_below_saved_regs;
+     }
+   else
+     {
+-      /* Frame with large local area and outgoing arguments or SVE saves,
+-	 using frame pointer:
++      /* General case:
+ 
+ 	 sub sp, sp, hard_fp_offset
+ 	 stp x29, x30, [sp, 0]
+@@ -8384,10 +8436,29 @@ aarch64_layout_frame (void)
+ 	 stp reg3, reg4, [sp, 16]
+ 	 [sub sp, sp, below_hard_fp_saved_regs_size]
+ 	 [save SVE registers relative to SP]
+-	 sub sp, sp, outgoing_args_size  */
+-      frame.initial_adjust = frame.hard_fp_offset;
+-      frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+-      frame.final_adjust = crtl->outgoing_args_size;
++	 sub sp, sp, bytes_below_saved_regs  */
++      frame.initial_adjust = frame.bytes_above_hard_fp;
++      frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++      frame.final_adjust = frame.bytes_below_saved_regs;
++    }
++
++  /* The frame is allocated in pieces, with each non-final piece
++     including a register save at offset 0 that acts as a probe for
++     the following piece.  In addition, the save of the bottommost register
++     acts as a probe for callees and allocas.  Roll back any probes that
++     aren't needed.
++
++     A probe isn't needed if it is associated with the final allocation
++     (including callees and allocas) that happens before the epilogue is
++     executed.  */
++  if (crtl->is_leaf
++      && !cfun->calls_alloca
++      && known_eq (frame.final_adjust, 0))
++    {
++      if (maybe_ne (frame.sve_callee_adjust, 0))
++	frame.sve_save_and_probe = INVALID_REGNUM;
++      else
++	frame.hard_fp_save_and_probe = INVALID_REGNUM;
+     }
+ 
+   /* Make sure the individual adjustments add up to the full frame size.  */
+@@ -8691,15 +8762,17 @@ aarch64_add_cfa_expression (rtx_insn *in
+ }
+ 
+ /* Emit code to save the callee-saved registers from register number START
+-   to LIMIT to the stack at the location starting at offset START_OFFSET,
+-   skipping any write-back candidates if SKIP_WB is true.  HARD_FP_VALID_P
+-   is true if the hard frame pointer has been set up.  */
++   to LIMIT to the stack.  The stack pointer is currently BYTES_BELOW_SP
++   bytes above the bottom of the static frame.  Skip any write-back
++   candidates if SKIP_WB is true.  HARD_FP_VALID_P is true if the hard
++   frame pointer has been set up.  */
+ 
+ static void
+-aarch64_save_callee_saves (poly_int64 start_offset,
++aarch64_save_callee_saves (poly_int64 bytes_below_sp,
+ 			   unsigned start, unsigned limit, bool skip_wb,
+ 			   bool hard_fp_valid_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   rtx_insn *insn;
+   unsigned regno;
+   unsigned regno2;
+@@ -8714,8 +8787,8 @@ aarch64_save_callee_saves (poly_int64 st
+       bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+ 
+       if (skip_wb
+-	  && (regno == cfun->machine->frame.wb_push_candidate1
+-	      || regno == cfun->machine->frame.wb_push_candidate2))
++	  && (regno == frame.wb_push_candidate1
++	      || regno == frame.wb_push_candidate2))
+ 	continue;
+ 
+       if (cfun->machine->reg_is_wrapped_separately[regno])
+@@ -8723,7 +8796,7 @@ aarch64_save_callee_saves (poly_int64 st
+ 
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       reg = gen_rtx_REG (mode, regno);
+-      offset = start_offset + cfun->machine->frame.reg_offset[regno];
++      offset = frame.reg_offset[regno] - bytes_below_sp;
+       rtx base_rtx = stack_pointer_rtx;
+       poly_int64 sp_offset = offset;
+ 
+@@ -8734,9 +8807,7 @@ aarch64_save_callee_saves (poly_int64 st
+       else if (GP_REGNUM_P (regno)
+ 	       && (!offset.is_constant (&const_offset) || const_offset >= 512))
+ 	{
+-	  gcc_assert (known_eq (start_offset, 0));
+-	  poly_int64 fp_offset
+-	    = cfun->machine->frame.below_hard_fp_saved_regs_size;
++	  poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
+ 	  if (hard_fp_valid_p)
+ 	    base_rtx = hard_frame_pointer_rtx;
+ 	  else
+@@ -8758,8 +8829,7 @@ aarch64_save_callee_saves (poly_int64 st
+ 	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ 	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+ 	  && known_eq (GET_MODE_SIZE (mode),
+-		       cfun->machine->frame.reg_offset[regno2]
+-		       - cfun->machine->frame.reg_offset[regno]))
++		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ 	{
+ 	  rtx reg2 = gen_rtx_REG (mode, regno2);
+ 	  rtx mem2;
+@@ -8801,14 +8871,16 @@ aarch64_save_callee_saves (poly_int64 st
+ }
+ 
+ /* Emit code to restore the callee registers from register number START
+-   up to and including LIMIT.  Restore from the stack offset START_OFFSET,
+-   skipping any write-back candidates if SKIP_WB is true.  Write the
+-   appropriate REG_CFA_RESTORE notes into CFI_OPS.  */
++   up to and including LIMIT.  The stack pointer is currently BYTES_BELOW_SP
++   bytes above the bottom of the static frame.  Skip any write-back
++   candidates if SKIP_WB is true.  Write the appropriate REG_CFA_RESTORE
++   notes into CFI_OPS.  */
+ 
+ static void
+-aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
++aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
+ 			      unsigned limit, bool skip_wb, rtx *cfi_ops)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   unsigned regno;
+   unsigned regno2;
+   poly_int64 offset;
+@@ -8825,13 +8897,13 @@ aarch64_restore_callee_saves (poly_int64
+       rtx reg, mem;
+ 
+       if (skip_wb
+-	  && (regno == cfun->machine->frame.wb_pop_candidate1
+-	      || regno == cfun->machine->frame.wb_pop_candidate2))
++	  && (regno == frame.wb_pop_candidate1
++	      || regno == frame.wb_pop_candidate2))
+ 	continue;
+ 
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       reg = gen_rtx_REG (mode, regno);
+-      offset = start_offset + cfun->machine->frame.reg_offset[regno];
++      offset = frame.reg_offset[regno] - bytes_below_sp;
+       rtx base_rtx = stack_pointer_rtx;
+       if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ 	aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+@@ -8842,8 +8914,7 @@ aarch64_restore_callee_saves (poly_int64
+ 	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ 	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+ 	  && known_eq (GET_MODE_SIZE (mode),
+-		       cfun->machine->frame.reg_offset[regno2]
+-		       - cfun->machine->frame.reg_offset[regno]))
++		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ 	{
+ 	  rtx reg2 = gen_rtx_REG (mode, regno2);
+ 	  rtx mem2;
+@@ -8948,6 +9019,7 @@ offset_12bit_unsigned_scaled_p (machine_
+ static sbitmap
+ aarch64_get_separate_components (void)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
+   bitmap_clear (components);
+ 
+@@ -8964,20 +9036,11 @@ aarch64_get_separate_components (void)
+ 	if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ 	  continue;
+ 
+-	poly_int64 offset = cfun->machine->frame.reg_offset[regno];
+-
+-	/* If the register is saved in the first SVE save slot, we use
+-	   it as a stack probe for -fstack-clash-protection.  */
+-	if (flag_stack_clash_protection
+-	    && maybe_ne (cfun->machine->frame.below_hard_fp_saved_regs_size, 0)
+-	    && known_eq (offset, 0))
+-	  continue;
++	poly_int64 offset = frame.reg_offset[regno];
+ 
+ 	/* Get the offset relative to the register we'll use.  */
+ 	if (frame_pointer_needed)
+-	  offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-	else
+-	  offset += crtl->outgoing_args_size;
++	  offset -= frame.bytes_below_hard_fp;
+ 
+ 	/* Check that we can access the stack slot of the register with one
+ 	   direct load with no adjustments needed.  */
+@@ -8994,11 +9057,11 @@ aarch64_get_separate_components (void)
+   /* If the spare predicate register used by big-endian SVE code
+      is call-preserved, it must be saved in the main prologue
+      before any saves that use it.  */
+-  if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
+-    bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
++  if (frame.spare_pred_reg != INVALID_REGNUM)
++    bitmap_clear_bit (components, frame.spare_pred_reg);
+ 
+-  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
++  unsigned reg1 = frame.wb_push_candidate1;
++  unsigned reg2 = frame.wb_push_candidate2;
+   /* If registers have been chosen to be stored/restored with
+      writeback don't interfere with them to avoid having to output explicit
+      stack adjustment instructions.  */
+@@ -9009,6 +9072,13 @@ aarch64_get_separate_components (void)
+ 
+   bitmap_clear_bit (components, LR_REGNUM);
+   bitmap_clear_bit (components, SP_REGNUM);
++  if (flag_stack_clash_protection)
++    {
++      if (frame.sve_save_and_probe != INVALID_REGNUM)
++	bitmap_clear_bit (components, frame.sve_save_and_probe);
++      if (frame.hard_fp_save_and_probe != INVALID_REGNUM)
++	bitmap_clear_bit (components, frame.hard_fp_save_and_probe);
++    }
+ 
+   return components;
+ }
+@@ -9107,6 +9177,7 @@ aarch64_get_next_set_bit (sbitmap bmp, u
+ static void
+ aarch64_process_components (sbitmap components, bool prologue_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+ 			     ? HARD_FRAME_POINTER_REGNUM
+ 			     : STACK_POINTER_REGNUM);
+@@ -9121,11 +9192,9 @@ aarch64_process_components (sbitmap comp
+       machine_mode mode = aarch64_reg_save_mode (regno);
+       
+       rtx reg = gen_rtx_REG (mode, regno);
+-      poly_int64 offset = cfun->machine->frame.reg_offset[regno];
++      poly_int64 offset = frame.reg_offset[regno];
+       if (frame_pointer_needed)
+-	offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-      else
+-	offset += crtl->outgoing_args_size;
++	offset -= frame.bytes_below_hard_fp;
+ 
+       rtx addr = plus_constant (Pmode, ptr_reg, offset);
+       rtx mem = gen_frame_mem (mode, addr);
+@@ -9148,14 +9217,14 @@ aarch64_process_components (sbitmap comp
+ 	  break;
+ 	}
+ 
+-      poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
++      poly_int64 offset2 = frame.reg_offset[regno2];
+       /* The next register is not of the same class or its offset is not
+ 	 mergeable with the current one into a pair.  */
+       if (aarch64_sve_mode_p (mode)
+ 	  || !satisfies_constraint_Ump (mem)
+ 	  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+ 	  || (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
+-	  || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
++	  || maybe_ne ((offset2 - frame.reg_offset[regno]),
+ 		       GET_MODE_SIZE (mode)))
+ 	{
+ 	  insn = emit_insn (set);
+@@ -9177,9 +9246,7 @@ aarch64_process_components (sbitmap comp
+       /* REGNO2 can be saved/restored in a pair with REGNO.  */
+       rtx reg2 = gen_rtx_REG (mode, regno2);
+       if (frame_pointer_needed)
+-	offset2 -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+-      else
+-	offset2 += crtl->outgoing_args_size;
++	offset2 -= frame.bytes_below_hard_fp;
+       rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
+       rtx mem2 = gen_frame_mem (mode, addr2);
+       rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
+@@ -9253,10 +9320,10 @@ aarch64_stack_clash_protection_alloca_pr
+    registers.  If POLY_SIZE is not large enough to require a probe this function
+    will only adjust the stack.  When allocating the stack space
+    FRAME_RELATED_P is then used to indicate if the allocation is frame related.
+-   FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
+-   arguments.  If we are then we ensure that any allocation larger than the ABI
+-   defined buffer needs a probe so that the invariant of having a 1KB buffer is
+-   maintained.
++   FINAL_ADJUSTMENT_P indicates whether we are allocating the area below
++   the saved registers.  If we are then we ensure that any allocation
++   larger than the ABI defined buffer needs a probe so that the
++   invariant of having a 1KB buffer is maintained.
+ 
+    We emit barriers after each stack adjustment to prevent optimizations from
+    breaking the invariant that we never drop the stack more than a page.  This
+@@ -9272,45 +9339,26 @@ aarch64_allocate_and_probe_stack_space (
+ 					bool frame_related_p,
+ 					bool final_adjustment_p)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
+   HOST_WIDE_INT guard_size
+     = 1 << param_stack_clash_protection_guard_size;
+   HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
++  HOST_WIDE_INT byte_sp_alignment = STACK_BOUNDARY / BITS_PER_UNIT;
++  gcc_assert (multiple_p (poly_size, byte_sp_alignment));
+   HOST_WIDE_INT min_probe_threshold
+     = (final_adjustment_p
+-       ? guard_used_by_caller
++       ? guard_used_by_caller + byte_sp_alignment
+        : guard_size - guard_used_by_caller);
+-  /* When doing the final adjustment for the outgoing arguments, take into
+-     account any unprobed space there is above the current SP.  There are
+-     two cases:
+-
+-     - When saving SVE registers below the hard frame pointer, we force
+-       the lowest save to take place in the prologue before doing the final
+-       adjustment (i.e. we don't allow the save to be shrink-wrapped).
+-       This acts as a probe at SP, so there is no unprobed space.
+-
+-     - When there are no SVE register saves, we use the store of the link
+-       register as a probe.  We can't assume that LR was saved at position 0
+-       though, so treat any space below it as unprobed.  */
+-  if (final_adjustment_p
+-      && known_eq (cfun->machine->frame.below_hard_fp_saved_regs_size, 0))
+-    {
+-      poly_int64 lr_offset = cfun->machine->frame.reg_offset[LR_REGNUM];
+-      if (known_ge (lr_offset, 0))
+-	min_probe_threshold -= lr_offset.to_constant ();
+-      else
+-	gcc_assert (!flag_stack_clash_protection || known_eq (poly_size, 0));
+-    }
+-
+-  poly_int64 frame_size = cfun->machine->frame.frame_size;
++  poly_int64 frame_size = frame.frame_size;
+ 
+   /* We should always have a positive probe threshold.  */
+   gcc_assert (min_probe_threshold > 0);
+ 
+   if (flag_stack_clash_protection && !final_adjustment_p)
+     {
+-      poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-      poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-      poly_int64 final_adjust = cfun->machine->frame.final_adjust;
++      poly_int64 initial_adjust = frame.initial_adjust;
++      poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++      poly_int64 final_adjust = frame.final_adjust;
+ 
+       if (known_eq (frame_size, 0))
+ 	{
+@@ -9464,7 +9512,7 @@ aarch64_allocate_and_probe_stack_space (
+   /* Handle any residuals.  Residuals of at least MIN_PROBE_THRESHOLD have to
+      be probed.  This maintains the requirement that each page is probed at
+      least once.  For initial probing we probe only if the allocation is
+-     more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
++     more than GUARD_SIZE - buffer, and below the saved registers we probe
+      if the amount is larger than buffer.  GUARD_SIZE - buffer + buffer ==
+      GUARD_SIZE.  This works that for any allocation that is large enough to
+      trigger a probe here, we'll have at least one, and if they're not large
+@@ -9474,16 +9522,12 @@ aarch64_allocate_and_probe_stack_space (
+      are still safe.  */
+   if (residual)
+     {
+-      HOST_WIDE_INT residual_probe_offset = guard_used_by_caller;
++      gcc_assert (guard_used_by_caller + byte_sp_alignment <= size);
++
+       /* If we're doing final adjustments, and we've done any full page
+ 	 allocations then any residual needs to be probed.  */
+       if (final_adjustment_p && rounded_size != 0)
+ 	min_probe_threshold = 0;
+-      /* If doing a small final adjustment, we always probe at offset 0.
+-	 This is done to avoid issues when LR is not at position 0 or when
+-	 the final adjustment is smaller than the probing offset.  */
+-      else if (final_adjustment_p && rounded_size == 0)
+-	residual_probe_offset = 0;
+ 
+       aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
+       if (residual >= min_probe_threshold)
+@@ -9494,8 +9538,8 @@ aarch64_allocate_and_probe_stack_space (
+ 		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
+ 		     "\n", residual);
+ 
+-	    emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+-					     residual_probe_offset));
++	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
++					   guard_used_by_caller));
+ 	  emit_insn (gen_blockage ());
+ 	}
+     }
+@@ -9533,20 +9577,24 @@ aarch64_epilogue_uses (int regno)
+ 	|  for register varargs         |
+ 	|                               |
+ 	+-------------------------------+
+-	|  local variables              | <-- frame_pointer_rtx
++	|  local variables (1)          | <-- frame_pointer_rtx
+ 	|                               |
+ 	+-------------------------------+
+-	|  padding                      | \
+-	+-------------------------------+  |
+-	|  callee-saved registers       |  | frame.saved_regs_size
+-	+-------------------------------+  |
+-	|  LR'                          |  |
+-	+-------------------------------+  |
+-	|  FP'                          |  |
+-	+-------------------------------+  |<- hard_frame_pointer_rtx (aligned)
+-	|  SVE vector registers         |  | \
+-	+-------------------------------+  |  | below_hard_fp_saved_regs_size
+-	|  SVE predicate registers      | /  /
++	|  padding (1)                  |
++	+-------------------------------+
++	|  callee-saved registers       |
++	+-------------------------------+
++	|  LR'                          |
++	+-------------------------------+
++	|  FP'                          |
++	+-------------------------------+ <-- hard_frame_pointer_rtx (aligned)
++	|  SVE vector registers         |
++	+-------------------------------+
++	|  SVE predicate registers      |
++	+-------------------------------+
++	|  local variables (2)          |
++	+-------------------------------+
++	|  padding (2)                  |
+ 	+-------------------------------+
+ 	|  dynamic allocation           |
+ 	+-------------------------------+
+@@ -9557,6 +9605,9 @@ aarch64_epilogue_uses (int regno)
+ 	+-------------------------------+
+ 	|                               | <-- stack_pointer_rtx (aligned)
+ 
++   The regions marked (1) and (2) are mutually exclusive.  (2) is used
++   when aarch64_save_regs_above_locals_p is true.
++
+    Dynamic stack allocations via alloca() decrease stack_pointer_rtx
+    but leave frame_pointer_rtx and hard_frame_pointer_rtx
+    unchanged.
+@@ -9571,8 +9622,8 @@ aarch64_epilogue_uses (int regno)
+    When probing is needed, we emit a probe at the start of the prologue
+    and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
+ 
+-   We have to track how much space has been allocated and the only stores
+-   to the stack we track as implicit probes are the FP/LR stores.
++   We can also use register saves as probes.  These are stored in
++   sve_save_and_probe and hard_fp_save_and_probe.
+ 
+    For outgoing arguments we probe if the size is larger than 1KB, such that
+    the ABI specified buffer is maintained for the next callee.
+@@ -9599,17 +9650,15 @@ aarch64_epilogue_uses (int regno)
+ void
+ aarch64_expand_prologue (void)
+ {
+-  poly_int64 frame_size = cfun->machine->frame.frame_size;
+-  poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-  HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+-  poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+-  poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+-  poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-  poly_int64 below_hard_fp_saved_regs_size
+-    = cfun->machine->frame.below_hard_fp_saved_regs_size;
+-  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
+-  bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
++  aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 frame_size = frame.frame_size;
++  poly_int64 initial_adjust = frame.initial_adjust;
++  HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++  poly_int64 final_adjust = frame.final_adjust;
++  poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++  unsigned reg1 = frame.wb_push_candidate1;
++  unsigned reg2 = frame.wb_push_candidate2;
++  bool emit_frame_chain = frame.emit_frame_chain;
+   rtx_insn *insn;
+ 
+   if (flag_stack_clash_protection && known_eq (callee_adjust, 0))
+@@ -9640,7 +9689,7 @@ aarch64_expand_prologue (void)
+     }
+ 
+   /* Push return address to shadow call stack.  */
+-  if (cfun->machine->frame.is_scs_enabled)
++  if (frame.is_scs_enabled)
+     emit_insn (gen_scs_push ());
+ 
+   if (flag_stack_usage_info)
+@@ -9677,21 +9726,21 @@ aarch64_expand_prologue (void)
+   if (callee_adjust != 0)
+     aarch64_push_regs (reg1, reg2, callee_adjust);
+ 
+-  /* The offset of the frame chain record (if any) from the current SP.  */
+-  poly_int64 chain_offset = (initial_adjust + callee_adjust
+-			     - cfun->machine->frame.hard_fp_offset);
+-  gcc_assert (known_ge (chain_offset, 0));
+-
+-  /* The offset of the bottom of the save area from the current SP.  */
+-  poly_int64 saved_regs_offset = chain_offset - below_hard_fp_saved_regs_size;
++  /* The offset of the current SP from the bottom of the static frame.  */
++  poly_int64 bytes_below_sp = frame_size - initial_adjust - callee_adjust;
+ 
+   if (emit_frame_chain)
+     {
++      /* The offset of the frame chain record (if any) from the current SP.  */
++      poly_int64 chain_offset = (initial_adjust + callee_adjust
++				 - frame.bytes_above_hard_fp);
++      gcc_assert (known_ge (chain_offset, 0));
++
+       if (callee_adjust == 0)
+ 	{
+ 	  reg1 = R29_REGNUM;
+ 	  reg2 = R30_REGNUM;
+-	  aarch64_save_callee_saves (saved_regs_offset, reg1, reg2,
++	  aarch64_save_callee_saves (bytes_below_sp, reg1, reg2,
+ 				     false, false);
+ 	}
+       else
+@@ -9716,8 +9765,7 @@ aarch64_expand_prologue (void)
+ 	     implicit.  */
+ 	  if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+ 	    {
+-	      rtx src = plus_constant (Pmode, stack_pointer_rtx,
+-				       callee_offset);
++	      rtx src = plus_constant (Pmode, stack_pointer_rtx, chain_offset);
+ 	      add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ 			    gen_rtx_SET (hard_frame_pointer_rtx, src));
+ 	    }
+@@ -9732,7 +9780,7 @@ aarch64_expand_prologue (void)
+       emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+     }
+ 
+-  aarch64_save_callee_saves (saved_regs_offset, R0_REGNUM, R30_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, R0_REGNUM, R30_REGNUM,
+ 			     callee_adjust != 0 || emit_frame_chain,
+ 			     emit_frame_chain);
+   if (maybe_ne (sve_callee_adjust, 0))
+@@ -9742,18 +9790,21 @@ aarch64_expand_prologue (void)
+       aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx,
+ 					      sve_callee_adjust,
+ 					      !frame_pointer_needed, false);
+-      saved_regs_offset += sve_callee_adjust;
++      bytes_below_sp -= sve_callee_adjust;
+     }
+-  aarch64_save_callee_saves (saved_regs_offset, P0_REGNUM, P15_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, P0_REGNUM, P15_REGNUM,
+ 			     false, emit_frame_chain);
+-  aarch64_save_callee_saves (saved_regs_offset, V0_REGNUM, V31_REGNUM,
++  aarch64_save_callee_saves (bytes_below_sp, V0_REGNUM, V31_REGNUM,
+ 			     callee_adjust != 0 || emit_frame_chain,
+ 			     emit_frame_chain);
+ 
+   /* We may need to probe the final adjustment if it is larger than the guard
+      that is assumed by the called.  */
++  gcc_assert (known_eq (bytes_below_sp, final_adjust));
+   aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust,
+ 					  !frame_pointer_needed, true);
++  if (emit_frame_chain && maybe_ne (final_adjust, 0))
++    emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+ }
+ 
+ /* Return TRUE if we can use a simple_return insn.
+@@ -9782,16 +9833,15 @@ aarch64_use_return_insn_p (void)
+ void
+ aarch64_expand_epilogue (bool for_sibcall)
+ {
+-  poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+-  HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+-  poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+-  poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+-  poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+-  poly_int64 below_hard_fp_saved_regs_size
+-    = cfun->machine->frame.below_hard_fp_saved_regs_size;
+-  unsigned reg1 = cfun->machine->frame.wb_pop_candidate1;
+-  unsigned reg2 = cfun->machine->frame.wb_pop_candidate2;
+-  unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
++  aarch64_frame &frame = cfun->machine->frame;
++  poly_int64 initial_adjust = frame.initial_adjust;
++  HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++  poly_int64 final_adjust = frame.final_adjust;
++  poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++  poly_int64 bytes_below_hard_fp = frame.bytes_below_hard_fp;
++  unsigned reg1 = frame.wb_pop_candidate1;
++  unsigned reg2 = frame.wb_pop_candidate2;
++  unsigned int last_gpr = (frame.is_scs_enabled
+ 			   ? R29_REGNUM : R30_REGNUM);
+   rtx cfi_ops = NULL;
+   rtx_insn *insn;
+@@ -9825,7 +9875,7 @@ aarch64_expand_epilogue (bool for_sibcal
+   /* We need to add memory barrier to prevent read from deallocated stack.  */
+   bool need_barrier_p
+     = maybe_ne (get_frame_size ()
+-		+ cfun->machine->frame.saved_varargs_size, 0);
++		+ frame.saved_varargs_size, 0);
+ 
+   /* Emit a barrier to prevent loads from a deallocated stack.  */
+   if (maybe_gt (final_adjust, crtl->outgoing_args_size)
+@@ -9846,7 +9896,7 @@ aarch64_expand_epilogue (bool for_sibcal
+        is restored on the instruction doing the writeback.  */
+     aarch64_add_offset (Pmode, stack_pointer_rtx,
+ 			hard_frame_pointer_rtx,
+-			-callee_offset - below_hard_fp_saved_regs_size,
++			-bytes_below_hard_fp + final_adjust,
+ 			tmp1_rtx, tmp0_rtx, callee_adjust == 0);
+   else
+      /* The case where we need to re-use the register here is very rare, so
+@@ -9856,9 +9906,9 @@ aarch64_expand_epilogue (bool for_sibcal
+ 
+   /* Restore the vector registers before the predicate registers,
+      so that we can use P4 as a temporary for big-endian SVE frames.  */
+-  aarch64_restore_callee_saves (callee_offset, V0_REGNUM, V31_REGNUM,
++  aarch64_restore_callee_saves (final_adjust, V0_REGNUM, V31_REGNUM,
+ 				callee_adjust != 0, &cfi_ops);
+-  aarch64_restore_callee_saves (callee_offset, P0_REGNUM, P15_REGNUM,
++  aarch64_restore_callee_saves (final_adjust, P0_REGNUM, P15_REGNUM,
+ 				false, &cfi_ops);
+   if (maybe_ne (sve_callee_adjust, 0))
+     aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
+@@ -9866,7 +9916,7 @@ aarch64_expand_epilogue (bool for_sibcal
+   /* When shadow call stack is enabled, the scs_pop in the epilogue will
+      restore x30, we don't need to restore x30 again in the traditional
+      way.  */
+-  aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
++  aarch64_restore_callee_saves (final_adjust + sve_callee_adjust,
+ 				R0_REGNUM, last_gpr,
+ 				callee_adjust != 0, &cfi_ops);
+ 
+@@ -9906,7 +9956,7 @@ aarch64_expand_epilogue (bool for_sibcal
+     }
+ 
+   /* Pop return address from shadow call stack.  */
+-  if (cfun->machine->frame.is_scs_enabled)
++  if (frame.is_scs_enabled)
+     {
+       machine_mode mode = aarch64_reg_save_mode (R30_REGNUM);
+       rtx reg = gen_rtx_REG (mode, R30_REGNUM);
+@@ -12501,24 +12551,24 @@ aarch64_can_eliminate (const int from AT
+ poly_int64
+ aarch64_initial_elimination_offset (unsigned from, unsigned to)
+ {
++  aarch64_frame &frame = cfun->machine->frame;
++
+   if (to == HARD_FRAME_POINTER_REGNUM)
+     {
+       if (from == ARG_POINTER_REGNUM)
+-	return cfun->machine->frame.hard_fp_offset;
++	return frame.bytes_above_hard_fp;
+ 
+       if (from == FRAME_POINTER_REGNUM)
+-	return cfun->machine->frame.hard_fp_offset
+-	       - cfun->machine->frame.locals_offset;
++	return frame.bytes_above_hard_fp - frame.bytes_above_locals;
+     }
+ 
+   if (to == STACK_POINTER_REGNUM)
+     {
+       if (from == FRAME_POINTER_REGNUM)
+-	  return cfun->machine->frame.frame_size
+-		 - cfun->machine->frame.locals_offset;
++	return frame.frame_size - frame.bytes_above_locals;
+     }
+ 
+-  return cfun->machine->frame.frame_size;
++  return frame.frame_size;
+ }
+ 
+ 
+@@ -25795,11 +25845,9 @@ aarch64_operands_ok_for_ldpstp (rtx *ope
+   gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+ 			GET_MODE_SIZE (GET_MODE (mem_2))));
+ 
+-  /* One of the memory accesses must be a mempair operand.
+-     If it is not the first one, they need to be swapped by the
+-     peephole.  */
+-  if (!aarch64_mem_pair_operand (mem_1, GET_MODE (mem_1))
+-       && !aarch64_mem_pair_operand (mem_2, GET_MODE (mem_2)))
++  /* The lower memory access must be a mem-pair operand.  */
++  rtx lower_mem = reversed ? mem_2 : mem_1;
++  if (!aarch64_mem_pair_operand (lower_mem, GET_MODE (lower_mem)))
+     return false;
+ 
+   if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.h
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+@@ -862,6 +862,9 @@ extern enum aarch64_processor aarch64_tu
+ #ifdef HAVE_POLY_INT_H
+ struct GTY (()) aarch64_frame
+ {
++  /* The offset from the bottom of the static frame (the bottom of the
++     outgoing arguments) of each register save slot, or -2 if no save is
++     needed.  */
+   poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
+ 
+   /* The number of extra stack bytes taken up by register varargs.
+@@ -870,25 +873,28 @@ struct GTY (()) aarch64_frame
+      STACK_BOUNDARY.  */
+   HOST_WIDE_INT saved_varargs_size;
+ 
+-  /* The size of the callee-save registers with a slot in REG_OFFSET.  */
+-  poly_int64 saved_regs_size;
++  /* The number of bytes between the bottom of the static frame (the bottom
++     of the outgoing arguments) and the bottom of the register save area.
++     This value is always a multiple of STACK_BOUNDARY.  */
++  poly_int64 bytes_below_saved_regs;
++
++  /* The number of bytes between the bottom of the static frame (the bottom
++     of the outgoing arguments) and the hard frame pointer.  This value is
++     always a multiple of STACK_BOUNDARY.  */
++  poly_int64 bytes_below_hard_fp;
+ 
+-  /* The size of the callee-save registers with a slot in REG_OFFSET that
+-     are saved below the hard frame pointer.  */
+-  poly_int64 below_hard_fp_saved_regs_size;
+-
+-  /* Offset from the base of the frame (incomming SP) to the
+-     top of the locals area.  This value is always a multiple of
++  /* The number of bytes between the top of the locals area and the top
++     of the frame (the incomming SP).  This value is always a multiple of
+      STACK_BOUNDARY.  */
+-  poly_int64 locals_offset;
++  poly_int64 bytes_above_locals;
+ 
+-  /* Offset from the base of the frame (incomming SP) to the
+-     hard_frame_pointer.  This value is always a multiple of
++  /* The number of bytes between the hard_frame_pointer and the top of
++     the frame (the incomming SP).  This value is always a multiple of
+      STACK_BOUNDARY.  */
+-  poly_int64 hard_fp_offset;
++  poly_int64 bytes_above_hard_fp;
+ 
+-  /* The size of the frame.  This value is the offset from base of the
+-     frame (incomming SP) to the stack_pointer.  This value is always
++  /* The size of the frame, i.e. the number of bytes between the bottom
++     of the outgoing arguments and the incoming SP.  This value is always
+      a multiple of STACK_BOUNDARY.  */
+   poly_int64 frame_size;
+ 
+@@ -899,10 +905,6 @@ struct GTY (()) aarch64_frame
+      It is zero when no push is used.  */
+   HOST_WIDE_INT callee_adjust;
+ 
+-  /* The offset from SP to the callee-save registers after initial_adjust.
+-     It may be non-zero if no push is used (ie. callee_adjust == 0).  */
+-  poly_int64 callee_offset;
+-
+   /* The size of the stack adjustment before saving or after restoring
+      SVE registers.  */
+   poly_int64 sve_callee_adjust;
+@@ -950,6 +952,14 @@ struct GTY (()) aarch64_frame
+      This is the register they should use.  */
+   unsigned spare_pred_reg;
+ 
++  /* An SVE register that is saved below the hard frame pointer and that acts
++     as a probe for later allocations, or INVALID_REGNUM if none.  */
++  unsigned sve_save_and_probe;
++
++  /* A register that is saved at the hard frame pointer and that acts
++     as a probe for later allocations, or INVALID_REGNUM if none.  */
++  unsigned hard_fp_save_and_probe;
++
+   bool laid_out;
+ 
+   /* True if shadow call stack should be enabled for the current function.  */
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+@@ -0,0 +1,55 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #4064
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++**	str	x26, \[sp, #?4128\]
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test3:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test3(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #4064
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++**	str	x26, \[sp, #?4128\]
++**	...
++*/
++int test1(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test2:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1040
++**	str	xzr, \[sp, #?1024\]
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test2(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x);
++    }
++  g();
++  return 1;
++}
++
++/*
++** test3:
++**	...
++**	str	x30, \[sp\]
++**	sub	sp, sp, #1024
++**	cbnz	w0, .*
++**	bl	g
++**	...
++*/
++int test3(int z) {
++  __uint128_t x = 0;
++  int y[0x400];
++  if (z)
++    {
++      asm volatile ("" :::
++		    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++      f(0, 0, 0, 0, 0, 0, 0, &y,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++	x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++    }
++  g();
++  return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+@@ -0,0 +1,3 @@
++/* { dg-options "-O2 -fstack-protector-all -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++
++#include "stack-check-prologue-19.c"
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+@@ -0,0 +1,95 @@
++/* { dg-options " -O -fstack-protector-strong -mstack-protector-guard=sysreg -mstack-protector-guard-reg=tpidr2_el0 -mstack-protector-guard-offset=16" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void g(void *);
++__SVBool_t *h(void *);
++
++/*
++** test1:
++**	sub	sp, sp, #288
++**	stp	x29, x30, \[sp, #?272\]
++**	add	x29, sp, #?272
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?264\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	g
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	ldp	x29, x30, \[sp, #?272\]
++**	add	sp, sp, #?288
++**	ret
++**	bl	__stack_chk_fail
++*/
++int test1() {
++  int y[0x40];
++  g(y);
++  return 1;
++}
++
++/*
++** test2:
++**	stp	x29, x30, \[sp, #?-16\]!
++**	mov	x29, sp
++**	sub	sp, sp, #1040
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?1032\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	g
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	add	sp, sp, #?1040
++**	ldp	x29, x30, \[sp\], #?16
++**	ret
++**	bl	__stack_chk_fail
++*/
++int test2() {
++  int y[0x100];
++  g(y);
++  return 1;
++}
++
++#pragma GCC target "+sve"
++
++/*
++** test3:
++**	stp	x29, x30, \[sp, #?-16\]!
++**	mov	x29, sp
++**	addvl	sp, sp, #-18
++**	...
++**	str	p4, \[sp\]
++**	...
++**	sub	sp, sp, #272
++**	mrs	(x[0-9]+), tpidr2_el0
++**	ldr	(x[0-9]+), \[\1, #?16\]
++**	str	\2, \[sp, #?264\]
++**	mov	\2, #?0
++**	add	x0, sp, #?8
++**	bl	h
++**	...
++**	mrs	.*
++**	...
++**	bne	.*
++**	...
++**	add	sp, sp, #?272
++**	...
++**	ldr	p4, \[sp\]
++**	...
++**	addvl	sp, sp, #18
++**	ldp	x29, x30, \[sp\], #?16
++**	ret
++**	bl	__stack_chk_fail
++*/
++__SVBool_t test3() {
++  int y[0x40];
++  return *h(y);
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+@@ -0,0 +1,33 @@
++/* { dg-options "-O2 -mcpu=neoverse-v1 -fstack-protector-all" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++/*
++** main:
++**	...
++**	stp	x29, x30, \[sp, #?-[0-9]+\]!
++**	...
++**	sub	sp, sp, #[0-9]+
++**	...
++**	str	x[0-9]+, \[x29, #?-8\]
++**	...
++*/
++int f(const char *);
++void g(void *);
++int main(int argc, char* argv[])
++{
++  int a;
++  int b;
++  char c[2+f(argv[1])];
++  int d[0x100];
++  char y;
++
++  y=42; a=4; b=10;
++  c[0] = 'h'; c[1] = '\0';
++
++  c[f(argv[2])] = '\0';
++
++  __builtin_printf("%d %d\n%s\n", a, b, c);
++  g(d);
++
++  return 0;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+@@ -11,11 +11,10 @@
+ **	mov	x11, sp
+ **	...
+ **	sub	sp, sp, x13
+-**	str	p4, \[sp\]
+ **	cbz	w0, [^\n]*
++**	str	p4, \[sp\]
+ **	...
+ **	ptrue	p0\.b, all
+-**	ldr	p4, \[sp\]
+ **	addvl	sp, sp, #1
+ **	ldr	x24, \[sp\], 32
+ **	ret
+@@ -39,13 +38,12 @@ test_1 (int n)
+ **	mov	x11, sp
+ **	...
+ **	sub	sp, sp, x13
+-**	str	p4, \[sp\]
+ **	cbz	w0, [^\n]*
++**	str	p4, \[sp\]
+ **	str	p5, \[sp, #1, mul vl\]
+ **	str	p6, \[sp, #2, mul vl\]
+ **	...
+ **	ptrue	p0\.b, all
+-**	ldr	p4, \[sp\]
+ **	addvl	sp, sp, #1
+ **	ldr	x24, \[sp\], 32
+ **	ret
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+@@ -0,0 +1,57 @@
++/* { dg-do compile { target aarch64*-*-* } } */
++/* { dg-require-effective-target lp64 } */
++/* { dg-options "-O -fdisable-rtl-postreload -fpeephole2 -fno-schedule-fusion" } */
++
++extern int data[];
++
++void __RTL (startwith ("ira")) foo (void *ptr)
++{
++  (function "foo"
++    (param "ptr"
++      (DECL_RTL (reg/v:DI <0> [ ptr ]))
++      (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++    ) ;; param "ptr"
++    (insn-chain
++      (block 2
++	(edge-from entry (flags "FALLTHRU"))
++	(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++	(insn 4 (set (reg:DI <0>) (reg:DI x0)))
++	(insn 5 (set (reg:DI <1>)
++		     (plus:DI (reg:DI <0>) (const_int 768))))
++	(insn 6 (set (mem:SI (plus:DI (reg:DI <0>)
++				      (const_int 508)) [1 &data+508 S4 A4])
++		     (const_int 0)))
++	(insn 7 (set (mem:SI (plus:DI (reg:DI <1>)
++				      (const_int -256)) [1 &data+512 S4 A4])
++		     (const_int 0)))
++	(edge-to exit (flags "FALLTHRU"))
++      ) ;; block 2
++    ) ;; insn-chain
++  ) ;; function
++}
++
++void __RTL (startwith ("ira")) bar (void *ptr)
++{
++  (function "bar"
++    (param "ptr"
++      (DECL_RTL (reg/v:DI <0> [ ptr ]))
++      (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++    ) ;; param "ptr"
++    (insn-chain
++      (block 2
++	(edge-from entry (flags "FALLTHRU"))
++	(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++	(insn 4 (set (reg:DI <0>) (reg:DI x0)))
++	(insn 5 (set (reg:DI <1>)
++		     (plus:DI (reg:DI <0>) (const_int 768))))
++	(insn 6 (set (mem:SI (plus:DI (reg:DI <1>)
++				      (const_int -256)) [1 &data+512 S4 A4])
++		     (const_int 0)))
++	(insn 7 (set (mem:SI (plus:DI (reg:DI <0>)
++				      (const_int 508)) [1 &data+508 S4 A4])
++		     (const_int 0)))
++	(edge-to exit (flags "FALLTHRU"))
++      ) ;; block 2
++    ) ;; insn-chain
++  ) ;; function
++}
diff -Nru gcc-12-12.2.0/debian/rules.patch gcc-12-12.2.0/debian/rules.patch
--- gcc-12-12.2.0/debian/rules.patch	2022-12-26 16:53:20.000000000 +0100
+++ gcc-12-12.2.0/debian/rules.patch	2023-09-19 21:36:59.000000000 +0200
@@ -245,6 +245,8 @@
 debian_patches += libgomp-kfreebsd-testsuite
 debian_patches += go-testsuite
 
+debian_patches += CVE-2023-4039
+
 # don't remove, this is regularly overwritten, see PR sanitizer/63958.
 #debian_patches += libasan-sparc
 

--- End Message ---
--- Begin Message ---
Version: 12.11
This update has been released as part of 12.10. Thank you for your contribution.

--- End Message ---

Reply to: