Bug#1055211: bookworm-pu: package gcc-12/12.2.0-14+deb12u1 (CVE-2023-4039)
Package: release.debian.org
Severity: normal
Tags: bookworm
User: release.debian.org@packages.debian.org
Usertags: pu
X-Debbugs-Cc: debian-gcc@lists.debian.org, jmm@debian.org, doko@debian.org
Control: affects -1 + src:gcc-12
[ Reason ]
A failure in the -fstack-protector feature in GCC-based toolchains that target
AArch64 allows an attacker to exploit an existing buffer overflow in
dynamically-sized local variables without this being detected.
The Security Team explicitly stated that they will not release a DSA for the
bug, see https://security-tracker.debian.org/tracker/CVE-2023-4039
This issue has been fixed in version 12.3.0-9 for sid and trixie. It would now
be good to address the problem in bookworm too.
[ Impact ]
Without this change, arm64 users of gcc-12 in bookworm are not fully protected
by -fstack-protector as described in CVE-2023-4039.
[ Tests ]
In terms of manual testing, I have verified that the stack smashing attack
published at
https://github.com/metaredteam/external-disclosures/security/advisories/GHSA-x7ch-h5rf-w2mf
is detected and stopped when the PoC is compiled with gcc 12.2.0-14+deb12u1,
while it results in a Bus error with 12.2.0-14:
$ gcc-12 --version | head -1
gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
$ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
$ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
*** stack smashing detected ***: terminated
Aborted
$ gcc-12 --version | head -1
gcc-12 (Debian 12.2.0-14) 12.2.0
$ gcc-12 -fstack-protector-all -O3 -static -Wall -Wextra -pedantic -o poc poc.c
$ echo -n 'DDDDDDDDPPPPPPPPFFFFFFFFAAAAAAAA' | ./poc 8
Bus error
As an additional smoke test I have also successfully built a few packages, as
well as a kernel.
In terms of automated testing, the upstream test suite is extensive.
Additionally, upstream added the following tests specifically for
CVE-2023-4039:
testsuite/gcc.target/aarch64/stack-check-prologue-17.c
testsuite/gcc.target/aarch64/stack-check-prologue-18.c
testsuite/gcc.target/aarch64/stack-check-prologue-19.c
testsuite/gcc.target/aarch64/stack-check-prologue-20.c
testsuite/gcc.target/aarch64/stack-protector-8.c
testsuite/gcc.target/aarch64/stack-protector-9.c
[ Risks ]
There are obviously potential risks of regressions associated with compiler
changes. However the specific changes proposed here have been part of gcc-12
upstream since September, and by now have been tested quite a bit. One
regression has been found early on and fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411
All changes are arm64-specific and don't affect any other architecture.
[ Checklist ]
[x] *all* changes are documented in the d/changelog
[x] I reviewed all changes and I approve them
[x] attach debdiff against the package in (old)stable
[x] the issue is verified as fixed in unstable
[ Changes ]
All changes merged upstream back in September 2023, see
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/releases/gcc-12
- aarch64: Avoid a use of callee_offset
12a8889de169f892d2e927584c00d20b8b7e456f
- aarch64: Explicitly handle frames with no saved registers
03d5e89e7f3be53fd7142556e8e0a2774c653dca
- aarch64: Add bytes_below_saved_regs to frame info
49c2eb7616756c323b7f6b18d8616ec945eb1263
- aarch64: Add bytes_below_hard_fp to frame info
34081079ea4de0c98331843f574b5f6f94d7b234
- aarch64: Tweak aarch64_save/restore_callee_saves
187861af7c51db9eddc6f954b589c121b210fc74
- aarch64: Only calculate chain_offset if there is a chain
2b983f9064d808daf909bde1d4a13980934a7e6e
- aarch64: Rename locals_offset to bytes_above_locals
0a0a824808d1dec51004fb5805c1a0ae2a35433f
- aarch64: Rename hard_fp_offset to bytes_above_hard_fp
3fbf0789202b30a67b12e1fb785c7130f098d665
- aarch64: Tweak frame_size comment
aac8b31379ac3bbd14fc6427dce23f56e54e8485
- aarch64: Measure reg_offset from the bottom of the frame
8d5506a8aeb8dd7e8b209a3663b07688478f76b9
- aarch64: Simplify top of frame allocation
b47766614df3b9df878262efb2ad73aaac108363
- aarch64: Minor initial adjustment tweak
08f71b4bb28fb74d20e8d2927a557e8119ce9f4d
- aarch64: Tweak stack clash boundary condition
f22315d5c19e8310e4dc880fd509678fd291fca8
- aarch64: Put LR save probe in first 16 bytes
15e18831bf98fd25af098b970ebf0c9a6200a34b
- aarch64: Simplify probe of final frame allocation
c4f0e121faa36342f1d21919e54a05ad841c4f86
- aarch64: Explicitly record probe registers in frame info
6f0ab0a9f46a17b68349ff6035aa776bf65f0575
- aarch64: Remove below_hard_fp_saved_regs_size
8254e1b9cd500e0c278465a3657543477e9d1250
- aarch64: Make stack smash canary protect saved registers
75c37e031408262263442f5b4cdb83d3777b6422
- aarch64: Fix return register handling in untyped_call
38d0605ac8bc90324170041676fc05e7e595769e
- aarch64: Fix loose ldpstp check [PR111411]
74f99f1adc696f446115f36974a3f94f66294a53
diff -Nru gcc-12-12.2.0/debian/changelog gcc-12-12.2.0/debian/changelog
--- gcc-12-12.2.0/debian/changelog 2023-01-08 10:12:42.000000000 +0100
+++ gcc-12-12.2.0/debian/changelog 2023-10-09 11:45:48.000000000 +0200
@@ -1,3 +1,9 @@
+gcc-12 (12.2.0-14+deb12u1) bookworm; urgency=medium
+
+ * Fix -fstack-protector handling of overflows on AArch64 (CVE-2023-4039).
+
+ -- Emanuele Rocca <ema@debian.org> Mon, 09 Oct 2023 11:45:48 +0200
+
gcc-12 (12.2.0-14) unstable; urgency=medium
* Update to git 20230108 from the gcc-12 branch.
diff -Nru gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff
--- gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff 1970-01-01 01:00:00.000000000 +0100
+++ gcc-12-12.2.0/debian/patches/CVE-2023-4039.diff 2023-09-21 10:08:42.000000000 +0200
@@ -0,0 +1,1665 @@
+DP: Fix CVE-2023-4039.
+DP:
+DP: The patch includes the following upstream commits:
+DP:
+DP: aarch64: Use local frame vars in shrink-wrapping code
+DP: 62fbb215cc817e9f2c1ca80282a64f4ee30806bc
+DP:
+DP: aarch64: Avoid a use of callee_offset
+DP: 12a8889de169f892d2e927584c00d20b8b7e456f
+DP:
+DP: aarch64: Explicitly handle frames with no saved registers
+DP: 03d5e89e7f3be53fd7142556e8e0a2774c653dca
+DP:
+DP: aarch64: Add bytes_below_saved_regs to frame info
+DP: 49c2eb7616756c323b7f6b18d8616ec945eb1263
+DP:
+DP: aarch64: Add bytes_below_hard_fp to frame info
+DP: 34081079ea4de0c98331843f574b5f6f94d7b234
+DP:
+DP: aarch64: Tweak aarch64_save/restore_callee_saves
+DP: 187861af7c51db9eddc6f954b589c121b210fc74
+DP:
+DP: aarch64: Only calculate chain_offset if there is a chain
+DP: 2b983f9064d808daf909bde1d4a13980934a7e6e
+DP:
+DP: aarch64: Rename locals_offset to bytes_above_locals
+DP: 0a0a824808d1dec51004fb5805c1a0ae2a35433f
+DP:
+DP: aarch64: Rename hard_fp_offset to bytes_above_hard_fp
+DP: 3fbf0789202b30a67b12e1fb785c7130f098d665
+DP:
+DP: aarch64: Tweak frame_size comment
+DP: aac8b31379ac3bbd14fc6427dce23f56e54e8485
+DP:
+DP: aarch64: Measure reg_offset from the bottom of the frame
+DP: 8d5506a8aeb8dd7e8b209a3663b07688478f76b9
+DP:
+DP: aarch64: Simplify top of frame allocation
+DP: b47766614df3b9df878262efb2ad73aaac108363
+DP:
+DP: aarch64: Minor initial adjustment tweak
+DP: 08f71b4bb28fb74d20e8d2927a557e8119ce9f4d
+DP:
+DP: aarch64: Tweak stack clash boundary condition
+DP: f22315d5c19e8310e4dc880fd509678fd291fca8
+DP:
+DP: aarch64: Put LR save probe in first 16 bytes
+DP: 15e18831bf98fd25af098b970ebf0c9a6200a34b
+DP:
+DP: aarch64: Simplify probe of final frame allocation
+DP: c4f0e121faa36342f1d21919e54a05ad841c4f86
+DP:
+DP: aarch64: Explicitly record probe registers in frame info
+DP: 6f0ab0a9f46a17b68349ff6035aa776bf65f0575
+DP:
+DP: aarch64: Remove below_hard_fp_saved_regs_size
+DP: 8254e1b9cd500e0c278465a3657543477e9d1250
+DP:
+DP: aarch64: Make stack smash canary protect saved registers
+DP: 75c37e031408262263442f5b4cdb83d3777b6422
+DP:
+DP: aarch64: Fix return register handling in untyped_call
+DP: 38d0605ac8bc90324170041676fc05e7e595769e
+DP:
+DP: aarch64: Fix loose ldpstp check [PR111411]
+DP: 74f99f1adc696f446115f36974a3f94f66294a53
+
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.cc
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.cc
+@@ -8070,18 +8070,32 @@ aarch64_needs_frame_chain (void)
+ return aarch64_use_frame_pointer;
+ }
+
++/* Return true if the current function should save registers above
++ the locals area, rather than below it. */
++
++static bool
++aarch64_save_regs_above_locals_p ()
++{
++ /* When using stack smash protection, make sure that the canary slot
++ comes between the locals and the saved registers. Otherwise,
++ it would be possible for a carefully sized smash attack to change
++ the saved registers (particularly LR and FP) without reaching the
++ canary. */
++ return crtl->stack_protect_guard;
++}
++
+ /* Mark the registers that need to be saved by the callee and calculate
+ the size of the callee-saved registers area and frame record (both FP
+ and LR may be omitted). */
+ static void
+ aarch64_layout_frame (void)
+ {
+- poly_int64 offset = 0;
+ int regno, last_fp_reg = INVALID_REGNUM;
+ machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
+ poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
+ bool frame_related_fp_reg_p = false;
+ aarch64_frame &frame = cfun->machine->frame;
++ poly_int64 top_of_locals = -1;
+
+ frame.emit_frame_chain = aarch64_needs_frame_chain ();
+
+@@ -8148,11 +8162,18 @@ aarch64_layout_frame (void)
+ && !crtl->abi->clobbers_full_reg_p (regno))
+ frame.reg_offset[regno] = SLOT_REQUIRED;
+
+- /* With stack-clash, LR must be saved in non-leaf functions. The saving of
+- LR counts as an implicit probe which allows us to maintain the invariant
+- described in the comment at expand_prologue. */
+- gcc_assert (crtl->is_leaf
+- || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
++ bool regs_at_top_p = aarch64_save_regs_above_locals_p ();
++
++ poly_int64 offset = crtl->outgoing_args_size;
++ gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++ if (regs_at_top_p)
++ {
++ offset += get_frame_size ();
++ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++ top_of_locals = offset;
++ }
++ frame.bytes_below_saved_regs = offset;
++ frame.sve_save_and_probe = INVALID_REGNUM;
+
+ /* Now assign stack slots for the registers. Start with the predicate
+ registers, since predicate LDR and STR have a relatively small
+@@ -8160,11 +8181,14 @@ aarch64_layout_frame (void)
+ for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
++ if (frame.sve_save_and_probe == INVALID_REGNUM)
++ frame.sve_save_and_probe = regno;
+ frame.reg_offset[regno] = offset;
+ offset += BYTES_PER_SVE_PRED;
+ }
+
+- if (maybe_ne (offset, 0))
++ poly_int64 saved_prs_size = offset - frame.bytes_below_saved_regs;
++ if (maybe_ne (saved_prs_size, 0))
+ {
+ /* If we have any vector registers to save above the predicate registers,
+ the offset of the vector register save slots need to be a multiple
+@@ -8182,10 +8206,10 @@ aarch64_layout_frame (void)
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ else
+ {
+- if (known_le (offset, vector_save_size))
+- offset = vector_save_size;
+- else if (known_le (offset, vector_save_size * 2))
+- offset = vector_save_size * 2;
++ if (known_le (saved_prs_size, vector_save_size))
++ offset = frame.bytes_below_saved_regs + vector_save_size;
++ else if (known_le (saved_prs_size, vector_save_size * 2))
++ offset = frame.bytes_below_saved_regs + vector_save_size * 2;
+ else
+ gcc_unreachable ();
+ }
+@@ -8196,34 +8220,53 @@ aarch64_layout_frame (void)
+ for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
++ if (frame.sve_save_and_probe == INVALID_REGNUM)
++ frame.sve_save_and_probe = regno;
+ frame.reg_offset[regno] = offset;
+ offset += vector_save_size;
+ }
+
+ /* OFFSET is now the offset of the hard frame pointer from the bottom
+ of the callee save area. */
+- bool saves_below_hard_fp_p = maybe_ne (offset, 0);
+- frame.below_hard_fp_saved_regs_size = offset;
++ auto below_hard_fp_saved_regs_size = offset - frame.bytes_below_saved_regs;
++ bool saves_below_hard_fp_p = maybe_ne (below_hard_fp_saved_regs_size, 0);
++ gcc_assert (!saves_below_hard_fp_p
++ || (frame.sve_save_and_probe != INVALID_REGNUM
++ && known_eq (frame.reg_offset[frame.sve_save_and_probe],
++ frame.bytes_below_saved_regs)));
++
++ frame.bytes_below_hard_fp = offset;
++ frame.hard_fp_save_and_probe = INVALID_REGNUM;
++
++ auto allocate_gpr_slot = [&](unsigned int regno)
++ {
++ if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++ frame.hard_fp_save_and_probe = regno;
++ frame.reg_offset[regno] = offset;
++ if (frame.wb_push_candidate1 == INVALID_REGNUM)
++ frame.wb_push_candidate1 = regno;
++ else if (frame.wb_push_candidate2 == INVALID_REGNUM)
++ frame.wb_push_candidate2 = regno;
++ offset += UNITS_PER_WORD;
++ };
++
+ if (frame.emit_frame_chain)
+ {
+ /* FP and LR are placed in the linkage record. */
+- frame.reg_offset[R29_REGNUM] = offset;
+- frame.wb_push_candidate1 = R29_REGNUM;
+- frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
+- frame.wb_push_candidate2 = R30_REGNUM;
+- offset += 2 * UNITS_PER_WORD;
++ allocate_gpr_slot (R29_REGNUM);
++ allocate_gpr_slot (R30_REGNUM);
+ }
++ else if (flag_stack_clash_protection
++ && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
++ /* Put the LR save slot first, since it makes a good choice of probe
++ for stack clash purposes. The idea is that the link register usually
++ has to be saved before a call anyway, and so we lose little by
++ stopping it from being individually shrink-wrapped. */
++ allocate_gpr_slot (R30_REGNUM);
+
+ for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+- {
+- frame.reg_offset[regno] = offset;
+- if (frame.wb_push_candidate1 == INVALID_REGNUM)
+- frame.wb_push_candidate1 = regno;
+- else if (frame.wb_push_candidate2 == INVALID_REGNUM)
+- frame.wb_push_candidate2 = regno;
+- offset += UNITS_PER_WORD;
+- }
++ allocate_gpr_slot (regno);
+
+ poly_int64 max_int_offset = offset;
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+@@ -8232,6 +8275,8 @@ aarch64_layout_frame (void)
+ for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
++ if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
++ frame.hard_fp_save_and_probe = regno;
+ /* If there is an alignment gap between integer and fp callee-saves,
+ allocate the last fp register to it if possible. */
+ if (regno == last_fp_reg
+@@ -8254,30 +8299,36 @@ aarch64_layout_frame (void)
+
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+
+- frame.saved_regs_size = offset;
+-
+- poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
+-
+- poly_int64 above_outgoing_args
+- = aligned_upper_bound (varargs_and_saved_regs_size
+- + get_frame_size (),
+- STACK_BOUNDARY / BITS_PER_UNIT);
+-
+- frame.hard_fp_offset
+- = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+-
+- /* Both these values are already aligned. */
+- gcc_assert (multiple_p (crtl->outgoing_args_size,
+- STACK_BOUNDARY / BITS_PER_UNIT));
+- frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
+-
+- frame.locals_offset = frame.saved_varargs_size;
++ auto saved_regs_size = offset - frame.bytes_below_saved_regs;
++ gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size)
++ || (frame.hard_fp_save_and_probe != INVALID_REGNUM
++ && known_eq (frame.reg_offset[frame.hard_fp_save_and_probe],
++ frame.bytes_below_hard_fp)));
++
++ /* With stack-clash, a register must be saved in non-leaf functions.
++ The saving of the bottommost register counts as an implicit probe,
++ which allows us to maintain the invariant described in the comment
++ at expand_prologue. */
++ gcc_assert (crtl->is_leaf || maybe_ne (saved_regs_size, 0));
++
++ if (!regs_at_top_p)
++ {
++ offset += get_frame_size ();
++ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
++ top_of_locals = offset;
++ }
++ offset += frame.saved_varargs_size;
++ gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
++ frame.frame_size = offset;
++
++ frame.bytes_above_hard_fp = frame.frame_size - frame.bytes_below_hard_fp;
++ gcc_assert (known_ge (top_of_locals, 0));
++ frame.bytes_above_locals = frame.frame_size - top_of_locals;
+
+ frame.initial_adjust = 0;
+ frame.final_adjust = 0;
+ frame.callee_adjust = 0;
+ frame.sve_callee_adjust = 0;
+- frame.callee_offset = 0;
+
+ frame.wb_pop_candidate1 = frame.wb_push_candidate1;
+ frame.wb_pop_candidate2 = frame.wb_push_candidate2;
+@@ -8288,7 +8339,7 @@ aarch64_layout_frame (void)
+ frame.is_scs_enabled
+ = (!crtl->calls_eh_return
+ && sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+- && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
++ && known_ge (frame.reg_offset[LR_REGNUM], 0));
+
+ /* When shadow call stack is enabled, the scs_pop in the epilogue will
+ restore x30, and we don't need to pop x30 again in the traditional
+@@ -8308,75 +8359,76 @@ aarch64_layout_frame (void)
+ max_push_offset to 0, because no registers are popped at this time,
+ so callee_adjust cannot be adjusted. */
+ HOST_WIDE_INT max_push_offset = 0;
+- if (frame.wb_pop_candidate2 != INVALID_REGNUM)
+- max_push_offset = 512;
+- else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
+- max_push_offset = 256;
++ if (frame.wb_pop_candidate1 != INVALID_REGNUM)
++ {
++ if (frame.wb_pop_candidate2 != INVALID_REGNUM)
++ max_push_offset = 512;
++ else
++ max_push_offset = 256;
++ }
+
+- HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
++ HOST_WIDE_INT const_size, const_below_saved_regs, const_above_fp;
+ HOST_WIDE_INT const_saved_regs_size;
+- if (frame.frame_size.is_constant (&const_size)
+- && const_size < max_push_offset
+- && known_eq (frame.hard_fp_offset, const_size))
++ if (known_eq (saved_regs_size, 0))
++ frame.initial_adjust = frame.frame_size;
++ else if (frame.frame_size.is_constant (&const_size)
++ && const_size < max_push_offset
++ && known_eq (frame.bytes_above_hard_fp, const_size))
+ {
+- /* Simple, small frame with no outgoing arguments:
++ /* Simple, small frame with no data below the saved registers.
+
+ stp reg1, reg2, [sp, -frame_size]!
+ stp reg3, reg4, [sp, 16] */
+ frame.callee_adjust = const_size;
+ }
+- else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
+- && frame.saved_regs_size.is_constant (&const_saved_regs_size)
+- && const_outgoing_args_size + const_saved_regs_size < 512
+- /* We could handle this case even with outgoing args, provided
+- that the number of args left us with valid offsets for all
+- predicate and vector save slots. It's such a rare case that
+- it hardly seems worth the effort though. */
+- && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
++ else if (frame.bytes_below_saved_regs.is_constant (&const_below_saved_regs)
++ && saved_regs_size.is_constant (&const_saved_regs_size)
++ && const_below_saved_regs + const_saved_regs_size < 512
++ /* We could handle this case even with data below the saved
++ registers, provided that that data left us with valid offsets
++ for all predicate and vector save slots. It's such a rare
++ case that it hardly seems worth the effort though. */
++ && (!saves_below_hard_fp_p || const_below_saved_regs == 0)
+ && !(cfun->calls_alloca
+- && frame.hard_fp_offset.is_constant (&const_fp_offset)
+- && const_fp_offset < max_push_offset))
++ && frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++ && const_above_fp < max_push_offset))
+ {
+- /* Frame with small outgoing arguments:
++ /* Frame with small area below the saved registers:
+
+ sub sp, sp, frame_size
+- stp reg1, reg2, [sp, outgoing_args_size]
+- stp reg3, reg4, [sp, outgoing_args_size + 16] */
++ stp reg1, reg2, [sp, bytes_below_saved_regs]
++ stp reg3, reg4, [sp, bytes_below_saved_regs + 16] */
+ frame.initial_adjust = frame.frame_size;
+- frame.callee_offset = const_outgoing_args_size;
+ }
+ else if (saves_below_hard_fp_p
+- && known_eq (frame.saved_regs_size,
+- frame.below_hard_fp_saved_regs_size))
++ && known_eq (saved_regs_size, below_hard_fp_saved_regs_size))
+ {
+ /* Frame in which all saves are SVE saves:
+
+- sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
++ sub sp, sp, frame_size - bytes_below_saved_regs
+ save SVE registers relative to SP
+- sub sp, sp, outgoing_args_size */
+- frame.initial_adjust = (frame.hard_fp_offset
+- + frame.below_hard_fp_saved_regs_size);
+- frame.final_adjust = crtl->outgoing_args_size;
++ sub sp, sp, bytes_below_saved_regs */
++ frame.initial_adjust = frame.frame_size - frame.bytes_below_saved_regs;
++ frame.final_adjust = frame.bytes_below_saved_regs;
+ }
+- else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
+- && const_fp_offset < max_push_offset)
++ else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
++ && const_above_fp < max_push_offset)
+ {
+- /* Frame with large outgoing arguments or SVE saves, but with
+- a small local area:
++ /* Frame with large area below the saved registers, or with SVE saves,
++ but with a small area above:
+
+ stp reg1, reg2, [sp, -hard_fp_offset]!
+ stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
+- sub sp, sp, outgoing_args_size */
+- frame.callee_adjust = const_fp_offset;
+- frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+- frame.final_adjust = crtl->outgoing_args_size;
++ sub sp, sp, bytes_below_saved_regs */
++ frame.callee_adjust = const_above_fp;
++ frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++ frame.final_adjust = frame.bytes_below_saved_regs;
+ }
+ else
+ {
+- /* Frame with large local area and outgoing arguments or SVE saves,
+- using frame pointer:
++ /* General case:
+
+ sub sp, sp, hard_fp_offset
+ stp x29, x30, [sp, 0]
+@@ -8384,10 +8436,29 @@ aarch64_layout_frame (void)
+ stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
+- sub sp, sp, outgoing_args_size */
+- frame.initial_adjust = frame.hard_fp_offset;
+- frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
+- frame.final_adjust = crtl->outgoing_args_size;
++ sub sp, sp, bytes_below_saved_regs */
++ frame.initial_adjust = frame.bytes_above_hard_fp;
++ frame.sve_callee_adjust = below_hard_fp_saved_regs_size;
++ frame.final_adjust = frame.bytes_below_saved_regs;
++ }
++
++ /* The frame is allocated in pieces, with each non-final piece
++ including a register save at offset 0 that acts as a probe for
++ the following piece. In addition, the save of the bottommost register
++ acts as a probe for callees and allocas. Roll back any probes that
++ aren't needed.
++
++ A probe isn't needed if it is associated with the final allocation
++ (including callees and allocas) that happens before the epilogue is
++ executed. */
++ if (crtl->is_leaf
++ && !cfun->calls_alloca
++ && known_eq (frame.final_adjust, 0))
++ {
++ if (maybe_ne (frame.sve_callee_adjust, 0))
++ frame.sve_save_and_probe = INVALID_REGNUM;
++ else
++ frame.hard_fp_save_and_probe = INVALID_REGNUM;
+ }
+
+ /* Make sure the individual adjustments add up to the full frame size. */
+@@ -8691,15 +8762,17 @@ aarch64_add_cfa_expression (rtx_insn *in
+ }
+
+ /* Emit code to save the callee-saved registers from register number START
+- to LIMIT to the stack at the location starting at offset START_OFFSET,
+- skipping any write-back candidates if SKIP_WB is true. HARD_FP_VALID_P
+- is true if the hard frame pointer has been set up. */
++ to LIMIT to the stack. The stack pointer is currently BYTES_BELOW_SP
++ bytes above the bottom of the static frame. Skip any write-back
++ candidates if SKIP_WB is true. HARD_FP_VALID_P is true if the hard
++ frame pointer has been set up. */
+
+ static void
+-aarch64_save_callee_saves (poly_int64 start_offset,
++aarch64_save_callee_saves (poly_int64 bytes_below_sp,
+ unsigned start, unsigned limit, bool skip_wb,
+ bool hard_fp_valid_p)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
+ rtx_insn *insn;
+ unsigned regno;
+ unsigned regno2;
+@@ -8714,8 +8787,8 @@ aarch64_save_callee_saves (poly_int64 st
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+
+ if (skip_wb
+- && (regno == cfun->machine->frame.wb_push_candidate1
+- || regno == cfun->machine->frame.wb_push_candidate2))
++ && (regno == frame.wb_push_candidate1
++ || regno == frame.wb_push_candidate2))
+ continue;
+
+ if (cfun->machine->reg_is_wrapped_separately[regno])
+@@ -8723,7 +8796,7 @@ aarch64_save_callee_saves (poly_int64 st
+
+ machine_mode mode = aarch64_reg_save_mode (regno);
+ reg = gen_rtx_REG (mode, regno);
+- offset = start_offset + cfun->machine->frame.reg_offset[regno];
++ offset = frame.reg_offset[regno] - bytes_below_sp;
+ rtx base_rtx = stack_pointer_rtx;
+ poly_int64 sp_offset = offset;
+
+@@ -8734,9 +8807,7 @@ aarch64_save_callee_saves (poly_int64 st
+ else if (GP_REGNUM_P (regno)
+ && (!offset.is_constant (&const_offset) || const_offset >= 512))
+ {
+- gcc_assert (known_eq (start_offset, 0));
+- poly_int64 fp_offset
+- = cfun->machine->frame.below_hard_fp_saved_regs_size;
++ poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
+ if (hard_fp_valid_p)
+ base_rtx = hard_frame_pointer_rtx;
+ else
+@@ -8758,8 +8829,7 @@ aarch64_save_callee_saves (poly_int64 st
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ && !cfun->machine->reg_is_wrapped_separately[regno2]
+ && known_eq (GET_MODE_SIZE (mode),
+- cfun->machine->frame.reg_offset[regno2]
+- - cfun->machine->frame.reg_offset[regno]))
++ frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ {
+ rtx reg2 = gen_rtx_REG (mode, regno2);
+ rtx mem2;
+@@ -8801,14 +8871,16 @@ aarch64_save_callee_saves (poly_int64 st
+ }
+
+ /* Emit code to restore the callee registers from register number START
+- up to and including LIMIT. Restore from the stack offset START_OFFSET,
+- skipping any write-back candidates if SKIP_WB is true. Write the
+- appropriate REG_CFA_RESTORE notes into CFI_OPS. */
++ up to and including LIMIT. The stack pointer is currently BYTES_BELOW_SP
++ bytes above the bottom of the static frame. Skip any write-back
++ candidates if SKIP_WB is true. Write the appropriate REG_CFA_RESTORE
++ notes into CFI_OPS. */
+
+ static void
+-aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
++aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
+ unsigned limit, bool skip_wb, rtx *cfi_ops)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
+ unsigned regno;
+ unsigned regno2;
+ poly_int64 offset;
+@@ -8825,13 +8897,13 @@ aarch64_restore_callee_saves (poly_int64
+ rtx reg, mem;
+
+ if (skip_wb
+- && (regno == cfun->machine->frame.wb_pop_candidate1
+- || regno == cfun->machine->frame.wb_pop_candidate2))
++ && (regno == frame.wb_pop_candidate1
++ || regno == frame.wb_pop_candidate2))
+ continue;
+
+ machine_mode mode = aarch64_reg_save_mode (regno);
+ reg = gen_rtx_REG (mode, regno);
+- offset = start_offset + cfun->machine->frame.reg_offset[regno];
++ offset = frame.reg_offset[regno] - bytes_below_sp;
+ rtx base_rtx = stack_pointer_rtx;
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+@@ -8842,8 +8914,7 @@ aarch64_restore_callee_saves (poly_int64
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
+ && !cfun->machine->reg_is_wrapped_separately[regno2]
+ && known_eq (GET_MODE_SIZE (mode),
+- cfun->machine->frame.reg_offset[regno2]
+- - cfun->machine->frame.reg_offset[regno]))
++ frame.reg_offset[regno2] - frame.reg_offset[regno]))
+ {
+ rtx reg2 = gen_rtx_REG (mode, regno2);
+ rtx mem2;
+@@ -8948,6 +9019,7 @@ offset_12bit_unsigned_scaled_p (machine_
+ static sbitmap
+ aarch64_get_separate_components (void)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
+ sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
+ bitmap_clear (components);
+
+@@ -8964,20 +9036,11 @@ aarch64_get_separate_components (void)
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ continue;
+
+- poly_int64 offset = cfun->machine->frame.reg_offset[regno];
+-
+- /* If the register is saved in the first SVE save slot, we use
+- it as a stack probe for -fstack-clash-protection. */
+- if (flag_stack_clash_protection
+- && maybe_ne (cfun->machine->frame.below_hard_fp_saved_regs_size, 0)
+- && known_eq (offset, 0))
+- continue;
++ poly_int64 offset = frame.reg_offset[regno];
+
+ /* Get the offset relative to the register we'll use. */
+ if (frame_pointer_needed)
+- offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+- else
+- offset += crtl->outgoing_args_size;
++ offset -= frame.bytes_below_hard_fp;
+
+ /* Check that we can access the stack slot of the register with one
+ direct load with no adjustments needed. */
+@@ -8994,11 +9057,11 @@ aarch64_get_separate_components (void)
+ /* If the spare predicate register used by big-endian SVE code
+ is call-preserved, it must be saved in the main prologue
+ before any saves that use it. */
+- if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
+- bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
++ if (frame.spare_pred_reg != INVALID_REGNUM)
++ bitmap_clear_bit (components, frame.spare_pred_reg);
+
+- unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+- unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
++ unsigned reg1 = frame.wb_push_candidate1;
++ unsigned reg2 = frame.wb_push_candidate2;
+ /* If registers have been chosen to be stored/restored with
+ writeback don't interfere with them to avoid having to output explicit
+ stack adjustment instructions. */
+@@ -9009,6 +9072,13 @@ aarch64_get_separate_components (void)
+
+ bitmap_clear_bit (components, LR_REGNUM);
+ bitmap_clear_bit (components, SP_REGNUM);
++ if (flag_stack_clash_protection)
++ {
++ if (frame.sve_save_and_probe != INVALID_REGNUM)
++ bitmap_clear_bit (components, frame.sve_save_and_probe);
++ if (frame.hard_fp_save_and_probe != INVALID_REGNUM)
++ bitmap_clear_bit (components, frame.hard_fp_save_and_probe);
++ }
+
+ return components;
+ }
+@@ -9107,6 +9177,7 @@ aarch64_get_next_set_bit (sbitmap bmp, u
+ static void
+ aarch64_process_components (sbitmap components, bool prologue_p)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
+ rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+ ? HARD_FRAME_POINTER_REGNUM
+ : STACK_POINTER_REGNUM);
+@@ -9121,11 +9192,9 @@ aarch64_process_components (sbitmap comp
+ machine_mode mode = aarch64_reg_save_mode (regno);
+
+ rtx reg = gen_rtx_REG (mode, regno);
+- poly_int64 offset = cfun->machine->frame.reg_offset[regno];
++ poly_int64 offset = frame.reg_offset[regno];
+ if (frame_pointer_needed)
+- offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+- else
+- offset += crtl->outgoing_args_size;
++ offset -= frame.bytes_below_hard_fp;
+
+ rtx addr = plus_constant (Pmode, ptr_reg, offset);
+ rtx mem = gen_frame_mem (mode, addr);
+@@ -9148,14 +9217,14 @@ aarch64_process_components (sbitmap comp
+ break;
+ }
+
+- poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
++ poly_int64 offset2 = frame.reg_offset[regno2];
+ /* The next register is not of the same class or its offset is not
+ mergeable with the current one into a pair. */
+ if (aarch64_sve_mode_p (mode)
+ || !satisfies_constraint_Ump (mem)
+ || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+ || (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
+- || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
++ || maybe_ne ((offset2 - frame.reg_offset[regno]),
+ GET_MODE_SIZE (mode)))
+ {
+ insn = emit_insn (set);
+@@ -9177,9 +9246,7 @@ aarch64_process_components (sbitmap comp
+ /* REGNO2 can be saved/restored in a pair with REGNO. */
+ rtx reg2 = gen_rtx_REG (mode, regno2);
+ if (frame_pointer_needed)
+- offset2 -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+- else
+- offset2 += crtl->outgoing_args_size;
++ offset2 -= frame.bytes_below_hard_fp;
+ rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
+ rtx mem2 = gen_frame_mem (mode, addr2);
+ rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
+@@ -9253,10 +9320,10 @@ aarch64_stack_clash_protection_alloca_pr
+ registers. If POLY_SIZE is not large enough to require a probe this function
+ will only adjust the stack. When allocating the stack space
+ FRAME_RELATED_P is then used to indicate if the allocation is frame related.
+- FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
+- arguments. If we are then we ensure that any allocation larger than the ABI
+- defined buffer needs a probe so that the invariant of having a 1KB buffer is
+- maintained.
++ FINAL_ADJUSTMENT_P indicates whether we are allocating the area below
++ the saved registers. If we are then we ensure that any allocation
++ larger than the ABI defined buffer needs a probe so that the
++ invariant of having a 1KB buffer is maintained.
+
+ We emit barriers after each stack adjustment to prevent optimizations from
+ breaking the invariant that we never drop the stack more than a page. This
+@@ -9272,45 +9339,26 @@ aarch64_allocate_and_probe_stack_space (
+ bool frame_related_p,
+ bool final_adjustment_p)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
+ HOST_WIDE_INT guard_size
+ = 1 << param_stack_clash_protection_guard_size;
+ HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
++ HOST_WIDE_INT byte_sp_alignment = STACK_BOUNDARY / BITS_PER_UNIT;
++ gcc_assert (multiple_p (poly_size, byte_sp_alignment));
+ HOST_WIDE_INT min_probe_threshold
+ = (final_adjustment_p
+- ? guard_used_by_caller
++ ? guard_used_by_caller + byte_sp_alignment
+ : guard_size - guard_used_by_caller);
+- /* When doing the final adjustment for the outgoing arguments, take into
+- account any unprobed space there is above the current SP. There are
+- two cases:
+-
+- - When saving SVE registers below the hard frame pointer, we force
+- the lowest save to take place in the prologue before doing the final
+- adjustment (i.e. we don't allow the save to be shrink-wrapped).
+- This acts as a probe at SP, so there is no unprobed space.
+-
+- - When there are no SVE register saves, we use the store of the link
+- register as a probe. We can't assume that LR was saved at position 0
+- though, so treat any space below it as unprobed. */
+- if (final_adjustment_p
+- && known_eq (cfun->machine->frame.below_hard_fp_saved_regs_size, 0))
+- {
+- poly_int64 lr_offset = cfun->machine->frame.reg_offset[LR_REGNUM];
+- if (known_ge (lr_offset, 0))
+- min_probe_threshold -= lr_offset.to_constant ();
+- else
+- gcc_assert (!flag_stack_clash_protection || known_eq (poly_size, 0));
+- }
+-
+- poly_int64 frame_size = cfun->machine->frame.frame_size;
++ poly_int64 frame_size = frame.frame_size;
+
+ /* We should always have a positive probe threshold. */
+ gcc_assert (min_probe_threshold > 0);
+
+ if (flag_stack_clash_protection && !final_adjustment_p)
+ {
+- poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+- poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+- poly_int64 final_adjust = cfun->machine->frame.final_adjust;
++ poly_int64 initial_adjust = frame.initial_adjust;
++ poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++ poly_int64 final_adjust = frame.final_adjust;
+
+ if (known_eq (frame_size, 0))
+ {
+@@ -9464,7 +9512,7 @@ aarch64_allocate_and_probe_stack_space (
+ /* Handle any residuals. Residuals of at least MIN_PROBE_THRESHOLD have to
+ be probed. This maintains the requirement that each page is probed at
+ least once. For initial probing we probe only if the allocation is
+- more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
++ more than GUARD_SIZE - buffer, and below the saved registers we probe
+ if the amount is larger than buffer. GUARD_SIZE - buffer + buffer ==
+ GUARD_SIZE. This works that for any allocation that is large enough to
+ trigger a probe here, we'll have at least one, and if they're not large
+@@ -9474,16 +9522,12 @@ aarch64_allocate_and_probe_stack_space (
+ are still safe. */
+ if (residual)
+ {
+- HOST_WIDE_INT residual_probe_offset = guard_used_by_caller;
++ gcc_assert (guard_used_by_caller + byte_sp_alignment <= size);
++
+ /* If we're doing final adjustments, and we've done any full page
+ allocations then any residual needs to be probed. */
+ if (final_adjustment_p && rounded_size != 0)
+ min_probe_threshold = 0;
+- /* If doing a small final adjustment, we always probe at offset 0.
+- This is done to avoid issues when LR is not at position 0 or when
+- the final adjustment is smaller than the probing offset. */
+- else if (final_adjustment_p && rounded_size == 0)
+- residual_probe_offset = 0;
+
+ aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
+ if (residual >= min_probe_threshold)
+@@ -9494,8 +9538,8 @@ aarch64_allocate_and_probe_stack_space (
+ HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
+ "\n", residual);
+
+- emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+- residual_probe_offset));
++ emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
++ guard_used_by_caller));
+ emit_insn (gen_blockage ());
+ }
+ }
+@@ -9533,20 +9577,24 @@ aarch64_epilogue_uses (int regno)
+ | for register varargs |
+ | |
+ +-------------------------------+
+- | local variables | <-- frame_pointer_rtx
++ | local variables (1) | <-- frame_pointer_rtx
+ | |
+ +-------------------------------+
+- | padding | \
+- +-------------------------------+ |
+- | callee-saved registers | | frame.saved_regs_size
+- +-------------------------------+ |
+- | LR' | |
+- +-------------------------------+ |
+- | FP' | |
+- +-------------------------------+ |<- hard_frame_pointer_rtx (aligned)
+- | SVE vector registers | | \
+- +-------------------------------+ | | below_hard_fp_saved_regs_size
+- | SVE predicate registers | / /
++ | padding (1) |
++ +-------------------------------+
++ | callee-saved registers |
++ +-------------------------------+
++ | LR' |
++ +-------------------------------+
++ | FP' |
++ +-------------------------------+ <-- hard_frame_pointer_rtx (aligned)
++ | SVE vector registers |
++ +-------------------------------+
++ | SVE predicate registers |
++ +-------------------------------+
++ | local variables (2) |
++ +-------------------------------+
++ | padding (2) |
+ +-------------------------------+
+ | dynamic allocation |
+ +-------------------------------+
+@@ -9557,6 +9605,9 @@ aarch64_epilogue_uses (int regno)
+ +-------------------------------+
+ | | <-- stack_pointer_rtx (aligned)
+
++ The regions marked (1) and (2) are mutually exclusive. (2) is used
++ when aarch64_save_regs_above_locals_p is true.
++
+ Dynamic stack allocations via alloca() decrease stack_pointer_rtx
+ but leave frame_pointer_rtx and hard_frame_pointer_rtx
+ unchanged.
+@@ -9571,8 +9622,8 @@ aarch64_epilogue_uses (int regno)
+ When probing is needed, we emit a probe at the start of the prologue
+ and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
+
+- We have to track how much space has been allocated and the only stores
+- to the stack we track as implicit probes are the FP/LR stores.
++ We can also use register saves as probes. These are stored in
++ sve_save_and_probe and hard_fp_save_and_probe.
+
+ For outgoing arguments we probe if the size is larger than 1KB, such that
+ the ABI specified buffer is maintained for the next callee.
+@@ -9599,17 +9650,15 @@ aarch64_epilogue_uses (int regno)
+ void
+ aarch64_expand_prologue (void)
+ {
+- poly_int64 frame_size = cfun->machine->frame.frame_size;
+- poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+- HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+- poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+- poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+- poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+- poly_int64 below_hard_fp_saved_regs_size
+- = cfun->machine->frame.below_hard_fp_saved_regs_size;
+- unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
+- unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
+- bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
++ aarch64_frame &frame = cfun->machine->frame;
++ poly_int64 frame_size = frame.frame_size;
++ poly_int64 initial_adjust = frame.initial_adjust;
++ HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++ poly_int64 final_adjust = frame.final_adjust;
++ poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++ unsigned reg1 = frame.wb_push_candidate1;
++ unsigned reg2 = frame.wb_push_candidate2;
++ bool emit_frame_chain = frame.emit_frame_chain;
+ rtx_insn *insn;
+
+ if (flag_stack_clash_protection && known_eq (callee_adjust, 0))
+@@ -9640,7 +9689,7 @@ aarch64_expand_prologue (void)
+ }
+
+ /* Push return address to shadow call stack. */
+- if (cfun->machine->frame.is_scs_enabled)
++ if (frame.is_scs_enabled)
+ emit_insn (gen_scs_push ());
+
+ if (flag_stack_usage_info)
+@@ -9677,21 +9726,21 @@ aarch64_expand_prologue (void)
+ if (callee_adjust != 0)
+ aarch64_push_regs (reg1, reg2, callee_adjust);
+
+- /* The offset of the frame chain record (if any) from the current SP. */
+- poly_int64 chain_offset = (initial_adjust + callee_adjust
+- - cfun->machine->frame.hard_fp_offset);
+- gcc_assert (known_ge (chain_offset, 0));
+-
+- /* The offset of the bottom of the save area from the current SP. */
+- poly_int64 saved_regs_offset = chain_offset - below_hard_fp_saved_regs_size;
++ /* The offset of the current SP from the bottom of the static frame. */
++ poly_int64 bytes_below_sp = frame_size - initial_adjust - callee_adjust;
+
+ if (emit_frame_chain)
+ {
++ /* The offset of the frame chain record (if any) from the current SP. */
++ poly_int64 chain_offset = (initial_adjust + callee_adjust
++ - frame.bytes_above_hard_fp);
++ gcc_assert (known_ge (chain_offset, 0));
++
+ if (callee_adjust == 0)
+ {
+ reg1 = R29_REGNUM;
+ reg2 = R30_REGNUM;
+- aarch64_save_callee_saves (saved_regs_offset, reg1, reg2,
++ aarch64_save_callee_saves (bytes_below_sp, reg1, reg2,
+ false, false);
+ }
+ else
+@@ -9716,8 +9765,7 @@ aarch64_expand_prologue (void)
+ implicit. */
+ if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+ {
+- rtx src = plus_constant (Pmode, stack_pointer_rtx,
+- callee_offset);
++ rtx src = plus_constant (Pmode, stack_pointer_rtx, chain_offset);
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ gen_rtx_SET (hard_frame_pointer_rtx, src));
+ }
+@@ -9732,7 +9780,7 @@ aarch64_expand_prologue (void)
+ emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+ }
+
+- aarch64_save_callee_saves (saved_regs_offset, R0_REGNUM, R30_REGNUM,
++ aarch64_save_callee_saves (bytes_below_sp, R0_REGNUM, R30_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
+ if (maybe_ne (sve_callee_adjust, 0))
+@@ -9742,18 +9790,21 @@ aarch64_expand_prologue (void)
+ aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx,
+ sve_callee_adjust,
+ !frame_pointer_needed, false);
+- saved_regs_offset += sve_callee_adjust;
++ bytes_below_sp -= sve_callee_adjust;
+ }
+- aarch64_save_callee_saves (saved_regs_offset, P0_REGNUM, P15_REGNUM,
++ aarch64_save_callee_saves (bytes_below_sp, P0_REGNUM, P15_REGNUM,
+ false, emit_frame_chain);
+- aarch64_save_callee_saves (saved_regs_offset, V0_REGNUM, V31_REGNUM,
++ aarch64_save_callee_saves (bytes_below_sp, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
+
+ /* We may need to probe the final adjustment if it is larger than the guard
+ that is assumed by the called. */
++ gcc_assert (known_eq (bytes_below_sp, final_adjust));
+ aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust,
+ !frame_pointer_needed, true);
++ if (emit_frame_chain && maybe_ne (final_adjust, 0))
++ emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+ }
+
+ /* Return TRUE if we can use a simple_return insn.
+@@ -9782,16 +9833,15 @@ aarch64_use_return_insn_p (void)
+ void
+ aarch64_expand_epilogue (bool for_sibcall)
+ {
+- poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+- HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
+- poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+- poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+- poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+- poly_int64 below_hard_fp_saved_regs_size
+- = cfun->machine->frame.below_hard_fp_saved_regs_size;
+- unsigned reg1 = cfun->machine->frame.wb_pop_candidate1;
+- unsigned reg2 = cfun->machine->frame.wb_pop_candidate2;
+- unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
++ aarch64_frame &frame = cfun->machine->frame;
++ poly_int64 initial_adjust = frame.initial_adjust;
++ HOST_WIDE_INT callee_adjust = frame.callee_adjust;
++ poly_int64 final_adjust = frame.final_adjust;
++ poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
++ poly_int64 bytes_below_hard_fp = frame.bytes_below_hard_fp;
++ unsigned reg1 = frame.wb_pop_candidate1;
++ unsigned reg2 = frame.wb_pop_candidate2;
++ unsigned int last_gpr = (frame.is_scs_enabled
+ ? R29_REGNUM : R30_REGNUM);
+ rtx cfi_ops = NULL;
+ rtx_insn *insn;
+@@ -9825,7 +9875,7 @@ aarch64_expand_epilogue (bool for_sibcal
+ /* We need to add memory barrier to prevent read from deallocated stack. */
+ bool need_barrier_p
+ = maybe_ne (get_frame_size ()
+- + cfun->machine->frame.saved_varargs_size, 0);
++ + frame.saved_varargs_size, 0);
+
+ /* Emit a barrier to prevent loads from a deallocated stack. */
+ if (maybe_gt (final_adjust, crtl->outgoing_args_size)
+@@ -9846,7 +9896,7 @@ aarch64_expand_epilogue (bool for_sibcal
+ is restored on the instruction doing the writeback. */
+ aarch64_add_offset (Pmode, stack_pointer_rtx,
+ hard_frame_pointer_rtx,
+- -callee_offset - below_hard_fp_saved_regs_size,
++ -bytes_below_hard_fp + final_adjust,
+ tmp1_rtx, tmp0_rtx, callee_adjust == 0);
+ else
+ /* The case where we need to re-use the register here is very rare, so
+@@ -9856,9 +9906,9 @@ aarch64_expand_epilogue (bool for_sibcal
+
+ /* Restore the vector registers before the predicate registers,
+ so that we can use P4 as a temporary for big-endian SVE frames. */
+- aarch64_restore_callee_saves (callee_offset, V0_REGNUM, V31_REGNUM,
++ aarch64_restore_callee_saves (final_adjust, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+- aarch64_restore_callee_saves (callee_offset, P0_REGNUM, P15_REGNUM,
++ aarch64_restore_callee_saves (final_adjust, P0_REGNUM, P15_REGNUM,
+ false, &cfi_ops);
+ if (maybe_ne (sve_callee_adjust, 0))
+ aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
+@@ -9866,7 +9916,7 @@ aarch64_expand_epilogue (bool for_sibcal
+ /* When shadow call stack is enabled, the scs_pop in the epilogue will
+ restore x30, we don't need to restore x30 again in the traditional
+ way. */
+- aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
++ aarch64_restore_callee_saves (final_adjust + sve_callee_adjust,
+ R0_REGNUM, last_gpr,
+ callee_adjust != 0, &cfi_ops);
+
+@@ -9906,7 +9956,7 @@ aarch64_expand_epilogue (bool for_sibcal
+ }
+
+ /* Pop return address from shadow call stack. */
+- if (cfun->machine->frame.is_scs_enabled)
++ if (frame.is_scs_enabled)
+ {
+ machine_mode mode = aarch64_reg_save_mode (R30_REGNUM);
+ rtx reg = gen_rtx_REG (mode, R30_REGNUM);
+@@ -12501,24 +12551,24 @@ aarch64_can_eliminate (const int from AT
+ poly_int64
+ aarch64_initial_elimination_offset (unsigned from, unsigned to)
+ {
++ aarch64_frame &frame = cfun->machine->frame;
++
+ if (to == HARD_FRAME_POINTER_REGNUM)
+ {
+ if (from == ARG_POINTER_REGNUM)
+- return cfun->machine->frame.hard_fp_offset;
++ return frame.bytes_above_hard_fp;
+
+ if (from == FRAME_POINTER_REGNUM)
+- return cfun->machine->frame.hard_fp_offset
+- - cfun->machine->frame.locals_offset;
++ return frame.bytes_above_hard_fp - frame.bytes_above_locals;
+ }
+
+ if (to == STACK_POINTER_REGNUM)
+ {
+ if (from == FRAME_POINTER_REGNUM)
+- return cfun->machine->frame.frame_size
+- - cfun->machine->frame.locals_offset;
++ return frame.frame_size - frame.bytes_above_locals;
+ }
+
+- return cfun->machine->frame.frame_size;
++ return frame.frame_size;
+ }
+
+
+@@ -25795,11 +25845,9 @@ aarch64_operands_ok_for_ldpstp (rtx *ope
+ gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+ GET_MODE_SIZE (GET_MODE (mem_2))));
+
+- /* One of the memory accesses must be a mempair operand.
+- If it is not the first one, they need to be swapped by the
+- peephole. */
+- if (!aarch64_mem_pair_operand (mem_1, GET_MODE (mem_1))
+- && !aarch64_mem_pair_operand (mem_2, GET_MODE (mem_2)))
++ /* The lower memory access must be a mem-pair operand. */
++ rtx lower_mem = reversed ? mem_2 : mem_1;
++ if (!aarch64_mem_pair_operand (lower_mem, GET_MODE (lower_mem)))
+ return false;
+
+ if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
+Index: gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/config/aarch64/aarch64.h
++++ gcc-12-12.2.0/src/gcc/config/aarch64/aarch64.h
+@@ -862,6 +862,9 @@ extern enum aarch64_processor aarch64_tu
+ #ifdef HAVE_POLY_INT_H
+ struct GTY (()) aarch64_frame
+ {
++ /* The offset from the bottom of the static frame (the bottom of the
++ outgoing arguments) of each register save slot, or -2 if no save is
++ needed. */
+ poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
+
+ /* The number of extra stack bytes taken up by register varargs.
+@@ -870,25 +873,28 @@ struct GTY (()) aarch64_frame
+ STACK_BOUNDARY. */
+ HOST_WIDE_INT saved_varargs_size;
+
+- /* The size of the callee-save registers with a slot in REG_OFFSET. */
+- poly_int64 saved_regs_size;
++ /* The number of bytes between the bottom of the static frame (the bottom
++ of the outgoing arguments) and the bottom of the register save area.
++ This value is always a multiple of STACK_BOUNDARY. */
++ poly_int64 bytes_below_saved_regs;
++
++ /* The number of bytes between the bottom of the static frame (the bottom
++ of the outgoing arguments) and the hard frame pointer. This value is
++ always a multiple of STACK_BOUNDARY. */
++ poly_int64 bytes_below_hard_fp;
+
+- /* The size of the callee-save registers with a slot in REG_OFFSET that
+- are saved below the hard frame pointer. */
+- poly_int64 below_hard_fp_saved_regs_size;
+-
+- /* Offset from the base of the frame (incomming SP) to the
+- top of the locals area. This value is always a multiple of
++ /* The number of bytes between the top of the locals area and the top
++ of the frame (the incomming SP). This value is always a multiple of
+ STACK_BOUNDARY. */
+- poly_int64 locals_offset;
++ poly_int64 bytes_above_locals;
+
+- /* Offset from the base of the frame (incomming SP) to the
+- hard_frame_pointer. This value is always a multiple of
++ /* The number of bytes between the hard_frame_pointer and the top of
++ the frame (the incomming SP). This value is always a multiple of
+ STACK_BOUNDARY. */
+- poly_int64 hard_fp_offset;
++ poly_int64 bytes_above_hard_fp;
+
+- /* The size of the frame. This value is the offset from base of the
+- frame (incomming SP) to the stack_pointer. This value is always
++ /* The size of the frame, i.e. the number of bytes between the bottom
++ of the outgoing arguments and the incoming SP. This value is always
+ a multiple of STACK_BOUNDARY. */
+ poly_int64 frame_size;
+
+@@ -899,10 +905,6 @@ struct GTY (()) aarch64_frame
+ It is zero when no push is used. */
+ HOST_WIDE_INT callee_adjust;
+
+- /* The offset from SP to the callee-save registers after initial_adjust.
+- It may be non-zero if no push is used (ie. callee_adjust == 0). */
+- poly_int64 callee_offset;
+-
+ /* The size of the stack adjustment before saving or after restoring
+ SVE registers. */
+ poly_int64 sve_callee_adjust;
+@@ -950,6 +952,14 @@ struct GTY (()) aarch64_frame
+ This is the register they should use. */
+ unsigned spare_pred_reg;
+
++ /* An SVE register that is saved below the hard frame pointer and that acts
++ as a probe for later allocations, or INVALID_REGNUM if none. */
++ unsigned sve_save_and_probe;
++
++ /* A register that is saved at the hard frame pointer and that acts
++ as a probe for later allocations, or INVALID_REGNUM if none. */
++ unsigned hard_fp_save_and_probe;
++
+ bool laid_out;
+
+ /* True if shadow call stack should be enabled for the current function. */
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
+@@ -0,0 +1,55 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1024
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test1(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++ }
++ g();
++ return 1;
++}
++
++/*
++** test2:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1040
++** str xzr, \[sp, #?1024\]
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test2(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x);
++ }
++ g();
++ return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #4064
++** str xzr, \[sp, #?1024\]
++** cbnz w0, .*
++** bl g
++** ...
++** str x26, \[sp, #?4128\]
++** ...
++*/
++int test1(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++ }
++ g();
++ return 1;
++}
++
++/*
++** test2:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1040
++** str xzr, \[sp, #?1024\]
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test2(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x);
++ }
++ g();
++ return 1;
++}
++
++/*
++** test3:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1024
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test3(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++ }
++ g();
++ return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
+@@ -0,0 +1,100 @@
++/* { dg-options "-O2 -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void f(int, ...);
++void g();
++
++/*
++** test1:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #4064
++** str xzr, \[sp, #?1024\]
++** cbnz w0, .*
++** bl g
++** ...
++** str x26, \[sp, #?4128\]
++** ...
++*/
++int test1(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++ }
++ g();
++ return 1;
++}
++
++/*
++** test2:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1040
++** str xzr, \[sp, #?1024\]
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test2(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x);
++ }
++ g();
++ return 1;
++}
++
++/*
++** test3:
++** ...
++** str x30, \[sp\]
++** sub sp, sp, #1024
++** cbnz w0, .*
++** bl g
++** ...
++*/
++int test3(int z) {
++ __uint128_t x = 0;
++ int y[0x400];
++ if (z)
++ {
++ asm volatile ("" :::
++ "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26");
++ f(0, 0, 0, 0, 0, 0, 0, &y,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x,
++ x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
++ }
++ g();
++ return 1;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
+@@ -0,0 +1,3 @@
++/* { dg-options "-O2 -fstack-protector-all -fstack-clash-protection -fomit-frame-pointer --param stack-clash-protection-guard-size=12 -fsanitize=shadow-call-stack -ffixed-x18" } */
++
++#include "stack-check-prologue-19.c"
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
+@@ -0,0 +1,95 @@
++/* { dg-options " -O -fstack-protector-strong -mstack-protector-guard=sysreg -mstack-protector-guard-reg=tpidr2_el0 -mstack-protector-guard-offset=16" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++void g(void *);
++__SVBool_t *h(void *);
++
++/*
++** test1:
++** sub sp, sp, #288
++** stp x29, x30, \[sp, #?272\]
++** add x29, sp, #?272
++** mrs (x[0-9]+), tpidr2_el0
++** ldr (x[0-9]+), \[\1, #?16\]
++** str \2, \[sp, #?264\]
++** mov \2, #?0
++** add x0, sp, #?8
++** bl g
++** ...
++** mrs .*
++** ...
++** bne .*
++** ...
++** ldp x29, x30, \[sp, #?272\]
++** add sp, sp, #?288
++** ret
++** bl __stack_chk_fail
++*/
++int test1() {
++ int y[0x40];
++ g(y);
++ return 1;
++}
++
++/*
++** test2:
++** stp x29, x30, \[sp, #?-16\]!
++** mov x29, sp
++** sub sp, sp, #1040
++** mrs (x[0-9]+), tpidr2_el0
++** ldr (x[0-9]+), \[\1, #?16\]
++** str \2, \[sp, #?1032\]
++** mov \2, #?0
++** add x0, sp, #?8
++** bl g
++** ...
++** mrs .*
++** ...
++** bne .*
++** ...
++** add sp, sp, #?1040
++** ldp x29, x30, \[sp\], #?16
++** ret
++** bl __stack_chk_fail
++*/
++int test2() {
++ int y[0x100];
++ g(y);
++ return 1;
++}
++
++#pragma GCC target "+sve"
++
++/*
++** test3:
++** stp x29, x30, \[sp, #?-16\]!
++** mov x29, sp
++** addvl sp, sp, #-18
++** ...
++** str p4, \[sp\]
++** ...
++** sub sp, sp, #272
++** mrs (x[0-9]+), tpidr2_el0
++** ldr (x[0-9]+), \[\1, #?16\]
++** str \2, \[sp, #?264\]
++** mov \2, #?0
++** add x0, sp, #?8
++** bl h
++** ...
++** mrs .*
++** ...
++** bne .*
++** ...
++** add sp, sp, #?272
++** ...
++** ldr p4, \[sp\]
++** ...
++** addvl sp, sp, #18
++** ldp x29, x30, \[sp\], #?16
++** ret
++** bl __stack_chk_fail
++*/
++__SVBool_t test3() {
++ int y[0x40];
++ return *h(y);
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/stack-protector-9.c
+@@ -0,0 +1,33 @@
++/* { dg-options "-O2 -mcpu=neoverse-v1 -fstack-protector-all" } */
++/* { dg-final { check-function-bodies "**" "" } } */
++
++/*
++** main:
++** ...
++** stp x29, x30, \[sp, #?-[0-9]+\]!
++** ...
++** sub sp, sp, #[0-9]+
++** ...
++** str x[0-9]+, \[x29, #?-8\]
++** ...
++*/
++int f(const char *);
++void g(void *);
++int main(int argc, char* argv[])
++{
++ int a;
++ int b;
++ char c[2+f(argv[1])];
++ int d[0x100];
++ char y;
++
++ y=42; a=4; b=10;
++ c[0] = 'h'; c[1] = '\0';
++
++ c[f(argv[2])] = '\0';
++
++ __builtin_printf("%d %d\n%s\n", a, b, c);
++ g(d);
++
++ return 0;
++}
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+===================================================================
+--- gcc-12-12.2.0.orig/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_3.c
+@@ -11,11 +11,10 @@
+ ** mov x11, sp
+ ** ...
+ ** sub sp, sp, x13
+-** str p4, \[sp\]
+ ** cbz w0, [^\n]*
++** str p4, \[sp\]
+ ** ...
+ ** ptrue p0\.b, all
+-** ldr p4, \[sp\]
+ ** addvl sp, sp, #1
+ ** ldr x24, \[sp\], 32
+ ** ret
+@@ -39,13 +38,12 @@ test_1 (int n)
+ ** mov x11, sp
+ ** ...
+ ** sub sp, sp, x13
+-** str p4, \[sp\]
+ ** cbz w0, [^\n]*
++** str p4, \[sp\]
+ ** str p5, \[sp, #1, mul vl\]
+ ** str p6, \[sp, #2, mul vl\]
+ ** ...
+ ** ptrue p0\.b, all
+-** ldr p4, \[sp\]
+ ** addvl sp, sp, #1
+ ** ldr x24, \[sp\], 32
+ ** ret
+Index: gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+===================================================================
+--- /dev/null
++++ gcc-12-12.2.0/src/gcc/testsuite/gcc.dg/rtl/aarch64/pr111411.c
+@@ -0,0 +1,57 @@
++/* { dg-do compile { target aarch64*-*-* } } */
++/* { dg-require-effective-target lp64 } */
++/* { dg-options "-O -fdisable-rtl-postreload -fpeephole2 -fno-schedule-fusion" } */
++
++extern int data[];
++
++void __RTL (startwith ("ira")) foo (void *ptr)
++{
++ (function "foo"
++ (param "ptr"
++ (DECL_RTL (reg/v:DI <0> [ ptr ]))
++ (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++ ) ;; param "ptr"
++ (insn-chain
++ (block 2
++ (edge-from entry (flags "FALLTHRU"))
++ (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++ (insn 4 (set (reg:DI <0>) (reg:DI x0)))
++ (insn 5 (set (reg:DI <1>)
++ (plus:DI (reg:DI <0>) (const_int 768))))
++ (insn 6 (set (mem:SI (plus:DI (reg:DI <0>)
++ (const_int 508)) [1 &data+508 S4 A4])
++ (const_int 0)))
++ (insn 7 (set (mem:SI (plus:DI (reg:DI <1>)
++ (const_int -256)) [1 &data+512 S4 A4])
++ (const_int 0)))
++ (edge-to exit (flags "FALLTHRU"))
++ ) ;; block 2
++ ) ;; insn-chain
++ ) ;; function
++}
++
++void __RTL (startwith ("ira")) bar (void *ptr)
++{
++ (function "bar"
++ (param "ptr"
++ (DECL_RTL (reg/v:DI <0> [ ptr ]))
++ (DECL_RTL_INCOMING (reg/v:DI x0 [ ptr ]))
++ ) ;; param "ptr"
++ (insn-chain
++ (block 2
++ (edge-from entry (flags "FALLTHRU"))
++ (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
++ (insn 4 (set (reg:DI <0>) (reg:DI x0)))
++ (insn 5 (set (reg:DI <1>)
++ (plus:DI (reg:DI <0>) (const_int 768))))
++ (insn 6 (set (mem:SI (plus:DI (reg:DI <1>)
++ (const_int -256)) [1 &data+512 S4 A4])
++ (const_int 0)))
++ (insn 7 (set (mem:SI (plus:DI (reg:DI <0>)
++ (const_int 508)) [1 &data+508 S4 A4])
++ (const_int 0)))
++ (edge-to exit (flags "FALLTHRU"))
++ ) ;; block 2
++ ) ;; insn-chain
++ ) ;; function
++}
diff -Nru gcc-12-12.2.0/debian/rules.patch gcc-12-12.2.0/debian/rules.patch
--- gcc-12-12.2.0/debian/rules.patch 2022-12-26 16:53:20.000000000 +0100
+++ gcc-12-12.2.0/debian/rules.patch 2023-09-19 21:36:59.000000000 +0200
@@ -245,6 +245,8 @@
debian_patches += libgomp-kfreebsd-testsuite
debian_patches += go-testsuite
+debian_patches += CVE-2023-4039
+
# don't remove, this is regularly overwritten, see PR sanitizer/63958.
#debian_patches += libasan-sparc
Reply to: