Re: [Precise][CVE-2014-9090] x86_64, traps: Stop using IST for #SS
On Sun, Dec 07, 2014 at 09:43:33PM +0000, Ben Hutchings wrote:
> I think you want these too:
>
> af726f21ed8a x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
> b645af2d5905 x86_64, traps: Rework bad_iret
>
> I'm attaching backports to 3.2.
>
Thanks Ben. Initially Andy asked to wait a week or two before queuing
these two patches for the stable kernels, but I guess it should be OK
to add them now.
Cheers,
--
Luís
> Ben.
>
> --
> Ben Hutchings
> Experience is directly proportional to the value of equipment destroyed.
> - Carolyn Scheppner
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Sat, 22 Nov 2014 18:00:33 -0800
> Subject: x86_64, traps: Rework bad_iret
>
> commit b645af2d5905c4e32399005b867987919cbfc3ae upstream.
>
> It's possible for iretq to userspace to fail. This can happen because
> of a bad CS, SS, or RIP.
>
> Historically, we've handled it by fixing up an exception from iretq to
> land at bad_iret, which pretends that the failed iret frame was really
> the hardware part of #GP(0) from userspace. To make this work, there's
> an extra fixup to fudge the gs base into a usable state.
>
> This is suboptimal because it loses the original exception. It's also
> buggy because there's no guarantee that we were on the kernel stack to
> begin with. For example, if the failing iret happened on return from an
> NMI, then we'll end up executing general_protection on the NMI stack.
> This is bad for several reasons, the most immediate of which is that
> general_protection, as a non-paranoid idtentry, will try to deliver
> signals and/or schedule from the wrong stack.
>
> This patch throws out bad_iret entirely. As a replacement, it augments
> the existing swapgs fudge into a full-blown iret fixup, mostly written
> in C. It should be clearer and more correct.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [bwh: Backported to 3.2: we didn't use the _ASM_EXTABLE macro]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -875,12 +875,14 @@ ENTRY(native_iret)
>
> .global native_irq_return_iret
> native_irq_return_iret:
> + /*
> + * This may fault. Non-paranoid faults on return to userspace are
> + * handled by fixup_bad_iret. These include #SS, #GP, and #NP.
> + * Double-faults due to espfix64 are handled in do_double_fault.
> + * Other faults here are fatal.
> + */
> iretq
>
> - .section __ex_table,"a"
> - .quad native_irq_return_iret, bad_iret
> - .previous
> -
> #ifdef CONFIG_X86_ESPFIX64
> native_irq_return_ldt:
> pushq_cfi %rax
> @@ -907,25 +909,6 @@ native_irq_return_ldt:
> jmp native_irq_return_iret
> #endif
>
> - .section .fixup,"ax"
> -bad_iret:
> - /*
> - * The iret traps when the %cs or %ss being restored is bogus.
> - * We've lost the original trap vector and error code.
> - * #GPF is the most likely one to get for an invalid selector.
> - * So pretend we completed the iret and took the #GPF in user mode.
> - *
> - * We are now running with the kernel GS after exception recovery.
> - * But error_entry expects us to have user GS to match the user %cs,
> - * so swap back.
> - */
> - pushq $0
> -
> - SWAPGS
> - jmp general_protection
> -
> - .previous
> -
> /* edi: workmask, edx: work */
> retint_careful:
> CFI_RESTORE_STATE
> @@ -1463,16 +1446,15 @@ error_sti:
>
> /*
> * There are two places in the kernel that can potentially fault with
> - * usergs. Handle them here. The exception handlers after iret run with
> - * kernel gs again, so don't set the user space flag. B stepping K8s
> - * sometimes report an truncated RIP for IRET exceptions returning to
> - * compat mode. Check for these here too.
> + * usergs. Handle them here. B stepping K8s sometimes report a
> + * truncated RIP for IRET exceptions returning to compat mode. Check
> + * for these here too.
> */
> error_kernelspace:
> incl %ebx
> leaq native_irq_return_iret(%rip),%rcx
> cmpq %rcx,RIP+8(%rsp)
> - je error_swapgs
> + je error_bad_iret
> movl %ecx,%eax /* zero extend */
> cmpq %rax,RIP+8(%rsp)
> je bstep_iret
> @@ -1483,7 +1465,15 @@ error_kernelspace:
> bstep_iret:
> /* Fix truncated RIP */
> movq %rcx,RIP+8(%rsp)
> - jmp error_swapgs
> + /* fall through */
> +
> +error_bad_iret:
> + SWAPGS
> + mov %rsp,%rdi
> + call fixup_bad_iret
> + mov %rax,%rsp
> + decl %ebx /* Return to usergs */
> + jmp error_sti
> CFI_ENDPROC
> END(error_entry)
>
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -363,6 +363,35 @@ asmlinkage __kprobes struct pt_regs *syn
> *regs = *eregs;
> return regs;
> }
> +
> +struct bad_iret_stack {
> + void *error_entry_ret;
> + struct pt_regs regs;
> +};
> +
> +asmlinkage
> +struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
> +{
> + /*
> + * This is called from entry_64.S early in handling a fault
> + * caused by a bad iret to user mode. To handle the fault
> + * correctly, we want to move our stack frame to task_pt_regs
> + * and we want to pretend that the exception came from the
> + * iret target.
> + */
> + struct bad_iret_stack *new_stack =
> + container_of(task_pt_regs(current),
> + struct bad_iret_stack, regs);
> +
> + /* Copy the IRET target to the new stack. */
> + memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
> +
> + /* Copy the remainder of the stack from the current stack. */
> + memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
> +
> + BUG_ON(!user_mode_vm(&new_stack->regs));
> + return new_stack;
> +}
> #endif
>
> /*
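A side note for anyone reviewing the backport: the "5*8" in the
memmove() above is the size of the hardware iret frame that the CPU
pushed for the failed iretq. As a rough sketch (my own illustration,
not code from the patch; the struct name is made up):

	/*
	 * Layout of the 64-bit hardware interrupt frame that iretq
	 * consumes, in ascending address order from the saved %rsp.
	 * fixup_bad_iret() copies exactly these five quadwords to
	 * task_pt_regs(current), starting at regs.ip.
	 */
	struct hw_iret_frame {
		unsigned long ip;	/* RIP: return address */
		unsigned long cs;	/* CS: code segment */
		unsigned long flags;	/* RFLAGS */
		unsigned long sp;	/* RSP: user stack pointer */
		unsigned long ss;	/* SS: stack segment */
	};				/* 5 * 8 = 40 bytes */

These five fields are also the tail of struct pt_regs, which is why
copying them to &new_stack->regs.ip reconstructs a normal frame.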
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Sat, 22 Nov 2014 18:00:31 -0800
> Subject: x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
>
> commit af726f21ed8af2cdaa4e93098dc211521218ae65 upstream.
>
> There's nothing special enough about the espfix64 double fault fixup to
> justify writing it in assembly. Move it to C.
>
> This also fixes a bug: if the double fault came from an IST stack, the
> old asm code would return to a partially uninitialized stack frame.
>
> Fixes: 3891a04aafd668686239349ea58f3314ea2af86b
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [bwh: Backported to 3.2:
> - Keep using the paranoiderrorentry macro to generate the asm code
> - Adjust context]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
> arch/x86/kernel/entry_64.S | 34 ++--------------------------------
> arch/x86/kernel/traps.c | 24 ++++++++++++++++++++++++
> 2 files changed, 26 insertions(+), 32 deletions(-)
>
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -873,6 +873,7 @@ ENTRY(native_iret)
> jnz native_irq_return_ldt
> #endif
>
> +.global native_irq_return_iret
> native_irq_return_iret:
> iretq
>
> @@ -972,37 +973,6 @@ ENTRY(retint_kernel)
> CFI_ENDPROC
> END(common_interrupt)
>
> - /*
> - * If IRET takes a fault on the espfix stack, then we
> - * end up promoting it to a doublefault. In that case,
> - * modify the stack to make it look like we just entered
> - * the #GP handler from user space, similar to bad_iret.
> - */
> -#ifdef CONFIG_X86_ESPFIX64
> - ALIGN
> -__do_double_fault:
> - XCPT_FRAME 1 RDI+8
> - movq RSP(%rdi),%rax /* Trap on the espfix stack? */
> - sarq $PGDIR_SHIFT,%rax
> - cmpl $ESPFIX_PGD_ENTRY,%eax
> - jne do_double_fault /* No, just deliver the fault */
> - cmpl $__KERNEL_CS,CS(%rdi)
> - jne do_double_fault
> - movq RIP(%rdi),%rax
> - cmpq $native_irq_return_iret,%rax
> - jne do_double_fault /* This shouldn't happen... */
> - movq PER_CPU_VAR(kernel_stack),%rax
> - subq $(6*8-KERNEL_STACK_OFFSET),%rax /* Reset to original stack */
> - movq %rax,RSP(%rdi)
> - movq $0,(%rax) /* Missing (lost) #GP error code */
> - movq $general_protection,RIP(%rdi)
> - retq
> - CFI_ENDPROC
> -END(__do_double_fault)
> -#else
> -# define __do_double_fault do_double_fault
> -#endif
> -
> /*
> * End of kprobes section
> */
> @@ -1169,7 +1139,7 @@ zeroentry overflow do_overflow
> zeroentry bounds do_bounds
> zeroentry invalid_op do_invalid_op
> zeroentry device_not_available do_device_not_available
> -paranoiderrorentry double_fault __do_double_fault
> +paranoiderrorentry double_fault do_double_fault
> zeroentry coprocessor_segment_overrun do_coprocessor_segment_overrun
> errorentry invalid_TSS do_invalid_TSS
> errorentry segment_not_present do_segment_not_present
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -224,6 +224,30 @@ dotraplinkage void do_double_fault(struc
> static const char str[] = "double fault";
> struct task_struct *tsk = current;
>
> +#ifdef CONFIG_X86_ESPFIX64
> + extern unsigned char native_irq_return_iret[];
> +
> + /*
> + * If IRET takes a non-IST fault on the espfix64 stack, then we
> + * end up promoting it to a doublefault. In that case, modify
> + * the stack to make it look like we just entered the #GP
> + * handler from user space, similar to bad_iret.
> + */
> + if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
> + regs->cs == __KERNEL_CS &&
> + regs->ip == (unsigned long)native_irq_return_iret)
> + {
> + struct pt_regs *normal_regs = task_pt_regs(current);
> +
> + /* Fake a #GP(0) from userspace. */
> + memmove(&normal_regs->ip, (void *)regs->sp, 5*8);
> + normal_regs->orig_ax = 0; /* Missing (lost) #GP error code */
> + regs->ip = (unsigned long)general_protection;
> + regs->sp = (unsigned long)&normal_regs->orig_ax;
> + return;
> + }
> +#endif
> +
> /* Return not checked because double check cannot be ignored */
> notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
>
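One more note, on the new check in do_double_fault(): the test
"(long)regs->sp >> PGDIR_SHIFT == ESPFIX_PGD_ENTRY" works because the
espfix64 stacks all live in a single top-level page-table slot. As far
as I can tell ESPFIX_PGD_ENTRY is -2, i.e. the next-to-last PGD entry,
so the check amounts to something like this sketch (illustrative only,
not code from the patch):

	/* Does sp point into the dedicated espfix64 PGD slot? */
	static inline int on_espfix_stack(unsigned long sp)
	{
		/*
		 * Arithmetic shift: for a canonical kernel address
		 * this yields the sign-extended PGD index.
		 */
		return ((long)sp >> PGDIR_SHIFT) == -2L;
	}

A double fault whose saved %rsp passes this test, with %cs set to
__KERNEL_CS and %rip at native_irq_return_iret, can only be the espfix
iretq fault being promoted, so rewriting the frame to fake a #GP(0)
from userspace is safe there.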