Bug#924891: glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode

To: 924891@bugs.debian.org
Subject: Bug#924891: glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode
From: Aurelien Jarno <aurelien@aurel32.net>
Date: Sun, 2 Aug 2020 14:14:38 +0200
Message-id: <[🔎] 20200802121438.GA3724539@aurel32.net>
Reply-to: Aurelien Jarno <aurelien@aurel32.net>, 924891@bugs.debian.org
In-reply-to: <20190328210558.GB5515@aurel32.net>
References: <20190317175722.qgsm6uvonexs35ot@xanadu.blop.info> <87h8buna6l.fsf@mid.deneb.enyo.de> <20190326221047.GA5515@aurel32.net> <20190327073750.pf54ni4wjkd4zu4r@xanadu.blop.info> <87sgv8oj02.fsf@mid.deneb.enyo.de> <20190327210034.gn4svip5e5zqdaui@xanadu.blop.info> <20190317175722.qgsm6uvonexs35ot@xanadu.blop.info> <871s2rq5yd.fsf@mid.deneb.enyo.de> <20190317175722.qgsm6uvonexs35ot@xanadu.blop.info> <20190328210558.GB5515@aurel32.net> <20190317175722.qgsm6uvonexs35ot@xanadu.blop.info>

control: reassign -1 src:linux
control: retitle -1 linux: cleared PKRU register after signal in amd64 32-bit compat mode 
control: close -1 4.10-1~exp1
control: affects -1 glibc

Hi,

On 2019-03-28 22:05, Aurelien Jarno wrote:
> Hi Florian,
> 
> On 2019-03-27 23:59, Florian Weimer wrote:
> > retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode 
> > thanks
> > 
> > * Lucas Nussbaum:
> > 
> > > On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> > >> > If that's useful, I can easily provide access to an AWS VM to debug this
> > >> > issue.
> > >> 
> > >> Oh, that would be quite helpful indeed.
> > >
> > > Can you send your SSH key? (I thought there was a way to get the SSH key
> > > for a DD, but I cannot find it anymore)
> > >
> > > Then you will be able to ssh to root@18.184.55.40.
> > > There's sbuild and schroot setup on the VM.
> > >
> > > When you are done, please 'poweroff' the machine, which will terminate
> > > it.
> > 
> > The issue reproduces outside the chroot, with the stretch userland.
> > 
> > What happens is that once we get out of the SIGUSR1 signal handler,
> > the PKRU register has value zero.  This happens around this code in
> > the test:
> > 
> >   /* Check that in a signal handler, there is no access.  */
> >   xsignal (SIGUSR1, &sigusr1_handler);
> >   xraise (SIGUSR1);
> >   xsignal (SIGUSR1, SIG_DFL);
> >   TEST_COMPARE (sigusr1_handler_ran, 1);
> > 
> > I checked the following (via a breakpoint in pkey_get; I don't think
> > GDB can read the PKRU register directly): Inside the SIGUSR1 signal
> > handler, PKRU has value 0x55555554, as expected for this kernel, but
> > after the return, we get zero.  This is the first time a signal is
> > delivered on the main thread, so it's consistent with fairly broken
> > signal handling as far as the PKRU register is concerned.  I guess
> > clearing PKRU in this way might even constitute a minor security bug
> > (because the zero value means no restrictions).
> 
> Thanks a lot for investigating and for all the details.
> 
> > This commit looks highly relevant:
> > 
> > commit a4455082dc6f0b5d51a23523f77600e8ede47c79
> > Author: Dave Hansen <dave.hansen@linux.intel.com>
> > Date:   Wed Jun 8 10:25:33 2016 -0700
> > 
> >     x86/signals: Add missing signal_compat code for x86 features
> >     
> >     The 32-bit siginfo is a different binary format than the 64-bit
> >     one.  So, when running 32-bit binaries on 64-bit kernels, we have
> >     to convert the kernel's 64-bit version to a 32-bit version that
> >     userspace can grok.
> > 
> > If the siginfo_t layout is incorrect (with regards to what the
> > hardware writes), I expect that we might end up copying back the wrong
> > PKRU value.
> 
> This commit is actually already in the 4.9 kernel.
> 
> > I'm not sure what to do here.  This really looks like a kernel bug.
> > Maybe we should just verify that this is fixed in the buster kernel
> > and move on?
> 
> I agree. I have been able to find a machine where I can temporarily run
> a VM. I have found that the problem has been solved between kernel
> 4.10-rc6 and 4.10, more precisely between the following debian packages:
> - linux-image-4.10.0-rc6-amd64-unsigned version 4.10~rc6-1~exp1
> - linux-image-4.10.0-trunk-amd64-unsigned version 4.10-1~exp1
> 
> I gave a quick look at the commit logs, and I haven't identified a
> commit. I'll look again and try to identify the commit fixing the issue
> so that it can be backported in the stretch kernel. I'll then reassign
> the bug there.

I have identified that the issue has been fixed by the following commit
that went into kernel 4.10-rc7:

commit dffba9a31c7769be3231c420d4b364c92ba3f1ac
Author: Yu-cheng Yu <yu-cheng.yu@intel.com>
Date:   Mon Jan 23 14:54:44 2017 -0800

    x86/fpu/xstate: Fix xcomp_bv in XSAVES header
    
    The compacted-format XSAVES area is determined at boot time and
    never changed after.  The field xsave.header.xcomp_bv indicates
    which components are in the fixed XSAVES format.
    
    In fpstate_init() we did not set xcomp_bv to reflect the XSAVES
    format since at the time there is no valid data.
    
    However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(),
    as in commit:
    
      b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve()
    
    and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode
    app, a #GP occurs.  This can be easily triggered by doing valgrind on
    a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and
    others:
    
            https://bugzilla.kernel.org/show_bug.cgi?id=190061
    
    Fix it by setting xcomp_bv correctly.
    
    This patch also moves the xcomp_bv initialization to the proper
    place, which was in copyin_to_xsaves() as of:
    
      4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area
    
    which fixed the bug too, but it's more efficient and cleaner to
    initialize things once per boot, not for every signal handling
    operation.
    
    Reported-by: Kevin Hao <haokexin@gmail.com>
    Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
    Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: haokexin@gmail.com
    Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com
    [ Combined it with 4c833368f0bf. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

This actually makes sense given that on x86 the protection keys register is
saved and restored using XSAVE.

I am therefore reassigning the bug to the kernel, and closing it given it has
been fixed a lot of time ago.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

Reply to:

Next by Date: Processed: Bug#924891: glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode
Next by thread: Processed: Bug#924891: glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode
Index(es):
- Date
- Thread