Bug#963493: Repeatable hard lockup running strace testsuite on 4.19.98+ onwards
On Fri, Jun 26, 2020 at 12:35:58PM +0100, Steve McIntyre wrote:
> Hi folks,
>
> I'm the maintainer in Debian for strace. Trying to reproduce
> https://bugs.debian.org/963462 on my machine (Thinkpad T470), I've
> found a repeatable hard lockup running the strace testsuite. Each time
> it seems to have failed in a slightly different place in the testsuite
> (suggesting it's not one particular syscall test that's triggering the
> failure). I initially found this using Debian's current Buster kernel
> (4.19.118+2+deb10u1), then backtracking I found that 4.19.98+1+deb10u1
> worked fine.
>
> I've bisected to find the failure point along the linux-4.19.y stable
> branch and what I've got to is the following commit:
>
> e58f543fc7c0926f31a49619c1a3648e49e8d233 is the first bad commit
> commit e58f543fc7c0926f31a49619c1a3648e49e8d233
> Author: Jann Horn <jannh@google.com>
> Date: Thu Sep 13 18:12:09 2018 +0200
>
> apparmor: don't try to replace stale label in ptrace access check
>
> [ Upstream commit 1f8266ff58840d698a1e96d2274189de1bdf7969 ]
>
> As a comment above begin_current_label_crit_section() explains,
> begin_current_label_crit_section() must run in sleepable context because
> when label_is_stale() is true, aa_replace_current_label() runs, which uses
> prepare_creds(), which can sleep.
> Until now, the ptrace access check (which runs with a task lock held)
> violated this rule.
>
> Also add a might_sleep() assertion to begin_current_label_crit_section(),
> because asserts are less likely to be ignored than comments.
>
> Fixes: b2d09ae449ced ("apparmor: move ptrace checks to using labels")
> Signed-off-by: Jann Horn <jannh@google.com>
> Signed-off-by: John Johansen <john.johansen@canonical.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> :040000 040000 ca92f885a38c1747b812116f19de6967084a647e 865a227665e460e159502f21e8a16e6fa590bf50 M security
>
> Considering I'm running strace build tests to provoke this bug,
> finding the failure in a commit talking about ptrace changes does look
> very suspicious...!
>
> Annoyingly, I can't reproduce this on my disparate other machines
> here, suggesting it's maybe(?) timing related.
>
> Hope this helps - happy to give more information, test things, etc.
So if you just revert this one patch, all works well?

I've added the authors of the patch to the cc: list...

Also, does this problem happen on Linus's tree?

thanks,

greg k-h