Bug#1104460: [regression 6.1.y] discard/TRIM through RAID10 blocking (was: Re: Bug#1104460: linux-image-6.1.0-34-powerpc64le: Discard broken) with RAID10: BUG: kernel tried to execute user page (0) - exploit attempt?
- To: Antoine Beaupré <anarcat@debian.org>, 1104460@bugs.debian.org
- Cc: Moritz Mühlenhoff <jmm@inutil.org>, Yu Kuai <yukuai3@huawei.com>, Melvin Vermeeren <vermeeren@vermwa.re>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Coly Li <colyli@kernel.org>, Sasha Levin <sashal@kernel.org>, stable <stable@vger.kernel.org>, regressions@lists.linux.dev
- Subject: Bug#1104460: [regression 6.1.y] discard/TRIM through RAID10 blocking (was: Re: Bug#1104460: linux-image-6.1.0-34-powerpc64le: Discard broken) with RAID10: BUG: kernel tried to execute user page (0) - exploit attempt?
- From: Salvatore Bonaccorso <carnil@debian.org>
- Date: Mon, 5 May 2025 22:36:07 +0200
- Message-id: <aBkhNwVVs_KwgQ1a@eldamar.lan>
- Reply-to: Salvatore Bonaccorso <carnil@debian.org>, 1104460@bugs.debian.org
- In-reply-to: <875xiex56v.fsf@angela.anarc.at>
- References: <174602441004.174814.6400502946223473449.reportbug@talos.vermwa.re> <aBJH6Nsh-7Zj55nN@eldamar.lan> <aBilQxLZ4MA4Tg8e@pisco.westfalen.local> <aBjEf5R7X9GaJg2T@eldamar.lan> <174602441004.174814.6400502946223473449.reportbug@talos.vermwa.re> <aBjhHUjtXRotZUVa@eldamar.lan> <174602441004.174814.6400502946223473449.reportbug@talos.vermwa.re> <875xiex56v.fsf@angela.anarc.at> <174602441004.174814.6400502946223473449.reportbug@talos.vermwa.re>
Hi Antoine,
On Mon, May 05, 2025 at 02:50:32PM -0400, Antoine Beaupré wrote:
> On 2025-05-05 18:02:37, Salvatore Bonaccorso wrote:
> > On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
> >> Hi Moritz,
> >>
> >> On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
> >> > Am Wed, Apr 30, 2025 at 05:55:20PM +0200 schrieb Salvatore Bonaccorso:
> >> > > Hi
> >> > >
> >> > > We got a regression report in Debian after the update from 6.1.133 to
> >> > > 6.1.135. Melvin is reporting that discard/trim through a RAID10 array
> >> > > stalls indefinitely. The full report is inlined below and originates
> >> > > from https://bugs.debian.org/1104460 .
> >> >
> >> > JFTR, we ran into the same problem with a few Wikimedia servers running
> >> > 6.1.135 and RAID 10: The servers started to lock up once fstrim.service
> >> > got started. Full oops messages are available at
> >> > https://phabricator.wikimedia.org/P75746
> >>
> >> Thanks for these additional data points. Assuming you won't be able to
> >> test the other stable series where the commit d05af90d6218
> >> ("md/raid10: fix missing discard IO accounting") went in, might you at
> >> least be able to test the 6.1.y branch with the commit reverted again
> >> and manually trigger the issue?
> >>
> >> If needed I can provide a test Debian package of 6.1.135 (or 6.1.137)
> >> with the patch reverted.
> >
> > So one additional data point, as several Debian users were reporting
> > back being affected: One user did upgrade to 6.12.25 (where the
> > commit was backported as well) and is not able to reproduce the issue
> > there.
>
> That would be me.
>
> I can reproduce the issue as outlined by Moritz above fairly reliably in
> 6.1.135 (debian package 6.1.0-34-amd64). The reproducer is simple, on a
> RAID-10 host:
>
> 1. reboot
> 2. systemctl start fstrim.service
>
> We're tracking the issue internally in:
>
> https://gitlab.torproject.org/tpo/tpa/team/-/issues/42146
>
> I've managed to work around the issue by upgrading to the Debian package
> from testing/unstable (6.12.25), as Salvatore indicated above. There,
> fstrim doesn't cause any crash and completes successfully. In stable, it
> just hangs there forever. The kernel doesn't completely panic and the
> machine is otherwise somewhat still functional: my existing SSH
> connection keeps working, for example, but new ones fail. And an `apt
> install` of another kernel hangs forever.

So likely, at least in 6.1.y, there are missing prerequisites causing
the behaviour.

If you can test 6.1.135-1 with the commit
4a05f7ae33716d996c5ce56478a36a3ede1d76f2 reverted, then you can fetch
built packages at:

https://people.debian.org/~carnil/tmp/linux/1104460/
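
When testing such a build, the reproducer quoted above (reboot, then start
fstrim.service) could be wrapped in a deadline so that a hang is reported
instead of blocking the shell forever. What follows is only a rough sketch,
assuming Python 3 and root privileges; the helper name run_with_deadline and
the 300 second limit are illustrative choices, not something taken from the
report.

```python
import subprocess
import time


def run_with_deadline(cmd, deadline_s, poll_s=0.5):
    """Run cmd; return its exit code, or None if it is still running
    when the deadline expires (hypothetical helper, not from the report)."""
    proc = subprocess.Popen(cmd)
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(poll_s)
    # Last check in case the command finished during the final sleep.
    code = proc.poll()
    if code is not None:
        return code
    # Deliberately no wait() after kill(): a task stuck in the kernel
    # may never exit, and waiting for it would hang this script too.
    proc.kill()
    return None


if __name__ == "__main__":
    # Step 2 of the quoted reproducer; run this after a fresh reboot.
    result = run_with_deadline(["systemctl", "start", "fstrim.service"], 300)
    if result is None:
        print("fstrim.service did not finish within the deadline (possible hang)")
    else:
        print(f"fstrim.service start returned exit code {result}")
```

Note that a None result only says the command did not return in time; on a
large or slow array the limit may simply need to be raised.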
Regards,
Salvatore