
Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system



Hi,

On 2022-09-15 01:37, debian-bug-report@p0358.net wrote:
> Package: libc6
> Version: 2.31-13+deb11u4
> Severity: critical
> 
> Dear Maintainer,
> 
> After an upgrade to version +deb11u4 on my system running a Haswell
> (4th gen Intel Core) CPU, most programs, including bash and dpkg,
> crash immediately with SIGILL. The problem seems to be caused by, or
> related to, AVX2 and changes made to some functions using this
> instruction set. I don't know much about Debian bug reporting, so forgive
> me any mistakes I've made.
> The issue occurs on the host as well as in LXC and Docker.
> I have described it in more detail here:
> https://github.com/debuerreotype/docker-debian-artifacts/issues/175
> where I also linked a coredump from an example program.

First of all, sorry about the issue; it should not have slipped into a
stable release. Unfortunately I am not able to reproduce it: I have
tried on 3rd and 5th gen Intel Core CPUs without success. Therefore I
will need your help to understand the issue.

As a first step, could you please provide the output of /proc/cpuinfo?

> Direct link to the coredump, just in case: https://github.com/debuerreotype/docker-debian-artifacts/files/9569748/core.bash.100000.2663c40e671041e6b40c882a70b83c3f.1480736.1663185824000000.zip

Unfortunately I am not able to use this core dump to find the instruction
that triggers the SIGILL, even after installing the debug symbol packages.


> Also log lines from kernel:
> kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode
> ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]
> kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode
> ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]
> kernel: [834669.840128] traps: dpkg[1455375] trap invalid opcode
> ip:7f1874cc951d sp:7fffc2c2f5d8 error:0 in libc-2.31.so[7f1874b8d000+15a000]
> kernel: [834669.907918] traps: dpkg[1455378] trap invalid opcode
> ip:7f3b4f8d851d sp:7fff3ec970f8 error:0 in libc-2.31.so[7f3b4f79c000+15a000]
> kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode
> ip:7fefee4b52b7 sp:7ffffcb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]

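The same goes for these lines: because of ASLR, the absolute addresses
alone do not identify the instruction. Comparing each instruction pointer
with the start of the libc mapping, it seems to fail in at least two
different locations. Do you have any extra lines around these? Sometimes
the kernel dumps the code bytes around the instruction pointer.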
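Because of ASLR, libc is mapped at a different address in every process, so the raw `ip` values differ from crash to crash. Subtracting the start of the mapping that contains `ip` (the first number inside the brackets) gives an offset that can be compared across crashes. The Python sketch below does this for the kernel lines quoted above; it assumes exactly that message layout (including the line wrap), and the names `TRAP_RE` and `fault_offsets` are hypothetical. Note that the offset is relative to that particular mapping, not necessarily to the ELF load address of libc.

```python
import re

# Fields of an x86 "trap invalid opcode" kernel message: process name, PID,
# faulting instruction pointer (ip) and the "[start+size]" of the mapping
# that contains it. \s+ also accepts the line wrap seen in the quoted log.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap invalid opcode\s+"
    r"ip:(?P<ip>[0-9a-f]+) .*? in (?P<lib>\S+?)"
    r"\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(log):
    """Return (comm, library, ip - mapping start) for each trap message."""
    return [
        (m["comm"], m["lib"], int(m["ip"], 16) - int(m["start"], 16))
        for m in TRAP_RE.finditer(log)
    ]


# The kernel lines from the bug report, as quoted (wrapped) in the email.
LOG = """\
kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode
ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]
kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode
ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]
kernel: [834669.840128] traps: dpkg[1455375] trap invalid opcode
ip:7f1874cc951d sp:7fffc2c2f5d8 error:0 in libc-2.31.so[7f1874b8d000+15a000]
kernel: [834669.907918] traps: dpkg[1455378] trap invalid opcode
ip:7f3b4f8d851d sp:7fff3ec970f8 error:0 in libc-2.31.so[7f3b4f79c000+15a000]
kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode
ip:7fefee4b52b7 sp:7ffffcb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]
"""

for comm, lib, off in fault_offsets(LOG):
    print(f"{comm}: {lib}+{off:#x}")
# dpkg: libc-2.31.so+0x13c51d   (printed four times)
# passwd: libc-2.31.so+0x1382b7

# Distinct offsets, i.e. distinct faulting instructions inside libc.
print([hex(o) for o in sorted({off for _, _, off in fault_offsets(LOG)})])
# ['0x1382b7', '0x13c51d']
```

All four dpkg crashes land on the same offset while passwd faults elsewhere, which is consistent with at least two distinct faulting instructions.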

> I'm not sure what exactly is causing the issue, but if these changes
> aren't pulled, potentially anyone with the same or a similar CPU will
> upgrade and end up with a bricked system.

The changes in this stable release have been in testing/sid for a few
months (or at least were supposed to be, given the bug you reported).
Could you run a test with Debian sid, for instance in Docker?

> I will try the `clearcpuid=293` kernel flag myself, but consider how
> many distros, live CDs, etc. depend on Debian, with people unable to
> figure out why their system became useless, unable to trace the source,
> and just blaming Linux...

If you believe the issue is due to AVX2, clearcpuid won't help: it only
clears the corresponding flags from the kernel's point of view, while the
cpuid instruction itself continues to behave the same. The way to disable
that feature at the glibc level is to set the GLIBC_TUNABLES environment
variable to "glibc.cpu.hwcaps=-AVX2_Usable".
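A minimal Python sketch of applying that workaround to a single command without changing anything system-wide. It assumes a glibc that still uses the 2.31-era `AVX2_Usable` feature name; the helpers `env_without_avx2` and `run_without_avx2` are hypothetical. GLIBC_TUNABLES holds colon-separated `name=value` pairs, so any tunables already set in the environment are kept.

```python
import os
import subprocess

# Tunable suggested in this thread (glibc 2.31 naming): mask AVX2 so glibc
# should not select its AVX2 function variants.
AVX2_OFF = "glibc.cpu.hwcaps=-AVX2_Usable"


def env_without_avx2(base=None):
    """Return a copy of `base` (default: os.environ) with AVX2_OFF appended.

    Existing tunables are preserved. If `base` already sets
    glibc.cpu.hwcaps, the two values are not merged here.
    """
    env = dict(os.environ if base is None else base)
    current = env.get("GLIBC_TUNABLES")
    env["GLIBC_TUNABLES"] = f"{current}:{AVX2_OFF}" if current else AVX2_OFF
    return env


def run_without_avx2(cmd):
    """Run `cmd` (a list of arguments) with the AVX2 tunable in its environment."""
    return subprocess.run(cmd, env=env_without_avx2(), check=False)
```

For example, `run_without_avx2(["dpkg", "--version"])` would start dpkg with only its own environment modified, which makes it easy to compare behaviour with and without the tunable.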
 
Regards
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

