On 22.12.18 16:56, Aurelien Jarno wrote:
> On 2018-12-22 16:24, Tim Rühsen wrote:
>> On 22.12.18 13:37, Aurelien Jarno wrote:
>>> On 2018-12-21 12:58, Tim Rühsen wrote:
>>>> On 12/21/18 12:09 PM, Aurelien Jarno wrote:
>>>>> On 2018-12-21 11:51, Tim Rühsen wrote:
>>>>>> On 12/19/18 12:55 AM, Aurelien Jarno wrote:
>>>>>>> On 2018-12-18 22:11, Aurelien Jarno wrote:
>>>>>>>> On 2018-12-18 21:34, Aurelien Jarno wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 2018-12-18 15:15, Tim Ruehsen wrote:
>>>>>>>>>> Package: libc6-armhf-cross
>>>>>>>>>> Version: 2.28-2cross2
>>>>>>>>>> Severity: normal
>>>>>>>>>>
>>>>>>>>>> Dear Maintainer,
>>>>>>>>>>
>>>>>>>>>> currently strerror(-3) sets errno unexpectedly to ENOMEM (12).
>>>>>>>>>>
>>>>>>>>>> The expected errno value would be either EINVAL or not touching errno
>>>>>>>>>> at all.
>>>>>>>>>>
>>>>>>>>>> This behavior is relatively new and causes some CI cross builds to fail.
>>>>>>>>>> The failing test is a gnulib test (test-strerror.c).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can reproduce the issue with libc6-armhf-cross 2.28-2cross2 and
>>>>>>>>> qemu-arm-static 1:3.1+dfsg-1, but not with the same binary on real
>>>>>>>>> hardware nor on qemu-user-static 1:2.12+dfsg-3+b1. I would therefore
>>>>>>>>> think it's a qemu bug.
>>>>>>>>
>>>>>>>> Hmm, I am wrong, I can actually reproduce it with qemu-user-static
>>>>>>>> version 1:2.12+dfsg-3+b1. But I can't reproduce it on real hardware.
>>>>>>>
>>>>>>> It seems to have been caused by this upstream patch:
>>>>>>>
>>>>>>> | commit 1294b1892e19d70e9e4dca0a2f3e39497f262a42
>>>>>>> | Author: Wilco Dijkstra <wdijkstr@arm.com>
>>>>>>> | Date: Thu Mar 15 17:57:03 2018 +0000
>>>>>>> |
>>>>>>> | Add support for sqrt asm redirects
>>>>>>> |
>>>>>>> | This patch series cleans up the many uses of __ieee754_sqrt(f/l) in GLIBC.
>>>>>>> | The goal is to enable GCC to do the inlining, and if this fails call the
>>>>>>> | __ieee754_sqrt function.
>>>>>>> | This is done by internally declaring sqrt with asm
>>>>>>> | redirects. The compat symbols and sqrt wrappers need to disable the redirect.
>>>>>>> | The redirect is also disabled if there are already redirects defined when
>>>>>>> | using -ffinite-math-only.
>>>>>>> |
>>>>>>> | All math functions (but not math tests, non-library code and libnldbl) are
>>>>>>> | built with -fno-math-errno which means GCC will typically inline sqrt as a
>>>>>>> | single instruction. This means targets are no longer forced to add a special
>>>>>>> | inline for sqrt.
>>>>>>> |
>>>>>>> | * include/math.h (sqrt): Declare with asm redirect.
>>>>>>> |   (sqrtf): Likewise.
>>>>>>> |   (sqrtl): Likewise.
>>>>>>> |   (sqrtf128): Likewise.
>>>>>>> | * Makeconfig: Add -fno-math-errno for libc/libm, but build testsuite,
>>>>>>> |   nonlib and libnldbl with -fmath-errno.
>>>>>>> | * math/w_sqrt_compat.c: Define NO_MATH_REDIRECT.
>>>>>>> | * math/w_sqrt_template.c: Likewise.
>>>>>>> | * math/w_sqrtf_compat.c: Likewise.
>>>>>>> | * math/w_sqrtl_compat.c: Likewise.
>>>>>>> | * sysdeps/i386/fpu/w_sqrt.c: Likewise.
>>>>>>> | * sysdeps/i386/fpu/w_sqrt_compat.c: Likewise.
>>>>>>> | * sysdeps/generic/math-type-macros-float128.h: Remove math.h and
>>>>>>> |   complex.h.
>>>>>>>
>>>>>>> And more precisely by building libc with -fno-math-errno.
>>>>>>
>>>>>> Thanks for looking into it.
>>>>>>
>>>>>> How do we proceed? Is there enough info to fix (or report upstream)?
>>>>>> If not, what has to be done?
>>>>>
>>>>> No, it's not enough to fix it or report it upstream. We still have to
>>>>> understand the exact issue. For me it's not yet clear if the bug is in
>>>>> QEMU or in glibc. The fact that it works fine on real hardware would
>>>>> point towards a QEMU bug, but there is no proof yet.
>>>>
>>>> Looking at glibc's string/strerror.c, it calls __strerror_r() before
>>>> saving errno.
>>>>
>>>> In __strerror_r(), gettext() is being called via the #define _().
>>>>
>>>> gettext() saves/restores errno only if successful, else it doesn't.
>>>> __strerror_r() doesn't check or save errno at all.
>>>>
>>>> So whenever gettext() sets errno, this value stays when strerror()
>>>> returns. The gettext() code path is only travelled when errnum is < 0.
>>>>
>>>> You can of course argue whether gettext() or strerror() must be fixed.
>>>> But that is clearly an upstream issue.
>>>>
>>>> Whether there is an underlying issue with memory allocation is a
>>>> different question. But it doesn't seem to affect the strerror()
>>>> function in the gnulib test.
>>>
>>> This is not what happens: errno is not set to ENOMEM in __strerror_r()
>>> but in strerror(). The problem is that the malloc implementation, when
>>> run under QEMU, sets errno to ENOMEM despite successfully allocating the
>>> memory. errno is supposed to be saved around the malloc call:
>>>
>>>   saved_errno = errno;
>>>   if (buf == NULL)
>>>     buf = malloc (1024);
>>>   __set_errno (saved_errno);
>>>
>>> That said, when compiled with -fno-math-errno, GCC optimizes out
>>> saving and restoring errno around the malloc call. I am not sure if this
>>> is a GCC bug or a bug in the GCC documentation.
>>>
>>> Note that the fact that malloc() successfully allocates memory but still
>>> sets errno to ENOMEM might also be due to the use of -fno-math-errno.
>>
>> You are right, not clearly an upstream thing.
>>
>> Just one more thing to add that I just stumbled upon.
>>
>> Add a 'printf("errno=%d\n",errno);' before 'errno=0;' and the result is
>> fine. Exchange the two lines and we see the issue again.
>>
>> Order of libraries that become lazy loaded? Memory mapping? Two errno
>> variables?
>
> The problem is that qemu-arm does not provide a heap to the program, so
> glibc fails to allocate memory through brk. This causes malloc to switch
> to mmap-based memory allocation, and this also sets errno to ENOMEM.
>
> printf also calls malloc, so the malloc implementation switches to
> mmap-based memory allocation at this moment. This is remembered through
> the life of the program. When strerror then calls malloc(1024), the
> allocation is done directly through mmap and errno is not set to ENOMEM.
> That's why you do not see the issue.
>
> To reproduce the issue, you therefore need the following conditions:
> - The kernel or QEMU does not provide a heap to the program
> - malloc is not called (implicitly or explicitly) before the call to
>   strerror
> - strerror is called with an invalid error number.
>
> Unless all three of these conditions are met, the bug does not appear.

That is a good explanation and makes sense to me, thank you, Aurelien.

At least we can work around that issue now.

BTW, how do you debug cross-compiled executables? There is no cross-gdb
packaged in Debian (or is there?). Building that from scratch seems too
time-intensive...

Regards, Tim
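
For reference, the reproduction conditions and the workaround discussed in this thread can be sketched in C roughly as follows. This is only an illustration, not the gnulib test or the glibc code: the helper names errno_after_strerror and strerror_keep_errno are made up here. The first helper calls strerror without any earlier malloc-using call such as printf, and the second shows one possible workaround, saving and restoring errno in the caller.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not part of glibc or gnulib): report the value
   errno holds after strerror(errnum) when it was 0 beforehand.
   Nothing that may call malloc (for example printf) runs before
   strerror, because an earlier malloc would hide the problem. */
int errno_after_strerror(int errnum)
{
    errno = 0;
    (void) strerror(errnum);
    return errno;
}

/* Hypothetical workaround: call strerror, then put the caller's errno
   back, so a stray ENOMEM set inside the call does not leak out. */
const char *strerror_keep_errno(int errnum)
{
    int saved_errno = errno;
    const char *msg = strerror(errnum);
    errno = saved_errno;
    return msg;
}
```

On an affected setup (no heap provided, no earlier malloc), errno_after_strerror(-3) would be expected to return ENOMEM; elsewhere it should return 0 or EINVAL. strerror_keep_errno leaves errno as the caller had it in either case.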