
Bug#916779: libc6-armhf-cross: strerror(-3) sets errno to ENOMEM



On 22.12.18 16:56, Aurelien Jarno wrote:
> On 2018-12-22 16:24, Tim Rühsen wrote:
>> On 22.12.18 13:37, Aurelien Jarno wrote:
>>> On 2018-12-21 12:58, Tim Rühsen wrote:
>>>> On 12/21/18 12:09 PM, Aurelien Jarno wrote:
>>>>> On 2018-12-21 11:51, Tim Rühsen wrote:
>>>>>> On 12/19/18 12:55 AM, Aurelien Jarno wrote:
>>>>>>> On 2018-12-18 22:11, Aurelien Jarno wrote:
>>>>>>>> On 2018-12-18 21:34, Aurelien Jarno wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 2018-12-18 15:15, Tim Ruehsen wrote:
>>>>>>>>>> Package: libc6-armhf-cross
>>>>>>>>>> Version: 2.28-2cross2
>>>>>>>>>> Severity: normal
>>>>>>>>>>
>>>>>>>>>> Dear Maintainer,
>>>>>>>>>>
>>>>>>>>>> currently strerror(-3) sets errno unexpectedly to ENOMEM (12).
>>>>>>>>>>
>>>>>>>>>> The expected errno value would be either EINVAL or not touching errno
>>>>>>>>>> at all.
>>>>>>>>>>
>>>>>>>>>> This behavior is relatively new and causes some CI cross builds to fail.
>>>>>>>>>> The failing test is a gnulib test (test-strerror.c).
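A minimal check in the spirit of the failing gnulib test might look like the sketch below. It is not the actual test-strerror.c code. The helper name `strerror_leaves_errno_ok` is hypothetical, and it only encodes the expectation stated in the report: after strerror(), errno must be either untouched or EINVAL.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not gnulib's test-strerror.c): checks that
 * strerror(errnum) returns a message and leaves errno either
 * untouched (still 0) or set to EINVAL. */
bool strerror_leaves_errno_ok(int errnum)
{
    errno = 0;                          /* known starting value */
    const char *msg = strerror(errnum);
    int err = errno;                    /* read errno before any other libc call */
    return msg != NULL && (err == 0 || err == EINVAL);
}
```

In the failing setup (libc6-armhf-cross 2.28-2cross2 under qemu-arm), `strerror_leaves_errno_ok(-3)` would be expected to return false, because errno comes back as ENOMEM (12).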
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can reproduce the issue with libc6-armhf-cross 2.28-2cross2 and
>>>>>>>>> qemu-arm-static 1:3.1+dfsg-1, but not with the same binary on real
>>>>>>>>> hardware nor on qemu-user-static 1:2.12+dfsg-3+b1. I would therefore
>>>>>>>>> think it's a qemu bug.
>>>>>>>>
>>>>>>>> Hmm, I am wrong, I can actually reproduce it with qemu-user-static
>>>>>>>> version 1:2.12+dfsg-3+b1. But I can't reproduce it on real hardware.
>>>>>>>
>>>>>>> It seems to have been caused by this upstream patch:
>>>>>>>
>>>>>>> | commit 1294b1892e19d70e9e4dca0a2f3e39497f262a42
>>>>>>> | Author: Wilco Dijkstra <wdijkstr@arm.com>
>>>>>>> | Date:   Thu Mar 15 17:57:03 2018 +0000
>>>>>>> | 
>>>>>>> |     Add support for sqrt asm redirects
>>>>>>> |     
>>>>>>> |     This patch series cleans up the many uses of  __ieee754_sqrt(f/l) in GLIBC.
>>>>>>> |     The goal is to enable GCC to do the inlining, and if this fails call the
>>>>>>> |     __ieee754_sqrt function.  This is done by internally declaring sqrt with asm
>>>>>>> |     redirects.  The compat symbols and sqrt wrappers need to disable the redirect.
>>>>>>> |     The redirect is also disabled if there are already redirects defined when
>>>>>>> |     using -ffinite-math-only.
>>>>>>> |     
>>>>>>> |     All math functions (but not math tests, non-library code and libnldbl) are
>>>>>>> |     built with -fno-math-errno which means GCC will typically inline sqrt as a
>>>>>>> |     single instruction.  This means targets are no longer forced to add a special
>>>>>>> |     inline for sqrt.
>>>>>>> |     
>>>>>>> |             * include/math.h (sqrt): Declare with asm redirect.
>>>>>>> |             (sqrtf): Likewise.
>>>>>>> |             (sqrtl): Likewise.
>>>>>>> |             (sqrtf128): Likewise.
>>>>>>> |             * Makeconfig: Add -fno-math-errno for libc/libm, but build testsuite,
>>>>>>> |             nonlib and libnldbl with -fmath-errno.
>>>>>>> |             * math/w_sqrt_compat.c: Define NO_MATH_REDIRECT.
>>>>>>> |             * math/w_sqrt_template.c: Likewise.
>>>>>>> |             * math/w_sqrtf_compat.c: Likewise.
>>>>>>> |             * math/w_sqrtl_compat.c: Likewise.
>>>>>>> |             * sysdeps/i386/fpu/w_sqrt.c: Likewise.
>>>>>>> |             * sysdeps/i386/fpu/w_sqrt_compat.c: Likewise.
>>>>>>> |             * sysdeps/generic/math-type-macros-float128.h: Remove math.h and
>>>>>>> |             complex.h.
>>>>>>>
>>>>>>> And more precisely by building libc with -fno-math-errno.
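Whether a translation unit is built with -fno-math-errno can be observed from C through GCC's predefined macro `__NO_MATH_ERRNO__`, which GCC defines when that option (or one implying it, such as -ffast-math) is in effect. The sketch below uses a hypothetical function name and only illustrates what the flag signals; it does not show how glibc's build uses it.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if this file was compiled with
 * -fno-math-errno (GCC then predefines __NO_MATH_ERRNO__). With the
 * default -fmath-errno, math calls such as sqrt(-1.0) may set errno
 * to EDOM, so GCC cannot always replace them by a single instruction. */
bool compiled_without_math_errno(void)
{
#ifdef __NO_MATH_ERRNO__
    return true;
#else
    return false;
#endif
}
```

With a typical GNU/Linux GCC, compiling this with plain `gcc -c` should give false, and with `gcc -c -fno-math-errno` true.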
>>>>>>
>>>>>> Thanks for looking into it.
>>>>>>
>>>>>> How should we proceed? Is there enough info to fix it (or report it
>>>>>> upstream)? If not, what has to be done?
>>>>>
>>>>> No, it's not enough to fix it or report it upstream. We still have to
>>>>> understand the exact issue. For me it's not yet clear if the bug is in
>>>>> QEMU or in glibc. The fact that it works fine on real hardware would
>>>>> point towards a QEMU bug, but there is no proof yet.
>>>>
>>>> Looking at glibc's string/strerror.c, it calls __strerror_r() before
>>>> saving errno.
>>>>
>>>> In __strerror_r(), gettext() is being called via the #define _().
>>>>
>>>> gettext() saves/restores errno only if successful, else it doesn't.
>>>> __strerror_r() doesn't check or save errno at all.
>>>>
>>>> So whenever gettext() sets errno, this value stays when strerror()
>>>> returns. The gettext() code path is only travelled when errnum is < 0.
>>>>
>>>> You can of course argue whether gettext() or strerror() must be fixed,
>>>> but that is clearly an upstream issue.
>>>>
>>>> Whether there is an underlying issue with memory allocation is a
>>>> different question. But it doesn't seem to affect the strerror()
>>>> function in the gnulib test.
>>>
>>> This is not what happens: errno is not set to ENOMEM in __strerror_r()
>>> but in strerror(). The problem is that the malloc implementation, when
>>> run under QEMU, sets errno to ENOMEM despite successfully allocating the
>>> memory. errno is supposed to be saved around the malloc call:
>>>
>>>   saved_errno = errno;
>>>   if (buf == NULL) 
>>>     buf = malloc (1024);
>>>   __set_errno (saved_errno); 
>>>
>>> That said, when compiled with -fno-math-errno, GCC optimizes out
>>> saving and restoring errno around the malloc call. I am not sure if this
>>> is a GCC bug or a bug in the GCC documentation.
>>>
>>> Note that the fact that malloc() successfully allocates memory but still
>>> sets errno to ENOMEM might also be due to the use of -fno-math-errno.
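The save/restore pattern quoted above can be written as a small standalone sketch. ISO C allows a library function to set errno even when it succeeds (unless its description says otherwise), so a caller that must not disturb errno has to save it around malloc(). The function name `unknown_error_message` is hypothetical and this is not glibc's source. Per the explanation above, GCC with -fno-math-errno dropped exactly this kind of save/restore, so the sketch assumes a default build.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical analogue of strerror()'s fallback path (not glibc's
 * source): format "Unknown error N" into a lazily allocated buffer
 * without changing the caller's errno. */
const char *unknown_error_message(int errnum)
{
    static char *buf;                       /* allocated on first use, then reused */
    const char *result = "Unknown error";   /* used if malloc() fails */
    int saved_errno = errno;                /* malloc() may change errno, even on success */

    if (buf == NULL)
        buf = malloc(1024);
    if (buf != NULL) {
        snprintf(buf, 1024, "Unknown error %d", errnum);
        result = buf;
    }
    errno = saved_errno;                    /* caller sees errno exactly as before */
    return result;
}
```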
>>
>> You are right, it's not clearly an upstream issue.
>>
>> Just one more thing to add that I just stumbled upon.
>>
>> Add a 'printf("errno=%d\n",errno);' before 'errno=0;' and the result is
>> fine. Swap the two lines and the issue appears again.
>>
>> The order in which libraries get lazily loaded? Memory mapping? Two
>> errno variables?
> 
> The problem is that qemu-arm does not provide a heap to the program, so
> glibc fails to allocate memory through brk. This causes malloc to switch
> to mmap-based memory allocation, and this also sets errno to ENOMEM.
> 
> printf also calls malloc, so the malloc implementation switches to
> mmap-based memory allocation at that point. This is remembered for
> the lifetime of the program. When strerror then calls malloc(1024), the
> allocation is done directly through mmap and errno is not set to ENOMEM.
> That's why you do not see the issue.
> 
> To reproduce the issue, you therefore need the following conditions:
> - The kernel or QEMU does not provide a heap to the program
> - malloc is not called (implicitly or explicitly) before the call to
>   strerror
> - strerror is called with an invalid error number.
> 
> If any of these 3 conditions is not met, the bug does not appear.
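The mechanism can be illustrated with a toy allocator. Everything here is hypothetical (`toy_malloc`, `try_heap`, `fallback_alloc`), and a static pool stands in for real mmap; it is not glibc's malloc. It only mirrors the sequence described above: the first allocation fails on the heap path and leaves ENOMEM in errno even though the fallback succeeds, and later allocations go straight to the fallback without touching errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the behaviour described above, NOT glibc's malloc.
 * try_heap() plays the brk path on a system (here: qemu-arm) that gives
 * the program no heap; a static pool stands in for mmap. */
static bool use_fallback = false;      /* sticky switch, remembered for the whole run */
static max_align_t pool[4096];         /* backing store for the "mmap" path */
static size_t pool_used;               /* bytes handed out so far */

static void *try_heap(size_t n)        /* simulated brk: always fails here */
{
    (void) n;
    errno = ENOMEM;                    /* the failed attempt leaves ENOMEM behind */
    return NULL;
}

static void *fallback_alloc(size_t n)  /* simulated mmap: simple bump allocator */
{
    size_t a = sizeof(max_align_t);
    if (n > sizeof pool - pool_used) {
        errno = ENOMEM;                /* pool exhausted */
        return NULL;
    }
    n = (n + a - 1) / a * a;           /* keep every block suitably aligned */
    void *p = (unsigned char *) pool + pool_used;
    pool_used += n;
    return p;
}

void *toy_malloc(size_t n)
{
    if (!use_fallback) {
        void *p = try_heap(n);
        if (p != NULL)
            return p;
        use_fallback = true;           /* first failure: switch for good */
    }
    return fallback_alloc(n);          /* succeeds, but errno is not reset */
}
```

This also matches the printf observation: an early printf() performs the first allocation, so by the time strerror() runs, the switch has already happened and no new ENOMEM is produced.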

That is a good explanation and makes sense to me, thank you, Aurelien.

At least we can work around that issue now.
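Given the three conditions above, one possible workaround (a sketch, not something confirmed in this thread) is to make sure malloc() runs once before the first strerror() call, for example at the start of a test's main(). The helper name `prime_malloc` is hypothetical, and this does not fix the underlying glibc/GCC issue; it only avoids the second condition.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical workaround sketch: perform one allocation early, so that
 * under an emulator without a brk heap the failed brk attempt (and the
 * switch to mmap) happens here rather than inside the first strerror().
 * errno is saved and restored so this helper leaves no ENOMEM behind. */
void prime_malloc(void)
{
    int saved_errno = errno;
    void *volatile p = malloc(1);   /* volatile: keeps the compiler from eliding the malloc/free pair */
    free(p);
    errno = saved_errno;
}
```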

BTW, how do you debug cross-compiled executables? There is no cross-gdb
packaged in Debian (or is there?). Building that from scratch seems too
time-consuming...

Regards, Tim
