
Bug#473812: marked as done (libc6: calloc returns non-zero memory areas when mlockall is being used)



Your message dated Fri, 6 Jun 2014 21:32:21 +0200
with message-id <20140606193221.GA31256@volta.rr44.fr>
and subject line Re: libc6: calloc returns non-zero memory areas when mlockall is being used
has caused the Debian Bug report #473812,
regarding libc6: calloc returns non-zero memory areas when mlockall is being used
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
473812: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473812
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libc6
Version: 2.7-5
Severity: normal


Hi!

The bug I found (if it is a bug) is very hard for me to reproduce, so
bear with me if the explanation is a bit sketchy (a glibc-malloc expert
would need to look at this in more detail). Please also note that I have
stitched together the examples from multiple debugging runs, so the
addresses do not necessarily match.

Findings of fact:

   1. calloc returns memory areas that contain data from previous allocations
      (typical example:

         0x2aaab01c6fc0: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fc8: -56 'È' -20 'ì' 26 '\032'       5 '\005'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fd0: 13 '\r' 0 '\0'  0 '\0'  0 '\0'  4 '\004'        0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fd8: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fe0: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fe8: -80 '°' -82 '®' -81 '¯' 2 '\002'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6ff0: -16 'ð' 108 'l' 28 '\034'       -80 '°' -86 'ª' 42 '*'  0 '\0'  0 '\0'
         0x2aaab01c6ff8: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7000: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7008: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7010: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7018: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7020: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7028: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7030: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7038: -48 'Ð' -22 'ê' 126 '~' 2 '\002'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7040: 112 'p' 90 'Z'  28 '\034'       -80 '°' -86 'ª' 42 '*'  0 '\0'  0 '\0'
         0x2aaab01c7048: 28 '\034'       0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7050: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7058: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7060: 64 '@'  0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7068: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7070: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7078: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7080: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'

      the 0x55's in there result from this code, executed earlier:

        if (text) memset (SvPVX(text),0x55,SvLEN(text));//D
        if (text) SvREFCNT_dec (text);

      the second line causes the memory filled with 0x55 to be freed.

      Note that the 0x55's start near a 4k boundary.

   2. mallopt (M_PERTURB, <nonzero>) makes the program work
   3. NOT using mlockall (MCL_CURRENT | MCL_FUTURE) makes the program work
   4. using valgrind makes the program work
   5. using dmalloc makes the program work

   So this problem only happens with the glibc malloc, when mlockall is
   active and the perturb-debugging-code is NOT active. I will show why these
   conditions are necessary.
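
   Findings 2 and 3 suggest a stopgap for affected programs: switch on
   the perturb code before locking memory, and verify what calloc
   returns. A minimal sketch, assuming glibc's mallopt and M_PERTURB
   from <malloc.h>; the helper names are hypothetical, and this is a
   workaround, not a fix:

```c
#define _DEFAULT_SOURCE
#include <malloc.h>     /* mallopt, M_PERTURB (glibc-specific) */
#include <stdlib.h>     /* calloc, abort */
#include <sys/mman.h>   /* mlockall, MCL_CURRENT, MCL_FUTURE */

/* Hypothetical helper: enable glibc's perturb debugging code. In the
 * calloc source quoted in this report, both clearing shortcuts are
 * skipped when perturb_byte is nonzero.
 * Returns 1 on success and 0 on failure, like mallopt itself. */
int enable_perturb_workaround(int byte)
{
    return mallopt(M_PERTURB, byte);
}

/* Hypothetical helper: lock current and future pages.
 * Returns 0 on success, -1 on failure with errno set. */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Hypothetical helper: calloc, then check that every byte is zero.
 * Aborts if stale data is found; returns NULL if calloc fails. */
void *calloc_checked(size_t n, size_t size)
{
    unsigned char *p = calloc(n, size);
    if (p == NULL)
        return NULL;
    /* n * size cannot overflow here: calloc would have failed. */
    for (size_t i = 0; i < n * size; i++)
        if (p[i] != 0)
            abort();
    return p;
}
```

   A program would call enable_perturb_workaround with a nonzero byte
   first, then lock_all_memory, and could use calloc_checked while
   debugging to catch stale data early.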

How this likely happens:

   From looking through the glibc source code, I can see that calloc
   sometimes does not clear the memory block, or only clears part of it,
   as an optimisation:

        /* Two optional cases in which clearing not necessary */
      #if HAVE_MMAP
        if (chunk_is_mmapped (p))
          {
            if (__builtin_expect (perturb_byte, 0))
              MALLOC_ZERO (mem, sz);
            return mem;
          }
      #endif

        csz = chunksize(p);

      #if MORECORE_CLEARS
        if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
          /* clear only the bytes from non-freshly-sbrked memory */
          csz = oldtopsize;
        }
      #endif

   The memory block above is not an mmapped chunk (the word before it
   in memory is "0xb5", which means it is not from brk-managed memory, has
   a size of 0xb0 bytes, has no valid previous-size prefix and is not an
   mmapped chunk).
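
   The reading of "0xb5" above can be checked mechanically. A small
   sketch that splits a chunk size word into its size and flag bits;
   the three flag values match the macros in glibc's malloc.c, while
   the struct and function names are made up for illustration:

```c
#include <stddef.h>

/* Flag bits kept in the low three bits of a chunk's size word. */
#define PREV_INUSE     0x1  /* previous chunk in use: no valid prev_size */
#define IS_MMAPPED     0x2  /* chunk was obtained directly via mmap */
#define NON_MAIN_ARENA 0x4  /* chunk is not in the brk-managed main arena */
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical container for the decoded fields. */
struct chunk_info {
    size_t size;
    int prev_inuse;
    int is_mmapped;
    int non_main_arena;
};

/* Hypothetical helper: decode the size word that precedes user memory. */
struct chunk_info decode_size_word(size_t word)
{
    struct chunk_info c;
    c.size = word & ~(size_t)SIZE_BITS;
    c.prev_inuse = (word & PREV_INUSE) != 0;
    c.is_mmapped = (word & IS_MMAPPED) != 0;
    c.non_main_arena = (word & NON_MAIN_ARENA) != 0;
    return c;
}
```

   For 0xb5 this gives a size of 0xb0 with PREV_INUSE and
   NON_MAIN_ARENA set and IS_MMAPPED clear.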

   However, the second part checks for the case when an allocation
   has been extended which happens when there was a call to sbrk,
   extending the heap, or, for mmap-managed heaps, when there was a
   call to mprotect. In both cases, calloc will only clear up to the
   newly-allocated segment.

   This is apparently the condition that gets triggered, and here is how:

   Again, from reading the sources, it seems that glibc has the ability
   to manage multiple heap arenas, one with brk/sbrk, and multiple
   ones with mmap(PROT_NONE) which get "physically allocated" with
   mprotect(PROT_READ|PROT_WRITE) and "physically freed" with madvise
   (MADV_DONTNEED).

   In an strace (intermingled with debugging output), I see this:

      mprotect(0x2aaab0135000, 155648, PROT_READ|PROT_WRITE) = 0
         (a) 0x2aaab010ec00 [0x2aaab0134d60 0x2aaab015aeb6]
      madvise(0x2aaab012f000, 180224, 0x4 /* MADV_??? */) = -1 EINVAL (Invalid argument)
         (b) 0x2aaab013afc0 0x5555555555555555 (0 135)

   Explanation: 

   The first mprotect "allocates" the memory used for the "text"
   above (the piece of memory that later gets memset to 0x55).

   The line (a) is debugging output from my program showing that
   [0x2aaab0134d60..0x2aaab015aeb6] was allocated.

   It is subsequently memset to 0x55 and then freed, resulting in the
   madvise (from malloc/arena.c), where glibc tries to get rid of the
   memory. The expectation from madvise is that the memory is cleared to
   zero by the kernel. Note how the madvise call (0x4 == MADV_DONTNEED
   btw.) fails, and also note that glibc completely ignores errors from
   madvise (see malloc/arena.c).

   In line (b) we see the address returned by calloc, and a pointer
   inside the calloc'ed memory area, which should be 0, but isn't. This
   is because glibc thinks madvise cleared the memory, and the calloc
   optimisation kicks in where glibc assumes that the memory is now zero,
   when in fact it wasn't cleared at all.

   EINVAL from madvise is documented as:

      EINVAL The value len is negative, start is not page-aligned, advice
      is not a valid value, or the application is attempting to release
      locked or shared pages (with MADV_DONTNEED).

   which explains why it fails only when mlockall is being used.
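
   To illustrate what checking the return value could look like, here
   is a minimal sketch of a caller that does not trust madvise blindly.
   It assumes a page-aligned, private anonymous mapping (the only case
   where a successful MADV_DONTNEED yields zero-filled pages on next
   access). The helper name is hypothetical and this is not the change
   glibc made:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>     /* memset */
#include <sys/mman.h>   /* madvise, MADV_DONTNEED */

/* Hypothetical helper: ask the kernel to drop the pages; if it
 * refuses (for example with EINVAL on locked pages), clear them by
 * hand so later code may still assume the range reads as zero.
 * Returns 0 if madvise succeeded, 1 if the memset fallback ran. */
int discard_and_zero(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED) == 0)
        return 0;
    memset(addr, 0, len);
    return 1;
}
```

   The fallback keeps the pages resident, so it preserves the "memory
   is zero" assumption but does not return anything to the kernel.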

Result:

   mlockall is incompatible with the glibc memory allocator. This should
   either be fixed or clearly documented (preferably fixed, as most
   programs using mlockall are rather mission-critical, which is why they
   use mlockall in the first place :)

Again, my test program is rather big, and I didn't instrument my glibc, so
the above could also be wrong, which is why a glibc expert needs to look
at it. In any case, I think the problem is relatively obvious, and not
checking the madvise return code was a bad thing in the first place.

(As a related note, I think this could also explain some of the memory
leaks I experience where mallinfo shows much _less_ memory used than ps
(i.e. 400MB vs. 1.5GB), which isn't explainable by mere internal
fragmentation. This would fit into the above, as glibc might assume
the additional memory has been madvised into oblivion when the kernel
disagrees).

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.23-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                       1:4.3.0-1  GCC support library

libc6 recommends no packages.

-- debconf information:
  glibc/restart-services:
  glibc/restart-failed:



--- End Message ---
--- Begin Message ---
Version: 2.9-1

On Sat, Feb 20, 2010 at 11:45:08PM +0000, Adrien Kunysz wrote:
> Upstream bug: http://sources.redhat.com/bugzilla/show_bug.cgi?id=6958
> This is the commit that was used to fix Red Hat bug 405781:
> http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=4cd4c5d6a28c4fbdc86651c4578f4c4f24efce08
> 
> Using your test case, I confirm I can reproduce the issue with glibc 2.7-18lenny2 x86_64.
> With your test case, I cannot reproduce the issue with Red Hat Enterprise Linux
> glibc-2.5-24.x86_64 which includes the above patch:
> 
> # ./debbug473812 
> test the calloc bug..
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> test , n=1000
> Memory locked
> Memory locked
> Memory locked
> Memory locked
> Memory locked
> test finished
> allocated 32502000 bytes
> 
> This suggests the above patch indeed fixes this issue, although the upstream
> bug is still open and the madvise() return value is still not checked.

The patch has been included in upstream version 2.8, and appeared in
Debian with version 2.9-1. I can confirm by running the testcase given
in the bug log that the bug is fixed. I am therefore closing this bug
with the corresponding version.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

--- End Message ---
