Hello all,

I might be doing something wrong, but I'm seeing really strange behavior in a program I'm writing. Here's the scenario: I've got a daemon which creates one thread at startup. The main daemon process listens on a Unix socket, reads data from the client, builds a request (malloc'ing storage for a structure), and puts it into a circular buffer. The circular buffer and all variables related to it are protected by a mutex. The thread created at startup blocks waiting for a condition variable, which the main program signals right after it puts a new request into the queue. When that happens, the thread wakes up, takes the request from the queue, and unlocks the mutex that protects it.

So far, so good - but the problem is that the data the thread takes from the queue is partially corrupted: the address of the circular buffer is correct, the address of the malloc'ed request structure is correct, but the data stored in the structure is apparently random. The interesting thing is that the actual request data stored in the buffer is _not_ corrupted as seen from the main process. Here's a fragment of a debug log from the daemon:

  [prog] Inserting req == 0x804f258; nto == 2; to == 0x804f300
  [prog] Putting into queue 0x804d0a0 slot 0
  [prog] Inserted req == 0x804f258; nto == 2; to == 0x804f300

At this point the condition is broadcast and the thread wakes up:

  [thread] Checking req == 0x804f258; nto == 2; to == (nil)

Note that the address of the request is correct but the data is corrupted.

  [thread] Getting from queue 0x804d0a0 slot 0

The queue address is also correct.

  [thread] Retrieved req == 0x804f258; nto == 2; to == 0x30613064

Random data appears in the req storage.

  [thread] Got req == 0x804f258; nto == 1869881403; to == 0x203b3835

Even more randomness. All of the above output from the thread is produced with the lock held; there is no way the data could be modified in the meantime.
Now we're back in the main program, after pthread_cond_broadcast returns and the mutex is unlocked:

  [prog] Checking(2) req == 0x804f258; nto == 2; to == 0x804f300

And the data is correct again. The code is not complex and I'm positive that there is no race condition when accessing the data. The variables and the routines that operate on the queue (always called with the mutex held) are as follows:

----- CUT -----
static pthread_mutex_t reqq_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reqq_cond  = PTHREAD_COND_INITIALIZER;
static unsigned long reqq_size      = 0;
static unsigned long reqq_write_idx = 0;
static unsigned long reqq_read_idx  = 0;
static vda_request **reqq = NULL;

inline static int rq_empty()
{
    return reqq_write_idx == reqq_read_idx;
}

inline static int rq_full()
{
    return ((reqq_write_idx + 1) % reqq_size) == reqq_read_idx;
}

vda_request *rq_get()
{
    vda_request *ret;

    if (rq_empty())
        return NULL;
    logmsg("Getting from queue %p slot %lu", reqq, reqq_read_idx);
    ret = reqq[reqq_read_idx++];
    reqq_read_idx %= reqq_size;
    logmsg("Retrieved req == %p; nto == %d; to == %p",
           ret, ret->nto, ret->to);
    return ret;
}

int rq_insert(vda_request *req)
{
    if (rq_full())
        return 1;
    logmsg("Putting into queue %p slot %lu", reqq, reqq_write_idx);
    reqq[reqq_write_idx++] = req;
    reqq_write_idx %= reqq_size;
    logmsg("Inserted req == %p; nto == %d; to == %p",
           reqq[reqq_read_idx], reqq[reqq_read_idx]->nto,
           reqq[reqq_read_idx]->to);
    return 0;
}
----- CUT -----

The glibc is 2.3.1-10 from Sid, unmodified. I was trying to build one with NPTL and TLS support to check whether the same thing would happen, but NPTL 0.17 doesn't really want to compile against the latest glibc from CVS, so I've given up on that for now.

Is my understanding correct that a malloc'ed memory area can be shared between the threads of a process? That's what I've always known and thought, but I might be wrong. Or is this some kind of obscure bug in glibc 2.3.1?

TIA,
marek