Re: Strange behavior with glibc 2.3.1, malloc and threads in Sid
On Friday 24 January 2003 01:10, Marek Habersack wrote:
> Hello all,
>
> I might be doing something wrong, but I'm getting a really strange
> behavior in a program I'm writing. Here's the scenario: I've got a daemon
> program which creates one thread. The main daemon process is listening on a
> unix socket, getting data from the client, creating a request (by
> malloc'ing storage for a structure) and putting it in the circular buffer.
> The circular buffer and all variables related to it are protected by a
> mutex. The thread created at the startup blocks waiting for a condition to
> be signalled which is done by the main program right after it puts a new
> request in the queue. When that happens, the thread wakes up, gets the
> request from the queue and unlocks the mutex that protects it. So far, so
> good - but the problem is that the data gotten from the queue in the thread
> is partially corrupted: the circular buffer address is correct, the address
> of the malloc'ed request structure is correct, but the data stored in the
> structure is apparently random. The interesting thing is that the actual
> data of the request stored in the buffer is _not_ corrupted. Here's a
> fragment of a debug log from the daemon:
>
> [prog] Inserting req == 0x804f258; nto == 2; to == 0x804f300
> [prog] Putting into queue 0x804d0a0 slot 0
> [prog] Inserted req == 0x804f258; nto == 2; to == 0x804f300
>
> at this point the condition is broadcast and the thread wakes up:
>
> [thread] Checking req == 0x804f258; nto == 2; to == (nil)
> note that the address of the request is correct but the data is
> corrupted.
>
> [thread] Getting from queue 0x804d0a0 slot 0
> queue address is also correct
>
> [thread] Retrieved req == 0x804f258; nto == 2; to == 0x30613064
> random data appears in the req storage
>
> [thread] Got req == 0x804f258; nto == 1869881403; to == 0x203b3835
> even more randomness
>
> the above output from the thread is done with the lock held, there is no
> way the data could be modified in the meantime. Now we're back in the main
> program after pthread_cond_broadcast returns and the mutex is unlocked:
>
> [prog] Checking(2) req == 0x804f258; nto == 2; to == 0x804f300
> And the data is correct again.
>
> The code is not complex and I'm positive that there is no race condition
> when accessing the data. The variables and routines that operate on the
> queue (with the mutex lock held) are as follows:
>
> ----- CUT -----
> static pthread_mutex_t reqq_mutex = PTHREAD_MUTEX_INITIALIZER;
> static pthread_cond_t reqq_cond = PTHREAD_COND_INITIALIZER;
> static unsigned long reqq_size = 0;
> static unsigned long reqq_write_idx = 0;
> static unsigned long reqq_read_idx = 0;
> static vda_request **reqq = NULL;
>
> inline static int rq_empty()
> {
> return reqq_write_idx == reqq_read_idx;
> }
>
> inline static int rq_full()
> {
> return ((reqq_write_idx + 1) % reqq_size) == reqq_read_idx;
> }
>
> vda_request *rq_get()
> {
> vda_request *ret;
>
> if (rq_empty())
> return NULL;
>
> logmsg("Getting from queue %p slot %lu", reqq, reqq_read_idx);
> ret = reqq[reqq_read_idx++];
> reqq_read_idx %= reqq_size;
> logmsg("Retrieved req == %p; nto == %d; to == %p", ret, ret->nto,
> ret->to);
>
> [...]
> logmsg("Putting into queue %p slot %lu", reqq, reqq_write_idx);
>
> reqq[reqq_write_idx++] = req;
> reqq_write_idx %= reqq_size;
> logmsg("Inserted req == %p; nto == %d; to == %p",
> reqq[reqq_read_idx], reqq[reqq_read_idx]->nto,
> reqq[reqq_read_idx]->to);
>
Not sure if that is the problem, but you are incrementing the index first, and
thus the second logmsg() call is logging the contents of a now-free slot in
your queue. : )
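For what it's worth, here is how I would restructure rq_get() so the log can
never touch a recycled slot: take the pointer out, clear the slot, advance the
index, and log only through the saved pointer. This is just a sketch -- the
vda_request layout and REQQ_SIZE below are made up, not your actual
definitions:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the poster's type and queue size. */
typedef struct { int nto; void *to; } vda_request;
#define REQQ_SIZE 8

static vda_request *reqq[REQQ_SIZE];
static unsigned long reqq_read_idx = 0, reqq_write_idx = 0;

/* Caller must hold reqq_mutex.  Save the pointer, clear the slot,
 * advance the index, and log only through the saved pointer --
 * never through the (now free) queue slot. */
static vda_request *rq_get(void)
{
    vda_request *ret;

    if (reqq_read_idx == reqq_write_idx)    /* rq_empty() */
        return NULL;

    ret = reqq[reqq_read_idx];
    reqq[reqq_read_idx] = NULL;             /* makes slot-reuse bugs obvious */
    reqq_read_idx = (reqq_read_idx + 1) % REQQ_SIZE;
    printf("Retrieved req == %p; nto == %d; to == %p\n",
           (void *)ret, ret->nto, ret->to);
    return ret;
}
```

Clearing the slot on removal costs one store and turns any later access
through the queue into an immediately visible NULL dereference instead of
silently reading stale data.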
> Is my understanding that the
> malloc'ed memory area can be shared between threads in a process correct?
Yes, threads share a common address space (at least they do here, and on every
system I have come across); the producer-consumer design is typical of
threaded apps and relies on exactly that.
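To illustrate, here is a bare-bones producer-consumer sketch in the same
spirit as your daemon: one mutex, one condition variable, and a malloc'd
request handed from one thread to another. The struct and all names are
invented for the example, not your API:

```c
#include <pthread.h>
#include <stdlib.h>

/* Toy request record (made-up field, not the poster's vda_request). */
struct req { int nto; };

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static struct req *slot = NULL;           /* one-element "queue" */

static void *consumer(void *arg)
{
    struct req *r;

    (void)arg;
    pthread_mutex_lock(&mtx);
    while (slot == NULL)                  /* guards against spurious wakeups */
        pthread_cond_wait(&cond, &mtx);
    r = slot;
    slot = NULL;
    pthread_mutex_unlock(&mtx);
    return r;                             /* hand the request back */
}

static struct req *produce(int nto)
{
    struct req *r = malloc(sizeof *r);    /* malloc'd in one thread... */
    r->nto = nto;
    pthread_mutex_lock(&mtx);
    slot = r;
    pthread_cond_signal(&cond);           /* ...picked up in another */
    pthread_mutex_unlock(&mtx);
    return r;
}
```

Note the while loop around pthread_cond_wait(): POSIX allows spurious
wakeups, so the predicate must always be re-checked under the mutex.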