
Bug#814980: Additional information regarding bug #814980



Hello,

I believe I am seeing this same bug (#814980), but with Apache for Windows, so I do not think it is necessarily specific to one platform. Here is the information I was able to dig up. It looks like one of Apache's memory allocators (apr-util/include/apr_misc/apr_rmm.c) is getting stuck in the while (next) loop below: at least in the case I am observing, blk->next holds the same value as next, so the loop never advances.

static apr_rmm_off_t find_block_of_size(apr_rmm_t *rmm, apr_size_t size)
{
    apr_rmm_off_t next = rmm->base->firstfree;
    apr_rmm_off_t best = 0;
    apr_rmm_off_t bestsize = 0;

    while (next) {
        struct rmm_block_t *blk = (rmm_block_t*)((char*)rmm->base + next);

        if (blk->size == size)
            return next;

        if (blk->size >= size) {
            /* XXX: sub optimal algorithm 
             * We need the most thorough best-fit logic, since we can
             * never grow our rmm, we are SOL when we hit the wall.
             */
            if (!bestsize || (blk->size < bestsize)) {
                bestsize = blk->size;
                best = next;
            }
        }

        next = blk->next;
    }

    if (bestsize > RMM_BLOCK_SIZE + size) {
        struct rmm_block_t *blk = (rmm_block_t*)((char*)rmm->base + best);
        struct rmm_block_t *new = (rmm_block_t*)((char*)rmm->base + best + size);

        new->size = blk->size - size;
        new->next = blk->next;
        new->prev = best;

        blk->size = size;
        blk->next = best + size;

        if (new->next) {
            blk = (rmm_block_t*)((char*)rmm->base + new->next);
            blk->prev = best + size;
        }
    }

    return best;
}

The debugger shows several threads inside this same function at the same time, all operating on the same data.
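
To make the hang concrete, here is a minimal sketch of the same list walk over a simplified block layout. Everything prefixed with demo_ is hypothetical and not part of APR; the step limit is only a diagnostic aid for spotting a cycle, not something the real allocator has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for rmm_block_t. Offsets are byte
 * offsets from the start of the segment, and 0 terminates the list. */
typedef struct demo_block {
    size_t size;
    size_t prev;
    size_t next;
} demo_block;

/* Walk the free list the same way the while (next) loop does, but give up
 * after max_steps blocks. Returns the number of blocks visited, or -1 if
 * the limit was exceeded, which suggests the list contains a cycle. */
static long demo_walk_free_list(const char *base, size_t first, long max_steps)
{
    long steps = 0;
    size_t next = first;

    assert(base != NULL);

    while (next) {
        const demo_block *blk = (const demo_block *)(base + next);

        if (++steps > max_steps)
            return -1;

        next = blk->next;
    }
    return steps;
}

/* Build a two-block free list. With corrupt != 0 the last block's next
 * field points back at the block itself, which is the state described
 * above (blk->next equal to next). */
static long demo_run(int corrupt)
{
    demo_block seg[3] = {{0, 0, 0}};
    const size_t off1 = sizeof(demo_block);
    const size_t off2 = 2 * sizeof(demo_block);

    seg[1].size = 64;
    seg[1].prev = 0;
    seg[1].next = off2;

    seg[2].size = 32;
    seg[2].prev = off1;
    seg[2].next = corrupt ? off2 : 0;

    return demo_walk_free_list((const char *)seg, off1, 1000);
}
```

With an intact list the walk visits both blocks and stops; with the self-referencing block it would spin forever without the step limit.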

This function is, in theory, guarded by a lock. However, the lock can be any one of several kinds (a cross-process mutex, a thread mutex, a read/write lock, or a null lock type) held in a union (apr-util/include/apr_anylock.h):

/** Structure that may contain any APR lock type */
typedef struct apr_anylock_t {
    /** Indicates what type of lock is in lock */
    enum tm_lock {
        apr_anylock_none,           /**< None */
        apr_anylock_procmutex,      /**< Process-based */
        apr_anylock_threadmutex,    /**< Thread-based */
        apr_anylock_readlock,       /**< Read lock */
        apr_anylock_writelock       /**< Write lock */
    } type;
    /** Union of all possible APR locks */
    union apr_anylock_u_t {
        apr_proc_mutex_t *pm;       /**< Process mutex */
#if APR_HAS_THREADS
        apr_thread_mutex_t *tm;     /**< Thread mutex */
        apr_thread_rwlock_t *rw;    /**< Read-write lock */
#endif
    } lock;
} apr_anylock_t;

Looking at the lock object's innards in the debugger, the lock in use appears to be the null type, which makes sense because the LDAP cache code doesn't pass in a lock (httpd/modules/ldap/util_ldap_cache.c):

        /* This will create a rmm "handler" to get into the shared memory area */
        result = apr_rmm_init(&st->cache_rmm, NULL,
                              apr_shm_baseaddr_get(st->cache_shm), size,
                              st->pool);

and if no lock is passed in, apr_rmm_init() falls back to a null lock:

APU_DECLARE(apr_status_t) apr_rmm_init(apr_rmm_t **rmm, apr_anylock_t *lock, 
                                       void *base, apr_size_t size,
                                       apr_pool_t *p)
{
    apr_status_t rv;
    rmm_block_t *blk;
    apr_anylock_t nulllock;
    
    if (!lock) {
        nulllock.type = apr_anylock_none;
        nulllock.lock.pm = NULL;
        lock = &nulllock;
    }

I would think the apr_rmm_init() call in the LDAP cache should pass in a lock, or avoid using the apr_rmm memory system.
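
As a rough illustration of why the missing lock matters, the sketch below models the tagged-union lock with plain pthreads. It is not APR code: every demo_ name is hypothetical, and it assumes that locking a "none" lock simply reports success without doing anything, which matches what I see in the debugger.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a tagged-union lock, loosely shaped like
 * apr_anylock_t but reduced to two cases. */
typedef enum { DEMO_LOCK_NONE, DEMO_LOCK_THREADMUTEX } demo_lock_type;

typedef struct demo_anylock {
    demo_lock_type type;
    union {
        pthread_mutex_t *tm;
    } lock;
} demo_anylock;

/* Locking a "none" lock is a no-op that reports success. */
static int demo_anylock_lock(demo_anylock *l)
{
    assert(l != NULL);
    if (l->type == DEMO_LOCK_THREADMUTEX)
        return pthread_mutex_lock(l->lock.tm);
    return 0;
}

static int demo_anylock_unlock(demo_anylock *l)
{
    assert(l != NULL);
    if (l->type == DEMO_LOCK_THREADMUTEX)
        return pthread_mutex_unlock(l->lock.tm);
    return 0;
}

/* Mirrors the NULL handling quoted above: no caller lock means a null lock. */
static demo_anylock demo_resolve_lock(const demo_anylock *caller_lock)
{
    demo_anylock resolved;

    if (caller_lock)
        return *caller_lock;

    resolved.type = DEMO_LOCK_NONE;
    resolved.lock.tm = NULL;
    return resolved;
}

/* Returns 1 if a second attempt to take the lock is refused while it is
 * held, and 0 if the lock gives no mutual exclusion at all. */
static int demo_provides_exclusion(int pass_lock)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    demo_anylock caller;
    demo_anylock held;
    int excluded = 0;

    caller.type = DEMO_LOCK_THREADMUTEX;
    caller.lock.tm = &mutex;
    held = demo_resolve_lock(pass_lock ? &caller : NULL);

    demo_anylock_lock(&held);
    if (held.type == DEMO_LOCK_THREADMUTEX)
        excluded = (pthread_mutex_trylock(held.lock.tm) == EBUSY);
    demo_anylock_unlock(&held);

    pthread_mutex_destroy(&mutex);
    return excluded;
}
```

In this model, passing no lock leaves nothing to stop two callers from entering the allocator together, while passing a mutex does. Note that a thread mutex would only cover a single-process setup such as the Windows one I am running; a shared memory segment used by several processes would presumably need a cross-process lock instead.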

-Max

