
Bug#593679: OCFS2: 2 node cluster, kernel BUG at fs/ocfs2/dlm/dlmthread.c:169!



Package: linux-image-2.6.32-5-amd64
Version: 2.6.32-18
Severity: important
Tags: patch

I'm running a two-node cluster that mounts an OCFS2 filesystem on both
nodes. The disk containing the filesystem is an iSCSI volume hosted
on a SAN device. During simultaneous use of the filesystem by both
nodes I reproducibly hit the following BUG():

kernel: [3401206.397280] lockres: O00000000000000000e69950000000, owner=1, state=0
kernel: [3401206.397280]   last used: 5139177996, refcnt: 5, on purge list: yes
kernel: [3401206.397280]   on dirty list: no, on reco list: no, migrating pending: no
kernel: [3401206.397280]   inflight locks: 0, asts reserved: 1
kernel: [3401206.397280]   refmap nodes: [ ], inflight=0
kernel: [3401206.397280]   granted queue:
kernel: [3401206.397280]     type=3, conv=-1, node=1, cookie=1:275866887, ref=3, ast=(empty=n,pend=y), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
kernel: [3401206.397280]   converting queue:
kernel: [3401206.397280]   blocked queue:
kernel: [3401206.397280] ------------[ cut here ]------------
kernel: [3401206.397280] kernel BUG at fs/ocfs2/dlm/dlmthread.c:169!
kernel: [3401206.397280] invalid opcode: 0000 [1] SMP
kernel: [3401206.397280] CPU 1
kernel: [3401206.397280] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi scsi_mod evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod thermal_sys
kernel: [3401206.397280] Pid: 26270, comm: dlm_thread Not tainted 2.6.26-2-xen-amd64 #1
kernel: [3401206.397280] RIP: e030:[<ffffffffa0146bd3>]  [<ffffffffa0146bd3>] :ocfs2_dlm:dlm_run_purge_list+0x148/0x578
kernel: [3401206.397280] RSP: e02b:ffff88000dfafe00  EFLAGS: 00010246
kernel: [3401206.397280] RAX: ffff880007c18700 RBX: ffff880007c18768 RCX: 0000c6c600005794
kernel: [3401206.397280] RDX: 000000000000eeee RSI: 0000000000000001 RDI: ffffffff8059dab0
kernel: [3401206.397280] RBP: ffff880007c186c0 R08: 0000000000000000 R09: 0000000000000001
kernel: [3401206.397280] R10: 0000000000000023 R11: 0000010000000022 R12: 000000013251a20c
kernel: [3401206.397280] R13: ffff88002492f400 R14: ffff88002492f428 R15: 0000000000000001
kernel: [3401206.397280] FS:  00007ffff7ee4750(0000) GS:ffffffff8052d080(0000) knlGS:0000000000000000
kernel: [3401206.397280] CS:  e033 DS: 0000 ES: 0000
kernel: [3401206.397280] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: [3401206.397280] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: [3401206.397280] Process dlm_thread (pid: 26270, threadinfo ffff88000dfae000, task ffff8800010a3900)
kernel: [3401206.397280] Stack:  000000013251abd0 ffffffff802358d5 ffff8800010a3900 000000002492f400
kernel: [3401206.397280]  ffff88002492f46c 00000000000000e8 ffff88002f171080 ffff88002492f400
kernel: [3401206.397280]  ffff88002f171128 ffff88000520a338 0000000000000001 ffffffffa014730f
kernel: [3401206.397280] Call Trace:
kernel: [3401206.397280]  [<ffffffff802358d5>] ? process_timeout+0x0/0x5
kernel: [3401206.397280]  [<ffffffffa014730f>] ? :ocfs2_dlm:dlm_thread+0x95/0xe82
kernel: [3401206.397280]  [<ffffffff80224d35>] ? try_to_wake_up+0x118/0x129
kernel: [3401206.397289]  [<ffffffff8023f671>] ? autoremove_wake_function+0x0/0x2e
kernel: [3401206.397289]  [<ffffffffa014727a>] ? :ocfs2_dlm:dlm_thread+0x0/0xe82
kernel: [3401206.397289]  [<ffffffff8023f543>] ? kthread+0x47/0x74
kernel: [3401206.397289]  [<ffffffff802283a8>] ? schedule_tail+0x27/0x5c
kernel: [3401206.397289]  [<ffffffff8020be28>] ? child_rip+0xa/0x12
kernel: [3401206.397289]  [<ffffffff8023f4fc>] ? kthread+0x0/0x74
kernel: [3401206.397289]  [<ffffffff8020be1e>] ? child_rip+0x0/0x12
kernel: [3401206.397289]
kernel: [3401206.397289]
kernel: [3401206.397289] Code: d2 89 04 24 31 c0 e8 c3 6a 0e e0 48 89 ef e8 bd ed ff ff fe 03 0f b7 13 38 f2 0f 95 c0 84 c0 74 0a 89 d6 48 89 df e8 2d 30 23 e0 <0f> 0b eb fe 66 8b 95 ca 00 00 00 f6 c2 20 0f 84 b1 00 00 00 48
kernel: [3401206.397289] RIP  [<ffffffffa0146bd3>] :ocfs2_dlm:dlm_run_purge_list+0x148/0x578
kernel: [3401206.397289]  RSP <ffff88000dfafe00>
kernel: [3401206.397294] ---[ end trace 285cd07f988b3d3e ]---

The code that causes the crash is:

fs/ocfs2/dlm/dlmthread.c:
static int dlm_purge_lockres(struct dlm_ctxt *dlm,
                            struct dlm_lock_resource *res)
{
       int master;
       int ret = 0;

       spin_lock(&res->spinlock);
       if (!__dlm_lockres_unused(res)) {
               mlog(0, "%s:%.*s: tried to purge but not unused\n",
                    dlm->name, res->lockname.len, res->lockname.name);
               __dlm_print_one_lock_resource(res);
               spin_unlock(&res->spinlock);
--->           BUG();
       }
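
The other half of the race is on the caller's side. Before the patch,
dlm_run_purge_list() checked the lockres under its spinlock, dropped
the lock again, and only then called dlm_purge_lockres(), which
re-checks and hits the BUG() above if the lockres was reused in the
meantime (excerpted from the pre-patch code shown in the attached
diff; markers and elision added):

fs/ocfs2/dlm/dlmthread.c, dlm_run_purge_list():

       spin_lock(&lockres->spinlock);
       unused = __dlm_lockres_unused(lockres);
       spin_unlock(&lockres->spinlock);
--->   /* window: the lockres can get reused here */
       if (!unused)
               continue;
       ...
       if (dlm_purge_lockres(dlm, lockres))  /* re-checks, BUG()s */
               BUG();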

Searching for 'tried to purge but not unused' I found this patch (also
attached):

http://www.mail-archive.com/ocfs2-devel@oss.oracle.com/msg06018.html

It removes this BUG() check and fixes the race that causes the crash.
After applying the patch to both systems I could no longer reproduce
the problem. The fix is also headed for mainline, probably in 2.6.36.
Please include it in the Debian 'squeeze' kernel.

Thanks,
Ronald.

This patch fixes two problems in dlm_run_purge_list:

1. If a lockres is found to be in use, dlm_run_purge_list keeps trying
to purge the same lockres instead of trying the next lockres.

2. When a lockres is found unused, dlm_run_purge_list releases the
lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calling
dlm_purge_lockres. The spinlock is reacquired inside dlm_purge_lockres,
but in this window the lockres can get reused, which leads to the BUG().

This patch modifies dlm_run_purge_list to skip a lockres if it is in
use and move on to the next one. It also sets DLM_LOCK_RES_DROPPING_REF
before releasing the lockres spinlock, protecting the lockres from
getting reused; a sketch of this ordering follows below.
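
As an illustration only (not kernel code; struct res, try_use and
purge are made-up names), here is a minimal userspace sketch of the
pattern the patch enforces, with a pthread mutex standing in for the
spinlock: the "dropping" flag is set before the lock is released, so a
concurrent user that checks the flag under the same lock can no longer
slip into the window between the unused check and the purge.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a dlm_lock_resource. */
struct res {
       pthread_mutex_t lock;
       int refcnt;
       bool dropping;  /* plays the role of DLM_LOCK_RES_DROPPING_REF */
};

/* Users may only take a new reference while no purge is in flight. */
static bool try_use(struct res *r)
{
       bool ok;

       pthread_mutex_lock(&r->lock);
       ok = !r->dropping;
       if (ok)
               r->refcnt++;
       pthread_mutex_unlock(&r->lock);
       return ok;
}

static void put_ref(struct res *r)
{
       pthread_mutex_lock(&r->lock);
       r->refcnt--;
       pthread_mutex_unlock(&r->lock);
}

/* Purge path: the unused check and the flag update happen in one
 * critical section -- the ordering the patch enforces. */
static bool purge(struct res *r)
{
       pthread_mutex_lock(&r->lock);
       if (r->refcnt != 0) {
               /* in use: skip it, like the patched dlm_run_purge_list */
               pthread_mutex_unlock(&r->lock);
               return false;
       }
       r->dropping = true;  /* set BEFORE dropping the lock */
       pthread_mutex_unlock(&r->lock);

       /* ... lockless work (the network deref) goes here; try_use()
        * now fails, so the window the old code left open is closed ... */
       return true;
}

int main(void)
{
       struct res r = { PTHREAD_MUTEX_INITIALIZER, 0, false };

       if (try_use(&r)) {
               printf("purge while in use: %d\n", purge(&r));  /* 0 */
               put_ref(&r);
       }
       printf("purge while unused: %d\n", purge(&r));          /* 1 */
       return 0;
}

In the patch itself the same ordering shows up as setting
DLM_LOCK_RES_DROPPING_REF and only then calling
spin_unlock(&res->spinlock).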

Signed-off-by: Srinivas Eeda <srinivas.e...@oracle.com>
Acked-by: Sunil Mushran <sunil.mush...@oracle.com>

---
 fs/ocfs2/dlm/dlmthread.c |   80 +++++++++++++++++++--------------------------
 1 files changed, 34 insertions(+), 46 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 11a6d1f..960dc8d 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -152,45 +152,25 @@ void dlm_lockres_calc_usage(struct dlm_ctxt *dlm,
        spin_unlock(&dlm->spinlock);
 }
 
-static int dlm_purge_lockres(struct dlm_ctxt *dlm,
+static void dlm_purge_lockres(struct dlm_ctxt *dlm,
                             struct dlm_lock_resource *res)
 {
        int master;
        int ret = 0;
 
-       spin_lock(&res->spinlock);
-       if (!__dlm_lockres_unused(res)) {
-               mlog(0, "%s:%.*s: tried to purge but not unused\n",
-                    dlm->name, res->lockname.len, res->lockname.name);
-               __dlm_print_one_lock_resource(res);
-               spin_unlock(&res->spinlock);
-               BUG();
-       }
-
-       if (res->state & DLM_LOCK_RES_MIGRATING) {
-               mlog(0, "%s:%.*s: Delay dropref as this lockres is "
-                    "being remastered\n", dlm->name, res->lockname.len,
-                    res->lockname.name);
-               /* Re-add the lockres to the end of the purge list */
-               if (!list_empty(&res->purge)) {
-                       list_del_init(&res->purge);
-                       list_add_tail(&res->purge, &dlm->purge_list);
-               }
-               spin_unlock(&res->spinlock);
-               return 0;
-       }
+       assert_spin_locked(&dlm->spinlock);
+       assert_spin_locked(&res->spinlock);
 
        master = (res->owner == dlm->node_num);
 
-       if (!master)
-               res->state |= DLM_LOCK_RES_DROPPING_REF;
-       spin_unlock(&res->spinlock);
 
        mlog(0, "purging lockres %.*s, master = %d\n", res->lockname.len,
             res->lockname.name, master);
 
        if (!master) {
+               res->state |= DLM_LOCK_RES_DROPPING_REF;
                /* drop spinlock...  retake below */
+               spin_unlock(&res->spinlock);
                spin_unlock(&dlm->spinlock);
 
                spin_lock(&res->spinlock);
@@ -208,31 +188,35 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
                mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
                     dlm->name, res->lockname.len, res->lockname.name, ret);
                spin_lock(&dlm->spinlock);
+               spin_lock(&res->spinlock);
        }
 
-       spin_lock(&res->spinlock);
        if (!list_empty(&res->purge)) {
                mlog(0, "removing lockres %.*s:%p from purgelist, "
                     "master = %d\n", res->lockname.len, res->lockname.name,
                     res, master);
                list_del_init(&res->purge);
-               spin_unlock(&res->spinlock);
                dlm_lockres_put(res);
                dlm->purge_count--;
-       } else
-               spin_unlock(&res->spinlock);
+       }
+
+       if (!__dlm_lockres_unused(res)) {
+               mlog(ML_ERROR, "found lockres %s:%.*s: in use after deref\n",
+                    dlm->name, res->lockname.len, res->lockname.name);
+               __dlm_print_one_lock_resource(res);
+               BUG();
+       }
 
        __dlm_unhash_lockres(res);
 
        /* lockres is not in the hash now.  drop the flag and wake up
         * any processes waiting in dlm_get_lock_resource. */
        if (!master) {
-               spin_lock(&res->spinlock);
                res->state &= ~DLM_LOCK_RES_DROPPING_REF;
                spin_unlock(&res->spinlock);
                wake_up(&res->wq);
-       }
-       return 0;
+       } else
+               spin_unlock(&res->spinlock);
 }
 
 static void dlm_run_purge_list(struct dlm_ctxt *dlm,
@@ -251,17 +235,7 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
                lockres = list_entry(dlm->purge_list.next,
                                     struct dlm_lock_resource, purge);
 
-               /* Status of the lockres *might* change so double
-                * check. If the lockres is unused, holding the dlm
-                * spinlock will prevent people from getting and more
-                * refs on it -- there's no need to keep the lockres
-                * spinlock. */
                spin_lock(&lockres->spinlock);
-               unused = __dlm_lockres_unused(lockres);
-               spin_unlock(&lockres->spinlock);
-
-               if (!unused)
-                       continue;
 
                purge_jiffies = lockres->last_used +
                        msecs_to_jiffies(DLM_PURGE_INTERVAL_MS);
@@ -273,15 +247,29 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
                         * in tail order, we can stop at the first
                         * unpurgable resource -- anyone added after
                         * him will have a greater last_used value */
+                       spin_unlock(&lockres->spinlock);
                        break;
                }
 
+               /* Status of the lockres *might* change so double
+                * check. If the lockres is unused, holding the dlm
+                * spinlock will prevent people from getting any more
+                * refs on it. */
+               unused = __dlm_lockres_unused(lockres);
+               if (!unused ||
+                   (lockres->state & DLM_LOCK_RES_MIGRATING)) {
+                       mlog(0, "lockres %s:%.*s: is in use or "
+                            "being remastered, used %d, state %d\n",
+                            dlm->name, lockres->lockname.len,
+                            lockres->lockname.name, !unused, lockres->state);
+                       list_move_tail(&lockres->purge, &dlm->purge_list);
+                       spin_unlock(&lockres->spinlock);
+                       continue;
+               }
+
                dlm_lockres_get(lockres);
 
-               /* This may drop and reacquire the dlm spinlock if it
-                * has to do migration. */
-               if (dlm_purge_lockres(dlm, lockres))
-                       BUG();
+               dlm_purge_lockres(dlm, lockres);
 
                dlm_lockres_put(lockres);
 
-- 
1.5.6.5

