
Bug#610530: marked as done (ocfs2-tools: BUG at fs/ocfs2/dlm/dlmmaster.c:2226! invalid opcode)



Your message dated Mon, 12 Aug 2013 17:06:52 +0200
with message-id <20130812150652.GA10215@inutil.org>
and subject line Re: ocfs2-tools: BUG at fs/ocfs2/dlm/dlmmaster.c:2226! invalid opcode
has caused the Debian Bug report #610530,
regarding ocfs2-tools: BUG at fs/ocfs2/dlm/dlmmaster.c:2226! invalid opcode
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
610530: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610530
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---

Package: ocfs2-tools
Version: 1.4.1-1
Severity: critical
Justification: causes high load, reboot on both nodes (2-node cluster)


Log details on node1:

--
Jan 19 09:49:33 kernel: [5665461.514795] (6894,14):dlm_drop_lockres_ref:2224 ERROR: while dropping ref on A35DE40B6A044A4A873B96E2F2DE42B2:M000000000000000112401200000000 (master=0) got -22.
Jan 19 09:49:33 kernel: [5665461.805602] lockres: M00000000000000011240120000000, owner=0, state=64
Jan 19 09:49:33 kernel: [5665461.932077]   last used: 5332038594, refcnt: 3, on purge list: yes
Jan 19 09:49:33 kernel: [5665462.148475]   on dirty list: no, on reco list: no, migrating pending: no
Jan 19 09:49:33 kernel: [5665462.274649]   inflight locks: 0, asts reserved: 0
Jan 19 09:49:33 kernel: [5665462.274649]   refmap nodes: [ ], inflight=0
Jan 19 09:49:33 kernel: [5665462.274649]   granted queue:
Jan 19 09:49:33 kernel: [5665462.274649]   converting queue:
Jan 19 09:49:33 kernel: [5665462.274649]   blocked queue:
Jan 19 09:49:33 kernel: [5665462.274649] ------------[ cut here ]------------
Jan 19 09:49:33 kernel: [5665462.274649] kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2226!
Jan 19 09:49:33 kernel: [5665462.274649] invalid opcode: 0000 [1] SMP
Jan 19 09:49:33 kernel: [5665462.274649] CPU 14
Jan 19 09:49:33 kernel: [5665462.274649] Modules linked in: nls_utf8 cifs nls_base ip_vs_rr xt_connlimit nfs ocfs2 ip_vs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipt_LOG xt_limit nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables dm_rdac qla2xxx bnx2 firmware_class usbhid uhci_hcd thermal sr_mod snd_pcm snd_timer snd_page_alloc snd soundcore shpchp sg sd_mod scsi_transport_fc scsi_tgt processor pcspkr pci_hotplug megaraid_sas loop ipv6 ide_pci_generic ide_core i2c_i801 i2c_core hid ff_memless fan thermal_sys ext3 jbd mbcache evdev ehci_hcd dm_round_robin dm_multipath dm_mod cdrom cdc_ether usbnet mii button ata_piix ata_generic libata scsi_mod dock
Jan 19 09:49:33 kernel: [5665462.274649] Pid: 6894, comm: dlm_thread Not tainted 2.6.26-2-amd64 #1
Jan 19 09:49:33 kernel: [5665462.274649] RIP: 0010:[<ffffffffa038c381>]  [<ffffffffa038c381>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1dd/0x1f0
Jan 19 09:49:33 kernel: [5665462.274649] RSP: 0018:ffff810875ceddd0  EFLAGS: 00010202
Jan 19 09:49:33 kernel: [5665462.274649] RAX: ffff8105364e8888 RBX: 0000000000000000 RCX: 00000000031a9f89
Jan 19 09:49:33 kernel: [5665462.274649] RDX: 0000000000000000 RSI: 0000000000000034 RDI: 0000000000000282
Jan 19 09:49:33 kernel: [5665462.274649] RBP: 000000000000001f R08: 0000000000000000 R09: ffff810875ced900
Jan 19 09:49:33 kernel: [5665462.274649] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8105364e8840
Jan 19 09:49:33 kernel: [5665462.274649] R13: ffff81086ddd7800 R14: ffff81070616bb80 R15: 00000000000000b5
Jan 19 09:49:33 kernel: [5665462.274649] FS:  0000000000000000(0000) GS:ffff81107cf981c0(0000) knlGS:0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jan 19 09:49:33 kernel: [5665462.274649] CR2: 0000000002694000 CR3: 0000000000201000 CR4: 00000000000006e0
Jan 19 09:49:33 kernel: [5665462.274649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 19 09:49:33 kernel: [5665462.274649] Process dlm_thread (pid: 6894, threadinfo ffff810875cec000, task ffff81086e521770)
Jan 19 09:49:33 kernel: [5665462.274649] Stack:  000000000000001f ffff81070616bb80 ffff810500000000 00000000ffffffea
Jan 19 09:49:33 kernel: [5665462.274649]  1f01000000000000 303030303030304d 3030303030303030 3032313034323131
Jan 19 09:49:33 kernel: [5665462.274649]  0030303030303030 0000000000000000 0000000000000000 0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] Call Trace:
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffffa0381860>] ? :ocfs2_dlm:dlm_thread+0x237/0x1107
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff802461a5>] ? autoremove_wake_function+0x0/0x2e
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffffa0381629>] ? :ocfs2_dlm:dlm_thread+0x0/0x1107
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8024607f>] ? kthread+0x47/0x74
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff802300ed>] ? schedule_tail+0x27/0x5c
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8020cf38>] ? child_rip+0xa/0x12
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8021a866>] ? lapic_next_event+0xf/0x13
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff80246038>] ? kthread+0x0/0x74
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8020cf2e>] ? child_rip+0x0/0x12
Jan 19 09:49:33 kernel: [5665462.274649]
Jan 19 09:49:33 kernel: [5665462.274649]
Jan 19 09:49:33 kernel: [5665462.274649] Code: 8b 14 25 24 00 00 00 48 c7 c1 e0 89 39 a0 89 d2 4c 89 74 24 08 89 44 24 10 31 c0 89 2c 24 e8 2c 90 ea df 4c 89 e7 e8 32 43 ff ff <0f> 0b eb fe 48 83 c4 70 89 d8 5b 5d 41 5c 41 5d 41 5e c3 41 54
Jan 19 09:49:33 kernel: [5665462.274649] RIP [<ffffffffa038c381>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1dd/0x1f0
Jan 19 09:49:33 kernel: [5665462.274649]  RSP <ffff810875ceddd0>
Jan 19 09:49:33 kernel: [5665462.422453] ---[ end trace ee1657d875d4e1f1 ]---
--
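
The raw words on the Stack: lines above can be read as little-endian ASCII. The following is a minimal sketch, assuming an x86-64 (little-endian) dump; the helper names decode_stack_words and as_signed32 are made up for illustration. It takes the four words that follow 1f01000000000000 and recovers the lock resource name, then reinterprets 00000000ffffffea as a signed 32-bit value.

```python
import struct

# Words copied from the "Stack:" lines of the oops above.
NAME_WORDS = [
    0x303030303030304D,
    0x3030303030303030,
    0x3032313034323131,
    0x0030303030303030,
]
STATUS_WORD = 0x00000000FFFFFFEA


def decode_stack_words(words):
    """Pack 64-bit words as little-endian bytes; return the ASCII text before the first NUL."""
    raw = b"".join(struct.pack("<Q", w) for w in words)
    return raw.split(b"\x00", 1)[0].decode("ascii")


def as_signed32(word):
    """Reinterpret the low 32 bits of a word as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", word & 0xFFFFFFFF))[0]


lock_name = decode_stack_words(NAME_WORDS)
status = as_signed32(STATUS_WORD)
print(lock_name, len(lock_name), status)
```

The decoded name is 31 characters long, which would be consistent with the 0x1f seen in the same dump, and the signed value agrees with the "got -22" in the first log line.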

--
Jan 19 09:54:05 kernel: [5665830.740248] o2net: connection to node XXX (num 0) at x.x.x.x:xxxx has been idle for 30.0 seconds, shutting it down.
Jan 19 09:54:05 kernel: [5665831.043225] (0,12):o2net_idle_timer:1468 here are some times that might help debug the situation: (tmr 1295427215.500604 now 1295427245.497577 dr 1295427215.497446 adv 1295427215.500636:1295427215.500637 func (8737b25e:500) 1295427215.500605:1295427215.500635)
Jan 19 09:54:05 kernel: [5665831.247064] o2net: no longer connected to node XXX (num 0) at x.x.x.x:xxxx
Jan 19 09:54:25 kernel: [5665831.427022] (22635,0):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6482,12):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (4102,7):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (21432,1):dlm_send_remote_unlock_request:359 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423471] (5686,4):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (7690,8):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (21552,15):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6810,14):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6910,9):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7049,11):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (8202,3):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427022] (7005,10):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7932,2):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.651713] (7770,13):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (6159,5):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6482,12):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (4102,7):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423471] (5686,4):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (7690,8):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (21552,15):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7295,1):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (24522,6):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6810,14):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427432] (6910,9):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7049,11):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (8202,3):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427022] (7005,10):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7932,2):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7770,13):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (6159,5):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (4172,12):dlm_do_master_request:1342 ERROR: link to 0 went down!
--
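
To see which DLM call sites reported the link failure, the error lines can be tallied by function. This is a minimal sketch, assuming the message prefix has the form (pid,cpu):function:line as it appears in this excerpt; the sample lines are copied from the log above, and the regular expression and the name tally_by_function are illustrative only.

```python
import re
from collections import Counter

# A few lines copied from the node1 log above.
SAMPLE = """\
Jan 19 09:54:25 kernel: [5665831.427022] (22635,0):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (21432,1):dlm_send_remote_unlock_request:359 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (6910,9):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (6482,12):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427432] (6910,9):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (4172,12):dlm_do_master_request:1342 ERROR: link to 0 went down!
"""

# Groups: pid, cpu, function, source line, message text.
PATTERN = re.compile(r"\((\d+),(\d+)\):(\w+):(\d+) ERROR: (.*)$")


def tally_by_function(text):
    """Count ERROR lines per reporting function."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts


counts = tally_by_function(SAMPLE)
for func, n in counts.most_common():
    print(f"{n:3d}  {func}")
```

Feeding the full syslog through the same function instead of SAMPLE would give the complete picture.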


Corresponding log entries on node2:

--
Jan 19 09:54:36 kernel: [5414618.046127] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046133] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046138] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046165] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046170] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046180] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046184] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046208] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046213] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046255] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046259] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046277] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046281] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046317] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046322] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046363] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046367] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046374] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046379] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046394] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046399] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046405] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046410] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.147644] o2net: accepted connection from node XXX (num 1) at x.x.x.x:xxxx
Jan 19 09:54:36 kernel: [5414620.867740] (31712,1):dlm_do_master_request:1342 ERROR: link to 1 went down!
Jan 19 09:54:36 kernel: [5414620.867740] (31712,1):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:36 kernel: [5414620.871982] (31124,9):dlm_do_master_request:1342 ERROR: link to 1 went down!
Jan 19 09:54:36 kernel: [5414620.871982] (31124,9):dlm_get_lock_resource:919 ERROR: status = -112
--
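
The negative status values in both logs are kernel errno codes. Below is a small lookup sketch, assuming the standard Linux numbering on x86; the table is hard-coded rather than taken from Python's errno module because the numbers differ on other platforms. The function name explain_status is hypothetical, and the descriptions are the usual strerror texts.

```python
import re

# Linux (x86) errno values that appear in the logs above.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}


def explain_status(message):
    """Return (name, description) for a 'status = -N' or 'got -N' message, else None."""
    match = re.search(r"(?:status = |got )-(\d+)", message)
    if match is None:
        return None
    return LINUX_ERRNO.get(int(match.group(1)))


for msg in (
    "dlm_drop_lockres_ref:2224 ERROR: while dropping ref on ... (master=0) got -22.",
    "dlm_get_lock_resource:919 ERROR: status = -112",
    "dlm_send_remote_unlock_request:359 ERROR: status = -107",
):
    print(msg, "->", explain_status(msg))
```

Read this way, -112 and -107 are connection-level failures logged around the o2net disconnect, while the -22 on node1 is the reply that immediately preceded the BUG.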

Technical investigation showed that this was not caused by a network problem.


-- System Information:
Debian Release: 5.0.7
 APT prefers stable
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-26lenny1
CPU: Intel(R) Xeon(R) CPU E5620

Versions of packages ocfs2-tools depends on:
ii libc6 2.7-18lenny6
ii libcomerr2 1.41.3-1
ii libglib2.0-0 2.16.6-3
ii libncurses5 5.7+20081213-1
ii libreadline5 5.2-3.1
ii libuuid1 1.41.3-1

Versions of packages ocfs2-tools suggests:
ii ocfs2console 1.4.1-1

/etc/default/o2cb values:
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
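
The o2net_idle_timer line in the node1 log can be checked against O2CB_IDLE_TIMEOUT_MS. This is a minimal sketch, assuming the tmr and now fields are Unix timestamps in seconds (an interpretation of the log line, not something stated in this report).

```python
from datetime import datetime, timezone
from decimal import Decimal

O2CB_IDLE_TIMEOUT_MS = 30000

# "tmr" and "now" fields copied from the o2net_idle_timer log line.
TMR = Decimal("1295427215.500604")
NOW = Decimal("1295427245.497577")

# Decimal keeps the microsecond digits exact.
idle_ms = (NOW - TMR) * 1000
now_utc = datetime.fromtimestamp(int(NOW), tz=timezone.utc)

print(f"idle for {idle_ms} ms, configured timeout {O2CB_IDLE_TIMEOUT_MS} ms")
print("timer fired at", now_utc.isoformat())
```

The gap comes out within a few milliseconds of the configured 30 seconds, which agrees with the "has been idle for 30.0 seconds" message.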


Regards,
Szabolcs JANOSI






--- End Message ---
--- Begin Message ---
On Thu, Feb 09, 2012 at 04:52:34PM -0600, Jonathan Nieder wrote:
> reassign 610530 linux-2.6 linux-2.6/2.6.26-26lenny1
> quit
> 
> Hi Szabolcs,
> 
> Szabolcs JANOSI wrote:
> 
> > Justification: causes high load, reboot on both nodes (2-node cluster)
> >
> > Log details on node1:
> >
> > (6894,14):dlm_drop_lockres_ref:2224 ERROR: while dropping ref on A35DE40B6A044A4A873B96E2F2DE42B2:M000000000000000112401200000000 (master=0) got -22.
> > lockres: M00000000000000011240120000000, owner=0, state=64
> >   last used: 5332038594, refcnt: 3, on purge list: yes
> >   on dirty list: no, on reco list: no, migrating pending: no
> >   inflight locks: 0, asts reserved: 0
> >   refmap nodes: [ ], inflight=0
> >   granted queue:
> >   converting queue:
> >   blocked queue:
> > ------------[ cut here ]------------
> > kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2226!
> [...]
> > Code: 8b 14 25 24 00 00 00 48 c7 c1 e0 89 39 a0 89 d2 4c 89 74 24 08 89 44 24 10 31 c0 89 2c 24 e8 2c 90 ea df 4c 89 e7 e8 32 43 ff ff <0f> 0b eb fe 48 83 c4 70 89 d8 5b 5d 41 5c 41 5d 41 5e c3 41 54
> > RIP  [<ffffffffa038c381>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1dd/0x1f0
> [...]
> > Technical investigation showed that this was not caused by a network problem.
> 
> I guess this was reproducible.  Was it a regression?  (I.e., do you
> know of any previous kernel that worked ok?)
> 
> | $ git show debian/lenny:fs/ocfs2/dlm/dlmmaster.c | sed -n 2220,2226' 'p
> |	else if (r < 0) {
> |		/* BAD.  other node says I did not have a ref. */
> |		mlog(ML_ERROR,"while dropping ref on %s:%.*s "
> |		    "(master=%u) got %d.\n", dlm->name, namelen,
> |		    lockname, res->owner, r);
> |		dlm_print_one_lock_resource(res);
> |		BUG();
> 
> What kernel do you use these days?  Can you still reproduce this?
> 
> If you can reproduce this with a current squeeze or sid kernel, the next
> step will be to get in touch with upstream.  Sorry we missed this before.

No further feedback, closing the bug.

If the bug can be reproduced with a current kernel (e.g. Wheezy), please
reopen.

Cheers,
        Moritz

--- End Message ---
