
Squeeze DRBD Xen: Kernel Trap/Crash in DRBD



I am using a 2 node Xen+DRBD configuration
and always get Dom0 kernel traps in the DRBD module
some time after starting PV DomU using some DRBD resources.

System type: HP DL360 G6
18 GB of memory installed, 3 GB reserved for Dom0 (dom0_mem=3072M)

ii drbd8-utils 2:8.3.7-2.1
ii linux-image-2.6.32-5-xen-amd64 2.6.32-30
ii xen-hypervisor-4.0-amd64 4.0.1-2
ii xen-linux-system-2.6.32-5-xen-amd64 2.6.32-30
ii xen-qemu-dm-4.0 4.0.1-2
ii xen-tools 4.2-1
ii xen-utils-4.0 4.0.1-2

I have already disabled:
+ sendpage
(in /etc/rc.local: echo 1 > /sys/module/drbd/parameters/disable_sendpage)

+ various Ethernet offload options
ethtool -K eth0 rx off tx off gso off sg off
ethtool -K eth1 rx off tx off gso off sg off
ethtool -K eth2 rx off tx off gso off sg off
ethtool -K eth3 rx off tx off gso off sg off
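
For anyone who wants to reproduce these settings, they could be collected in one
script, for example one called from /etc/rc.local. This is only a minimal sketch:
the DRY_RUN and IFACES variables and the function names are made up for
illustration, and it assumes the four interfaces eth0 to eth3 listed above. By
default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: apply the DRBD sendpage and NIC offload workarounds in one place.
# DRY_RUN and IFACES are hypothetical names used only for this example.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them (needs root).
DRY_RUN="${DRY_RUN:-1}"
IFACES="${IFACES:-eth0 eth1 eth2 eth3}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off rx/tx checksumming, GSO and scatter-gather on every interface.
disable_offloads() {
    for ifc in $IFACES; do
        run ethtool -K "$ifc" rx off tx off gso off sg off
    done
}

# Tell the drbd module not to use sendpage.
disable_sendpage() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo 1 > /sys/module/drbd/parameters/disable_sendpage"
    else
        echo 1 > /sys/module/drbd/parameters/disable_sendpage
    fi
}

disable_sendpage
disable_offloads
```

Running it with DRY_RUN=0 as root would apply the same settings as the individual
commands above; the current offload state of an interface can be checked with
ethtool -k eth0.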

Has anyone else experienced similar problems, or found a
solution or workaround?

An older configuration based on Linux 2.6.26-1-xen-amd64, Xen 3.2-1 and a
self-built DRBD 8.3.1 module
works fine on the same hardware.

Log excerpt:

Mar 9 13:59:07 xen20a kernel: [ 2344.417046] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Mar 9 14:02:03 xen20a kernel: [ 2520.266765] drbd2_receive D 0000000000000000 0 3336 2 0x00000000
Mar 9 14:02:03 xen20a kernel: [ 2520.266775] ffffffff814791f0 0000000000000246 0000000000000000 ffff8800b6e80e20
Mar 9 14:02:03 xen20a kernel: [ 2520.266785] 0000000000000000 ffff8800b6e80e20 000000000000f9e0 ffff8800b646ffd8
Mar 9 14:02:03 xen20a kernel: [ 2520.266795] 0000000000015780 0000000000015780 ffff8800b6e80e20 ffff8800b6e81118
Mar 9 14:02:03 xen20a kernel: [ 2520.266804] Call Trace:
Mar 9 14:02:03 xen20a kernel: [ 2520.266822] [<ffffffffa03ce7ec>] ? drbd_suspend_io+0x76/0x8c [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266832] [<ffffffff81065d4a>] ? autoremove_wake_function+0x0/0x2e
Mar 9 14:02:03 xen20a kernel: [ 2520.266842] [<ffffffffa03d09a7>] ? drbd_determin_dev_size+0x29/0x35d [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266852] [<ffffffffa03b896f>] ? drbd_recv+0x74/0x147 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266861] [<ffffffffa03b896f>] ? drbd_recv+0x74/0x147 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266869] [<ffffffff8100e63d>] ? xen_force_evtchn_callback+0x9/0xa
Mar 9 14:02:03 xen20a kernel: [ 2520.266875] [<ffffffff8100ecf2>] ? check_events+0x12/0x20
Mar 9 14:02:03 xen20a kernel: [ 2520.266885] [<ffffffffa03bc726>] ? receive_sizes+0x308/0x535 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266895] [<ffffffffa03bffec>] ? drbdd_init+0x19c/0x292 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266904] [<ffffffffa03cc701>] ? drbd_thread_setup+0x2b/0xf8 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266913] [<ffffffffa03cc6d6>] ? drbd_thread_setup+0x0/0xf8 [drbd]
Mar 9 14:02:03 xen20a kernel: [ 2520.266919] [<ffffffff81065a7d>] ? kthread+0x79/0x81
Mar 9 14:02:03 xen20a kernel: [ 2520.266926] [<ffffffff81012baa>] ? child_rip+0xa/0x20
Mar 9 14:02:03 xen20a kernel: [ 2520.266932] [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
Mar 9 14:02:03 xen20a kernel: [ 2520.266938] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Mar 9 14:02:03 xen20a kernel: [ 2520.266945] [<ffffffff8102ddc0>] ? pvclock_clocksource_read+0x3a/0x8b
Mar 9 14:02:03 xen20a kernel: [ 2520.266951] [<ffffffff81012ba0>] ? child_rip+0x0/0x20

Thanks in advance for any hints,
Bruno


