
Bug#698225: linux-image-2.6.32-5-686-bigmem: split-brain when running "drbdadm primary $DEV" with dual primary setup in Sec/Sec state



> > [ 5067.155466] block drbd0: Starting worker thread (from cqueue [2337])
> > [ 5067.155541] block drbd0: disk( Diskless -> Attaching ) 
> > [ 5067.207081] block drbd0: conn( Unconnected -> WFConnection ) 
> 
> Device enabled again and trying to connect.
> 
> > [ 5067.208501] block drbd0: role( Secondary -> Primary ) 
> > [ 5067.212759] block drbd0: Creating new current UUID
> 
> Set to primary.
> 
> > [ 5067.503518] block drbd0: Handshake successful: Agreed network protocol version 91
> > [ 5067.503525] block drbd0: conn( WFConnection -> WFReportParams ) 
> 
> Connection established _after_ it was promoted to primary.

$CURSES, I have been looking at this problem for hours, but this slipped my
attention.

> > [ 5067.503888] block drbd0: drbd_sync_handshake:
> > [ 5067.503894] block drbd0: self D88E7AD12FFEA493:49D971C9C18FC2FE:167E069D45704F1A:F1C0D4200B9792F4 bits:0 flags:0
> > [ 5067.503899] block drbd0: peer DD932456670DF62F:49D971C9C18FC2FE:167E069D45704F1A:F1C0D4200B9792F4 bits:0 flags:0
> 
> The remote device was also promoted to primary before the connection was
> established.
> 
> You have to wait until both machines are connected before promoting them
> to primary. The init script does this.

TL;DR: Please close this bug; the real problem lies in the drbd resource
agent. Sorry for the noise.

The original problem I was hunting was pacemaker always creating a drbd split
brain when stopping and starting the dual primary resource. It turns out the
resource agent does not wait for the connection to be established before
promoting both nodes to primary, just as this test setup did.
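
For illustration, the missing "wait for the connection, then promote" step
could look roughly like the sketch below. This is not the resource agent's
or the init script's actual code. It assumes that "drbdadm cstate <resource>"
prints the connection state and treats only "Connected" as safe, which is
deliberately conservative. The helper names, the resource name "r0" and the
timeout values are made up for the example.

```python
import subprocess
import time


def drbd_cstate(resource):
    """Return the connection state printed by `drbdadm cstate <resource>`."""
    result = subprocess.run(
        ["drbdadm", "cstate", resource],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_connected(resource, get_cstate=drbd_cstate, timeout=30.0,
                         interval=1.0, sleep=time.sleep,
                         clock=time.monotonic):
    """Poll the connection state until it is "Connected" or time runs out.

    Assumption: only "Connected" counts. States such as WFConnection or
    WFReportParams (seen in the log above) mean the handshake is not done.
    """
    deadline = clock() + timeout
    while True:
        if get_cstate(resource) == "Connected":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def promote_when_connected(resource, promote=None, **wait_kwargs):
    """Promote to primary only after the peers are connected.

    Returns False without promoting if the connection does not come up in
    time, so the caller can decide what to do instead of risking a
    split brain.
    """
    if not wait_until_connected(resource, **wait_kwargs):
        return False
    if promote is None:
        # Hypothetical default: run the same command as in the bug title.
        subprocess.run(["drbdadm", "primary", resource], check=True)
    else:
        promote(resource)
    return True
```

Calling promote_when_connected("r0") on both nodes instead of a bare
"drbdadm primary r0" would avoid the sequence shown in the log, where each
node creates a new current UUID before the handshake has happened.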

I'll open a new bug against drbd8-utils.

Regards,

	Stefan
-- 
The surest protection against temptation is cowardice.
		-- Mark Twain

