
Bug#788295: open-iscsi+multipath+lvm breakage actually caused by several other bugs; incorrect size reporting merely a strange side effect



The severe breakage I've been experiencing turns out to be a combination
of Debian bugs #782487/782488 and Ubuntu bug #1431650. I'm not sure why,
but the incorrect size reporting goes away if these other problems are
fixed.

Ubuntu bug #1431650 is a deadlock between udev and multipath-tools that
lasts until systemd steps in and kills them. The critical clue was this
set of timeouts when iSCSI starts:

Jul 13 15:08:31 xxxxxxx kernel: sd 13:0:0:3: [sdl] Attached SCSI disk
Jul 13 15:08:31 xxxxxxx kernel: sd 12:0:0:3: [sdk] Attached SCSI disk
Jul 13 15:08:31 xxxxxxx kernel: sd 15:0:0:3: [sdn] Attached SCSI disk
Jul 13 15:08:31 xxxxxxx kernel: sd 14:0:0:3: [sdm] Attached SCSI disk
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13761] /devices/platform/host12/session1/target12:0:0/12:0:0:3/block/sdk timeout; kill it
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: seq 3784 '/devices/platform/host12/session1/target12:0:0/12:0:0:3/block/sdk' killed
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13763] /devices/platform/host13/session2/target13:0:0/13:0:0:3/block/sdl timeout; kill it
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: seq 3783 '/devices/platform/host13/session2/target13:0:0/13:0:0:3/block/sdl' killed
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13764] /devices/platform/host15/session4/target15:0:0/15:0:0:3/block/sdn timeout; kill it
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: seq 3785 '/devices/platform/host15/session4/target15:0:0/15:0:0:3/block/sdn' killed
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13765] /devices/platform/host14/session3/target14:0:0/14:0:0:3/block/sdm timeout; kill it
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: seq 3786 '/devices/platform/host14/session3/target14:0:0/14:0:0:3/block/sdm' killed
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13761] terminated by signal 9 (Killed)
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13763] terminated by signal 9 (Killed)
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13764] terminated by signal 9 (Killed)
Jul 13 15:09:01 xxxxxxx systemd-udevd[217]: worker [13765] terminated by signal 9 (Killed)
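
For anyone checking their own logs for the same pattern, here is a rough
Python sketch that pairs each "Attached SCSI disk" line with the matching
udev "timeout; kill it" line and reports the gap in seconds. The function
names and regular expressions are my own and only cover the log format
shown above. Syslog timestamps carry no year, so the year argument is an
arbitrary assumption that does not affect the difference.

```python
import re
from datetime import datetime

# Patterns written against the syslog lines quoted above; other log
# formats will need different expressions.
ATTACHED = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ kernel: sd \S+ \[(\w+)\] Attached SCSI disk"
)
KILLED = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ systemd-udevd\[\d+\]: "
    r"worker \[\d+\] \S+/block/(\w+) timeout; kill it"
)


def parse_stamp(stamp, year=2015):
    """Parse a syslog timestamp; the year is assumed, not in the log."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_until_kill(lines):
    """Map each disk name to the seconds between attach and worker kill."""
    attached = {}
    delays = {}
    for line in lines:
        m = ATTACHED.match(line)
        if m:
            attached[m.group(2)] = parse_stamp(m.group(1))
            continue
        m = KILLED.match(line)
        if m and m.group(2) in attached:
            delta = parse_stamp(m.group(1)) - attached[m.group(2)]
            delays[m.group(2)] = delta.total_seconds()
    return delays
```

Fed the lines above, this reports a gap of 30 seconds for each of sdk,
sdl, sdm and sdn, i.e. all four workers sat blocked for the same length
of time before being killed.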

I'll file a bug report against multipath-tools asking for the patch to
be merged. With this problem solved, multipath devices could be
activated manually, often without size problems. They also started up
on boot, but there the size problem was still present and they were shut
down again within seconds, even for test LUNs with no partition table.

Jul 13 17:21:24 xxxxxxx kernel: [   16.079329] device-mapper: table: 254:3: dm-1 too small for target: start=2048, len=12638472159, dev_size=12638472159
Jul 13 17:21:24 xxxxxxx kernel: [   16.192112] device-mapper: table: 254:3: dm-1 too small for target: start=2048, len=12638472159, dev_size=12638472159
Jul 13 17:21:24 xxxxxxx kernel: [   16.284550] device-mapper: table: 254:3: dm-1 too small for target: start=2048, len=12638472159, dev_size=12638472159
Jul 13 17:21:24 xxxxxxx kernel: [   16.374621] device-mapper: table: 254:3: dm-1 too small for target: start=2048, len=12638472159, dev_size=12638472159
Jul 13 17:21:24 xxxxxxx kernel: [   16.469226] device-mapper: table: 254:3: dm-1 too small for target: start=2048, len=12638472159, dev_size=12638472159
[...]
Jul 13 17:21:25 xxxxxxx multipath-tools[1001]: Starting multipath daemon: multipathd.
Jul 13 17:21:25 xxxxxxx multipathd: 222e80001557aa18f devmap removed
Jul 13 17:21:25 xxxxxxx multipathd: 2221e000155449c2e devmap removed
Jul 13 17:21:25 xxxxxxx multipathd: 22268000155ea65e2 devmap removed
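
The numbers in the device-mapper message are worth spelling out. Below is
a small Python sketch that extracts them and does the arithmetic. It
assumes the kernel rejects the table when start + len runs past dev_size,
which is what the wording of the message suggests. The function name and
regular expression are mine, not part of any tool.

```python
import re

# Matches the "too small for target" message in the format logged above.
DM_TOO_SMALL = re.compile(
    r"device-mapper: table: (?P<dev>\d+:\d+): (?P<name>\S+) "
    r"too small for target: "
    r"start=(?P<start>\d+), len=(?P<len>\d+), dev_size=(?P<dev_size>\d+)"
)


def explain_too_small(line):
    """Return the requested end sector and the shortfall, or None."""
    m = DM_TOO_SMALL.search(line)
    if m is None:
        return None
    start = int(m.group("start"))
    length = int(m.group("len"))
    dev_size = int(m.group("dev_size"))
    end = start + length  # last sector the mapping asks for, exclusive
    return {
        "name": m.group("name"),
        "end": end,
        "dev_size": dev_size,
        "shortfall": end - dev_size,  # sectors requested past the device end
    }
```

For the lines above this gives end=12638474207 against
dev_size=12638472159, a shortfall of exactly 2048 sectors. In other
words, the requested length is the size of the whole device, so any
non-zero start offset pushes the mapping past the end of dm-1.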

Multipath now gave useful debug output, and it soon became clear that
bugs #782487/#782488 were to blame. Applying the suggested workaround of
editing blacklist_exceptions solved the problem. If it's not possible to
get a fix into a point release, the issue should probably be mentioned
in NEWS.Debian or the release notes, as it causes severe breakage on
upgrades.

With this set of fixes, setting LVMGROUPS in /etc/default/open-iscsi
works correctly (which it didn't on wheezy if multipath was in use).
Volume groups on multipath iSCSI come up correctly and filesystems on
them are mounted.

