
Bug#788295: marked as done (wheezy->jessie: open-iscsi+multipath+LVM broken worse than before, plus problematic interactions with official VMware Tools)



Your message dated Tue, 9 Mar 2021 21:55:37 +0100
with message-id <43b04122-a47b-31d6-d220-2540e920dc6b@debian.org>
and subject line upgrade report for EOL Debian release
has caused the Debian Bug report #788295,
regarding wheezy->jessie: open-iscsi+multipath+LVM broken worse than before, plus problematic interactions with official VMware Tools
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
788295: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788295
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: upgrade-reports
Severity: normal

I recently upgraded a VM from wheezy + a kernel from wheezy-backports to
jessie. This VM mounts data volumes using LVM and multipath iSCSI, and this was
severely broken after the upgrade.

To start with, I want to be clear that broken lvm2 + open-iscsi +
multipath-tools is not a new thing (see e.g. bugs #547187, #455979 and
#605470), but it's broken differently in jessie and my old workarounds don't
work, so I thought it worth sharing what I've found.

PRE-UPGRADE CONFIGURATION:

The system uses open-iscsi to connect to four portals on an iSCSI target. The
storage appliance supports active/active operation. If I set LVMGROUPS in
/etc/default/open-iscsi, LVM activates on one of the individual block devices
instead of using the multipath device, so I left LVMGROUPS blank and instead
ran:

  vgchange -a y
  mount -a

after each boot.

I have this:

      # from multipath-tools FAQ
      types = [ "device-mapper", 16 ]

in /etc/lvm/lvm.conf so that LVM will look at the multipath devices.
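For anyone checking their own system: the 'types' setting belongs in the
devices section of lvm.conf. A minimal sketch for confirming that an
uncommented entry is present is below. The function name has_dm_types is
hypothetical, and the check only looks at the file text, not at what LVM
actually loaded.

```shell
# Hypothetical helper (not part of lvm2): succeed if the given lvm.conf-style
# file has an uncommented "types" line that mentions "device-mapper".
# Defaults to /etc/lvm/lvm.conf when no file is given.
has_dm_types() {
    conf="${1:-/etc/lvm/lvm.conf}"
    grep -Eq '^[[:space:]]*types[[:space:]]*=.*"device-mapper"' "$conf"
}

# Example use:
#   has_dm_types && echo "types entry present" || echo "types entry missing"
```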

Filesystems on LVM on iSCSI have the _netdev option in /etc/fstab.

NARRATIVE:

When I rebooted after the upgrade, I found that open-iscsi was not starting. On
investigation, I found that systemd was breaking a dependency loop (the details
of which I don't have, as apparently it wasn't logged anywhere?) involving
open-iscsi, several other packages and VMware Tools by not starting open-iscsi.
Once I figured out what was going on, this was easy to solve by removing VMware
Tools and replacing it with open-vm-tools.

Before I figured out why the iSCSI initiator wasn't starting, I tried starting
it myself and activating LVM. The system hung on shutdown and printed this on
the console:

[  457.367028]  connection4:0 ping timeout of 5 secs expired, recv timeout 5, last rx 4295004020, last ping 4295005272, now 429006524
[  457.383035]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295004024, last ping 4295005276, now 4295006528
[  547.383178]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295004024, last ping 4295005276, now 4295006528
[  460.217125]  connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295004730, last ping 4295005982, now 4295007236
[  472.502769] systemd-shutdown[1] Failed to finalize  DM devices, ignoring
[  600.692475] INFO: task lvm:2445 blocked for more than 120 seconds
[  600.692591]      Not tainted 3.16.0-4-amd64 #1
[  600.692670] "echo > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.780789] INFO: task lvm:2445 blocked for more than 120 seconds
[  720.780876]      Not tainted 3.16.0-4-amd64 #1
[  720.780930] "echo > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

(Pardon any typos: this was hand-transcribed from a screenshot.)

After solving this problem, I found that without it being listed in LVMGROUPS,
my volume group was not activated, as before. With LVMGROUPS set correctly, it
was activated and filesystems were mounted (albeit very slowly, possibly
connected with bug #775778), but (as before) LVM was not using the multipath
device.

I found that unlike the situation with wheezy, I could not get LVM to use
multipath at all:

- There's no difference in behavior regardless of whether I have 'types' set in
  /etc/lvm/lvm.conf per the multipath FAQ, possibly because:

- No matter what I have in LVMGROUPS, 'multipath -l' does not show my devices
  after reboot, possibly because:

- On some trials, multipath-tools starts after open-iscsi.

- If I list my VG in LVMGROUPS, it's mounted and activated, but LVM is not
  using multipath.

- I can get multipath to see the devices if I stop & restart all the services,
  but LVM insists on using /dev/sd?. If I filter "r/sd.*/", LVM doesn't see any
  PVs.

- NEWS.Debian mentions that multipath-tools does not provide a systemd unit,
  but does not mention any need to change configuration.
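One way to see the "LVM insists on using /dev/sd?" symptom at a glance is to
list the PV paths and flag the ones that are raw SCSI disks rather than
device-mapper nodes. This is only a sketch: raw_scsi_pvs is a hypothetical
helper name, and it assumes the single-path devices show up as /dev/sdX as
they do here.

```shell
# Hypothetical helper: read one PV path per line on stdin and print only the
# ones that look like raw SCSI disks (/dev/sdX or /dev/sdXN), i.e. PVs that
# LVM picked up directly instead of through a multipath map.
raw_scsi_pvs() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -E '^/dev/sd[a-z]+[0-9]*$'
}

# Example use on a live system (needs root and lvm2):
#   pvs --noheadings -o pv_name | raw_scsi_pvs
```

Any output means at least one PV is being accessed over a single path. No
output (and a non-zero exit status from grep) means no PV matched /dev/sd*.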

At present the system is not able to use multipath, but I'm keeping it on
jessie instead of reverting because I need certain other bugfixes & updates
that jessie delivers.


MINUTIAE:

It's worth noting that even with VMware Tools removed, systemd is breaking a
dependency loop by not starting clvm. I didn't look into this further because
I'm not (knowingly) using clvm.

May 07 11:51:44 backup1 systemd[1]: Found ordering cycle on basic.target/start
May 07 11:51:44 backup1 systemd[1]: Found dependency on sysinit.target/start
May 07 11:51:44 backup1 systemd[1]: Found dependency on clvm.service/start
May 07 11:51:44 backup1 systemd[1]: Found dependency on corosync.service/start
May 07 11:51:44 backup1 systemd[1]: Found dependency on basic.target/start
May 07 11:51:44 backup1 systemd[1]: Breaking ordering cycle by deleting job clvm.service/start
May 07 11:51:44 backup1 systemd[1]: Job clvm.service/start deleted to break ordering cycle starting with basic.target/start
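For reference, the units involved in a cycle like the one above can be pulled
out of the journal with a small filter. This is a rough sketch that assumes
the message wording shown in the log lines above; cycle_units is a
hypothetical helper name, not a systemd tool.

```shell
# Hypothetical helper: read journal or syslog text on stdin and print each
# unit that systemd named in "Found ordering cycle on ..." and
# "Found dependency on ..." lines, in order of first appearance, without
# duplicates.
cycle_units() {
    sed -n -E 's#.*Found (ordering cycle|dependency) on ([^/ ]+)/.*#\2#p' |
        awk '!seen[$0]++'
}

# Example use on a systemd machine:
#   journalctl -b | cycle_units
```

Run against the lines above, this lists basic.target, sysinit.target,
clvm.service and corosync.service, which is the loop systemd broke by dropping
clvm.service.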




-- System Information:
Debian Release: 8.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)

--- End Message ---
--- Begin Message ---
Dear reporter,

Thanks for taking the time long ago to submit your upgrade report. I'm
closing these reports now because the Debian releases they were reported
against have reached their end-of-life (some long ago).

Unfortunately it's possible that the report I'm now closing may still
have relevant information for the current release (bullseye). If you
believe that's the case, don't hesitate to reopen the bug, retitle it
and provide further information and it will be seen during the current
freeze period of Debian.

Paul



Attachment: OpenPGP_signature
Description: OpenPGP digital signature


--- End Message ---
