
Systemd and Init (was: Problems communicating with and between servers after upgrade - correction)



On 08/09/16 23:50, Clive Menzies wrote:
On 08/09/16 23:07, Clive Menzies wrote:
Hi

We've suffered a series of seemingly disconnected problems on 4 machines since upgrading jessie on Monday:

apt/log:

Start-Date: 2016-09-05  12:17:10
Commandline: apt-get upgrade
Upgrade: libgcrypt20:i386 (1.6.3-2+deb8u1, 1.6.3-2+deb8u2), gnupg:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2), linux-libc-dev:i386 (3.16.7-ckt25-2+deb8u3, 3.16.36-1+deb8u1), linux-image-3.16.0-4-686-pae:i386 (3.16.7-ckt25-2+deb8u3, 3.16.36-1+deb8u1), gpgv:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2), libidn11:i386 (1.29-1+deb8u1, 1.29-1+deb8u2)
End-Date: 2016-09-05  12:19:22

First there was a DMA error on bootup on file and mail server_U, which we were alerted to when no email was being delivered from dovecot on the server. We fsck'd the disk offline and no errors were reported. Although the system would boot, dovecot wouldn't work. We've been through so many permutations and combinations since that I can't remember each step we took after that, some of which we repeated. Eventually, in spite of the fsck result, we replaced the disk and reinstalled. The samba installation worked out of the box but dovecot and rsync (for automated remote backups) didn't. It turned out to be a certificate problem, which required creating the certs while making sure dovecot knew where to look (not straightforward). I can't remember the specifics of the rsync issue; it may have been self-inflicted. Eventually, all was well and everything was working, including the remote backup.

Tuesday: we'd lost ssh connection to the two remote backup servers via the VPN; I've no idea of their state other than getting someone onsite to reboot them using the power button - they appear to be working.

We then found that laptop_T can access smb shares (using both windows and debian systems) on file and mail server_M but two other linux machines couldn't, nor could a remote windows laptop via VPN (that may be because the user's machine is "broken", but the timing is suspicious). Nor can that user get to dovecot with his Thunderbird email client, which may be related to the same upgrade.

We've compared /etc/fstab on laptop_T which mounts the shares with no problem to that on laptop_D which doesn't. Same user, same share and they are identical in respect of mounting the shares but one works, one doesn't.

As server_U is working after reinstallation, following much exploration, we reinstalled jessie 8.3 on server_M and committed to systemd to avoid potential progressive sysv-init problems we'd learned of during our investigation. After a reinstall of the system and subsequently samba (twice), we resorted to the maintainer's version of /etc/smb.conf and customised it for our setup. We tried to keep the configuration as vanilla as possible but there was no improvement in terms of access from the two debian machines.

In comparing the /etc/smb.conf with that on server_U, we noticed that the winserver IP address on U was uncommented and gave its LAN IP (it is acting as the winserver for the workgroup). We edited server_M /etc/fstab to include the winserver IP and debian workstation_E saw the shares in file manager but the share didn't show up in df -h. The shares appeared to be unmounted but were accessible through Thunar(FM). We commented it out and access broke, uncommented it and it worked again. On laptop_D the IP "fix" didn't have any effect - won't mount and can't be seen.
Sorry, brain disengaged. We edited server_M /etc/smb.conf to include the winserver IP, NOT /etc/fstab.

At this point we're stuck which gives pause for reflection. These 4 servers have been running stable debian for over 10 years and apart from the odd hardware issue have been rock solid. Two of the machines have been replaced more than once over the years but the other two are the original boxes. Most upgrades were pretty seamless and if there were problems, a short burst of intensive exploration, trial and error, quickly resolved them.

This nightmare of expanding problems has been going on for three days, since Monday afternoon. Never before have I questioned the decision to base our business (and our lives) on Debian and I remain a firm advocate. I also recognise that over successive releases, accommodating a plethora of configurations becomes harder and that at some point step changes in the foundations of the system are required. I'm presuming that the transition to systemd from sysv-init was an essential step and understand that backwards compatibility becomes more challenging as time goes on.

Whether this systemd transition is related to the remote connectivity with the servers and the samba issue, I don't know, but this series of seemingly random but mission-critical problems has shaken our confidence.

Apologies if this sounds like a complaint, it's not. It is a concern, which someone may be able to allay, that Debian is not as rock solid as it was.

You guys have done brilliant work and I'm aware that my contribution to the project has been very small and pretty non-existent for the last few years - other priorities. So thank you.


We've managed to sort out the urgent mission critical problems, only some of which seem related to the upgrade. Coincidental problems arose which really obscured what was going on. So far we've our two main servers up and one of the backup servers at the remote location. The other one we've recovered because we'll reinstall it - it is the same hardware configuration as server_M.

For reasons I can't specify, IDE seems to be an issue and so we put a SATA disk in server_M, reinstalled and it seems to be working OK.

However, studying documentation on systemd v init, I'm a bit confused. I assumed the reinstall would implement systemd for all services and init wouldn't be visible although symlinks will use init where necessary. systemd-sysv is installed as per the Debian wiki.

Looking at other setups on the net, a ps aux should assign the first process (1) to systemd but on server_M it is /sbin/init
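
One way to check this, as a minimal sketch only: on Linux, /proc/1/comm holds the name of the program actually running as PID 1, and /sbin/init may simply be a symlink, so ps can print /sbin/init even when the binary behind it is systemd. The function names below (describe_pid1, pid1_is_systemd) are made up for illustration, and the default paths assume a standard Linux /proc layout.

```python
import os


def describe_pid1(comm_path="/proc/1/comm", init_path="/sbin/init"):
    """Return (comm, resolved_init).

    comm is the process name the kernel reports for PID 1, or None if the
    file cannot be read. resolved_init is the real file that init_path
    points to after following symlinks, or None if init_path does not exist.
    """
    try:
        with open(comm_path) as fh:
            comm = fh.read().strip()
    except OSError:
        comm = None

    if os.path.lexists(init_path):
        resolved = os.path.realpath(init_path)
    else:
        resolved = None
    return comm, resolved


def pid1_is_systemd(comm, resolved):
    """Guess whether PID 1 is systemd.

    Prefer the kernel-reported name; fall back to the symlink target only
    when the name is unavailable (an assumption, not a guarantee).
    """
    if comm is not None:
        return comm == "systemd"
    return resolved is not None and os.path.basename(resolved) == "systemd"


if __name__ == "__main__":
    comm, resolved = describe_pid1()
    print("PID 1 name:        ", comm)
    print("/sbin/init resolves to:", resolved)
    print("Looks like systemd:", pid1_is_systemd(comm, resolved))
```

Run on the machine in question, this prints the PID 1 name and where /sbin/init really points, which is more telling than the command column of ps aux alone.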

Is this right?

Thanks

Clive

--
Clive Menzies
http://freecriticalthinking.org

