
Problems communicating with and between servers after upgrade



Hi

We've suffered a series of seemingly disconnected problems on 4 machines since upgrading jessie on Monday:

apt/log:

Start-Date: 2016-09-05  12:17:10
Commandline: apt-get upgrade
Upgrade: libgcrypt20:i386 (1.6.3-2+deb8u1, 1.6.3-2+deb8u2), gnupg:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2), linux-libc-dev:i386 (3.16.7-ckt25-2+deb8u3, 3.16.36-1+deb8u1), linux-image-3.16.0-4-686-pae:i386 (3.16.7-ckt25-2+deb8u3, 3.16.36-1+deb8u1), gpgv:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2), libidn11:i386 (1.29-1+deb8u1, 1.29-1+deb8u2)
End-Date: 2016-09-05  12:19:22

First there was a DMA error on bootup on file and mail server_U, which we were alerted to when no email was being delivered from dovecot on the server. We fsck'd the disk offline and no errors were reported. Although the system would boot, dovecot wouldn't work. We've been through so many permutations and combinations since then that I can't remember each step we took, some of which we repeated. Eventually, in spite of the fsck result, we replaced the disk and reinstalled. The samba installation worked out of the box but dovecot and rsync (for automated remote backups) didn't. The dovecot failure turned out to be a certificate problem, which required creating the certs and making sure dovecot knew where to look (not straightforward). I can't remember the specifics of the rsync issue; it may have been self-inflicted. Eventually, all was well and everything was working, including the remote backup.

Tuesday: we lost our ssh connection to the two remote backup servers via the VPN. I've no idea what state they were in; all we could do was get someone onsite to reboot them using the power button, and they now appear to be working.

We then found that laptop_T can access smb shares on file and mail server_M (using both windows and debian systems) but two other linux machines can't, nor can a remote windows laptop via VPN (that may be because the user's machine is "broken", but the timing is suspicious). That user also can't get to dovecot with his Thunderbird email client, which may be related to the same upgrade.

We've compared /etc/fstab on laptop_T, which mounts the shares with no problem, with that on laptop_D, which doesn't. Same user, same share, and the entries for mounting the shares are identical, yet one works and one doesn't.
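For anyone wanting to repeat that comparison mechanically rather than by eye, here is a minimal sketch in Python. It assumes the shares are mounted with filesystem type cifs (or the older smbfs) and that each laptop's /etc/fstab has been copied somewhere readable; the function names and the sample share in the comment are hypothetical, not taken from our machines. It ignores the order of mount options, so only real differences are reported.

```python
def cifs_entries(fstab_text):
    """Map (device, mountpoint) to a sorted option list for cifs/smbfs lines."""
    entries = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mountpoint, fstype, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("cifs", "smbfs"):
            continue
        device, mountpoint, _fstype, options = fields[:4]
        entries[(device, mountpoint)] = sorted(options.split(","))
    return entries


def fstab_differences(text_a, text_b):
    """Return the share entries whose options differ between two fstab texts."""
    a, b = cifs_entries(text_a), cifs_entries(text_b)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            # None on one side means the entry is missing from that file.
            diffs[key] = (a.get(key), b.get(key))
    return diffs


# Hypothetical usage, with each laptop's fstab read into a string:
#   fstab_differences(open("fstab.laptop_T").read(),
#                     open("fstab.laptop_D").read())
```

An empty result would confirm that the mount entries really are equivalent, which points the search away from fstab and towards the client or server configuration.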

As server_U is working after reinstallation, following much exploration, we reinstalled jessie 8.3 on server_M and committed to systemd to avoid potential progressive sysv-init problems we'd learned of during our investigation. After a reinstall of the system and subsequently samba (twice), we resorted to the maintainer's version of /etc/smb.conf and customised it for our setup. We tried to keep the configuration as vanilla as possible but there was no improvement in terms of access from the two debian machines.

In comparing the /etc/smb.conf with that on server_U, we noticed that the WINS server IP address on U was uncommented and gave its LAN IP (it is acting as the WINS server for the workgroup). We edited server_M /etc/fstab to include the WINS server IP and debian workstation_E saw the shares in the file manager, but the share didn't show up in df -h. The shares appeared to be unmounted but were accessible through Thunar (the file manager). We commented it out and access broke, uncommented it and it worked again. On laptop_D the IP "fix" didn't have any effect: the share won't mount and can't be seen.
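One possible reading of shares being visible in Thunar but absent from df -h is that the file manager reached them through its own SMB browsing rather than through a kernel mount from /etc/fstab; that is an assumption, not something we verified. A minimal Python sketch for checking which cifs shares the kernel actually has mounted, by reading /proc/mounts, follows. The function names are hypothetical.

```python
def mounted_cifs_shares(mounts_text):
    """Return (device, mountpoint) pairs for cifs/smbfs lines in mounts text."""
    shares = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass.
        if len(fields) >= 3 and fields[2] in ("cifs", "smbfs"):
            shares.append((fields[0], fields[1]))
    return shares


def read_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical usage on the workstation in question:
#   for device, mountpoint in mounted_cifs_shares(read_mounts()):
#       print(device, "on", mountpoint)
```

If the share is missing from this list while still browsable in the file manager, the fstab mount itself is failing and the browsing is happening by some other route.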

At this point we're stuck, which gives pause for reflection. These 4 servers have been running stable debian for over 10 years and, apart from the odd hardware issue, have been rock solid. Two of the machines have been replaced more than once over the years but the other two are the original boxes. Most upgrades were pretty seamless and, if there were problems, a short burst of intensive exploration, trial and error quickly resolved them.

This nightmare of expanding problems has been going on for three days, since Monday afternoon. Never before have I questioned the decision to base our business (and our lives) on Debian and I remain a firm advocate. I also recognise that over successive releases, accommodating a plethora of configurations becomes harder and that at some point step changes in the foundations of the system are required. I'm presuming that the transition to systemd from sysv-init was an essential step and understand that backwards compatibility becomes more challenging as time goes on.

Whether this systemd transition is related to the remote connectivity with the servers and the samba issue, I don't know, but this series of seemingly random but mission-critical problems has shaken our confidence.

Apologies if this sounds like a complaint; it's not. It is a concern, which someone may be able to allay, that Debian is not as rock solid as it was.

You guys have done brilliant work and I'm aware that my contribution to the project has been very small and pretty non-existent for the last few years - other priorities. So thank you.

Regards

Clive

PS If anyone has any ideas to help, they'd be much appreciated.

--
Clive Menzies
http://freecriticalthinking.org

