On 08/09/16 23:07, Clive Menzies wrote:
Hi
We've suffered a series of seemingly disconnected problems on 4
machines since upgrading jessie on Monday:
apt/log:
Start-Date: 2016-09-05 12:17:10
Commandline: apt-get upgrade
Upgrade: libgcrypt20:i386 (1.6.3-2+deb8u1, 1.6.3-2+deb8u2),
gnupg:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2), linux-libc-dev:i386
(3.16.7-ckt25-2+deb8u3, 3.16.36-1+deb8u1),
linux-image-3.16.0-4-686-pae:i386 (3.16.7-ckt25-2+deb8u3,
3.16.36-1+deb8u1), gpgv:i386 (1.4.18-7+deb8u1, 1.4.18-7+deb8u2),
libidn11:i386 (1.29-1+deb8u1, 1.29-1+deb8u2)
End-Date: 2016-09-05 12:19:22
First there was a DMA error on bootup on file and mail server_U, which
we were alerted to by email no longer being delivered from dovecot on
the server. We fsck'd the disk offline and no errors were reported.
Although the system would boot, dovecot wouldn't work. We've been
through so many permutations and combinations since that I can't
remember each step we took after that, some of which we repeated.
Eventually, in spite of the fsck result, we replaced the disk and
reinstalled. The samba installation worked out of the box but dovecot
and rsync (for automated remote backups) didn't. The dovecot failure
turned out to be a certificate problem, which required creating the
certs and making sure dovecot knew where to look (not straightforward). I
can't remember the specifics of the rsync issue; it may have been
self-inflicted. Eventually, all was well and everything was working,
including the remote backup.
Tuesday: we lost the ssh connection to the two remote backup servers
via the VPN. I had no way of checking their state other than getting
someone onsite to reboot them using the power button; they appear to
be working.
We then found that laptop_T could access smb shares (using both windows
and debian systems) on file and mail server_M but two other linux
machines couldn't. Nor could a remote windows laptop via VPN (that
may be because the user's machine is "broken", but the timing is
suspicious), and the same user can't get to dovecot with his
Thunderbird email client, which may be related to the same upgrade.
We've compared /etc/fstab on laptop_T, which mounts the shares with no
problem, to that on laptop_D, which doesn't. Same user, same share, and
the entries for mounting the shares are identical, but one works and
one doesn't.
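
For anyone wanting to repeat that comparison mechanically rather than
by eye, here is a minimal sketch in Python. It is not what we actually
ran; the function names, share path and mount options are hypothetical
and for illustration only. It assumes the shares are mounted with the
cifs (or older smbfs) type and ignores comments, whitespace and the
order of the mount options.

```python
def cifs_entries(fstab_text):
    """Return the set of normalised cifs/smbfs entries found in an fstab."""
    entries = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("cifs", "smbfs"):
            # sort the options so that their order does not matter
            options = tuple(sorted(fields[3].split(",")))
            entries.add((fields[0], fields[1], fields[2], options))
    return entries


def compare_fstabs(text_a, text_b):
    """Return (entries only in a, entries only in b)."""
    a, b = cifs_entries(text_a), cifs_entries(text_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Hypothetical sample entries; substitute the contents of the two
    # /etc/fstab files being compared.
    working = "//server_M/docs /mnt/docs cifs credentials=/root/.smbcred,uid=1000 0 0\n"
    failing = "//server_M/docs /mnt/docs cifs uid=1000,credentials=/root/.smbcred 0 0\n"
    only_working, only_failing = compare_fstabs(working, failing)
    print("only in working fstab:", only_working)
    print("only in failing fstab:", only_failing)
```

Two empty lists would simply confirm what we saw by eye: the share
entries match, so the difference has to lie somewhere other than
/etc/fstab.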
As server_U is working after reinstallation, following much
exploration, we reinstalled jessie 8.3 on server_M and committed to
systemd to avoid potential progressive sysv-init problems we'd
learned of during our investigation. After a reinstall of the system
and subsequently samba (twice), we resorted to the maintainer's
version of /etc/smb.conf and customised it for our setup. We tried to
keep the configuration as vanilla as possible but there was no
improvement in terms of access from the two debian machines.
In comparing the /etc/smb.conf with that on server_U, we noticed that
the winserver IP address on U was uncommented and gave its LAN IP (it
is acting as the winserver for the workgroup). We edited server_M
/etc/fstab to include the winserver IP and debian workstation_E saw
the shares in file manager but the share didn't show up in df -h. The
shares appeared to be unmounted but were accessible through
Thunar (FM). We commented the IP line out and access broke, uncommented
it and it worked again. On laptop_D the IP "fix" didn't have any
effect: the shares won't mount and can't be seen.
Sorry, brain disengaged. We edited server_M /etc/smb.conf to include
the winserver IP, NOT /etc/fstab.
At this point we're stuck, which gives pause for reflection. These 4
servers have been running stable debian for over 10 years and apart
from the odd hardware issue have been rock solid. Two of the machines
have been replaced more than once over the years but the other two
are the original boxes. Most upgrades were pretty seamless and if
there were problems, a short burst of intensive exploration, trial
and error, quickly resolved them.
This nightmare of expanding problems has been going on for three
days, since Monday afternoon. Never before have I questioned the
decision to base our business (and our lives) on Debian and I remain
a firm advocate. I also recognise that over successive releases,
accommodating a plethora of configurations becomes harder and that at
some point a step change in the foundations of the system is
required. I'm presuming that the transition to systemd from sysv-init
was an essential step and understand that backwards compatibility
becomes more challenging as time goes on.
Whether this systemd transition is related to the loss of remote
connectivity with the servers and the samba issue, I don't know, but
this series of seemingly random but mission-critical problems has
shaken our confidence.
Apologies if this sounds like a complaint; it's not. It is a concern,
which someone may be able to allay, that Debian is not as rock solid
as it was.
You guys have done brilliant work and I'm aware that my contribution
to the project has been very small and pretty non-existent for the
last few years - other priorities. So thank you.