
Re: System unusably slow after Debian upgrade.



Hi there,

Thanks Greg, Dan and Reco for the replies.  Keep them coming, I'm
afraid we're not out of the woods yet.

On Thu, Feb 27, 2020 at 03:11:10PM +0000, G.W. Haywood wrote:

... Jessie to Stretch to Buster ...  Immediately, the users started to
complain about performance.  Not just a small reduction, but the sort
of thing that makes the whole system completely unusable.  My estimate
after looking at the response on the desktop is several hundred times
slower than normal for this box.  ...

----------------------------------------------------------------------
On Thu, 27 Feb 2020, Greg Wooledge wrote:

Jessie to stretch changed X and video drivers pretty dramatically.
Maybe you just need to load firmware for your video chipset.  And
yes, there are some video chipsets where firmware was *not* used
under jessie, but *is* used under the current release.

But that's just a shot in the dark.

Understood it's a shot in the dark.  I'm pretty sure I'm using the
right chipset firmware.  I think I mentioned in the OP that I purged
xserver-xorg-video-intel package and I have the non-free firmware
installed:

Farm-1:/etc# >>> dpkg --get-selections | grep firmware
firmware-amd-graphics                           install
firmware-linux-free                             install
firmware-linux-nonfree                          install
firmware-misc-nonfree                           install
firmware-realtek                                install

The GUI does seem to be hit very much harder than what I see remotely
over the VPN, but the machine is now running headless and there's
still a five to ten times reduction in performance for a simple
'apt-get update' run in a remote shell, and even for just removing a
file stored in RAM!  The two virtually identical machines are on the
same LAN (at adjacent IP addresses) and the 'apt-get update' timings
were taken on a second run to reduce, I hope, any caching effects.
Removing a file from a RAM disc is about as simple a test as I can
think of, yet there's still an order of magnitude difference in
performance.  How on Earth can it take almost ten seconds to remove
*anything* from RAM?  Copying the same 3 GB file to the RAM disc took
a little over a minute on the 'good' machine (spinning disc) and
8m 41s on the 'problem' box (the one with the SSD).

----------------------------------------------------------------------
On Thu, 27 Feb 2020, Dan Ritter wrote:

Go to /etc/nsswitch.conf

If these lines look like this

passwd:         compat systemd
group:          compat systemd
shadow:         compat systemd

remove the systemd references.

If performance improves immensely immediately after the edit,
that was the problem.

I'm no lover of systemd, but as with the graphics drivers it's hard to
imagine how the Name Service Switch could affect the time taken to
gzip a file.  Nevertheless I gave it a shot.  The 'passwd' and 'group'
entries were as you described but the 'shadow' entry was not.  I've
removed the two 'systemd' references, and at least remotely from the
command line it doesn't look like it's helped - see the timings below.
As a test I used one command which does at least perform some network
operations (apt-get update), but the problem persists.
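
For anyone who wants to check their own /etc/nsswitch.conf before
editing it, here is a minimal Python sketch that lists the databases
which name 'systemd' as a source.  The function name and the sample
text are made up for illustration; the 'passwd' and 'group' lines are
the ones quoted above, while the 'shadow' line is only an assumption,
since all that is known is that it did not match.

```python
def systemd_sources(nsswitch_text):
    """Return {database: [sources]} for each entry that lists 'systemd'."""
    found = {}
    for raw in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank line or comment-only line
        database, sources = line.split(":", 1)
        tokens = sources.split()
        if "systemd" in tokens:
            found[database.strip()] = tokens
    return found


# Illustrative sample only; the shadow line is an assumption.
sample = """\
passwd:         compat systemd
group:          compat systemd
shadow:         compat
"""

print(systemd_sources(sample))
```

To run it against the real file, pass in the result of
open("/etc/nsswitch.conf").read() instead of the sample.  It only
reads and changes nothing.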

There might well be more than one issue to deal with, so I'll keep the
modified .conf file as it is until I can spend time at the farm office
to try using the machine with a GUI at a local screen.  Unfortunately
I have to work within the limits of their (sometimes very busy) days,
so it might take a while.

----------------------------------------------------------------------
On Thu, 27 Feb 2020, Reco wrote:

Just a wild guess, as I saw something similar after the upgrade to
buster.  Your users did not like new and shiny mq-deadline I/O
scheduler.  Have you tried its alternative, bfq?

Barring that, have you tried lowering vm.dirty_ratio and
vm.dirty_background_ratio sysctls?

No, I haven't done anything with schedulers and no, I haven't tried
messing with the kernel cache tuning - it hasn't changed since Jessie,
save for an additional tunable parameter.  It's an interesting thought
and I thank you for the pointer but in view of other, new measurements
(see below) I think it's probably not relevant.  To be honest I'm reluctant
to mess with these values on a machine that's a few miles away, but if
there's no progress elsewhere in the meantime when next I can sit by
the side of the box I'll check if any of this has a noticeable effect.
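
Looking at the current settings is harmless even from a distance, as
nothing is written.  A minimal read-only sketch in Python follows.  It
assumes the usual Linux layout, with the active scheduler shown in
square brackets in /sys/block/<device>/queue/scheduler and the two
cache tunables under /proc/sys/vm (the kernel's name for the second
one is vm.dirty_background_ratio).  The function names are made up for
illustration.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def report():
    """Print the active I/O scheduler per block device and the dirty ratios."""
    for sched in sorted(Path("/sys/block").glob("*/queue/scheduler")):
        device = sched.parent.parent.name
        try:
            print(device, active_scheduler(sched.read_text()))
        except OSError as err:
            print(device, "unreadable:", err)
    for name in ("dirty_ratio", "dirty_background_ratio"):
        path = Path("/proc/sys/vm") / name
        if path.exists():
            print("vm." + name, "=", path.read_text().strip())


if __name__ == "__main__":
    report()
```

Running the same script on both machines and comparing the output
would show whether the upgrade changed the scheduler or the ratios,
without altering either box.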

======================================================================

Timings mentioned above:

Here's the problem machine doing an 'apt-get update':
----------------------------------------------------------------------
Farm-1:/etc# >>> time apt-get update
Hit:1 http://deb.debian.org/debian buster InRelease
Hit:2 http://security.debian.org buster/updates InRelease
Reading package lists... Done

real    1m57.904s
user    1m37.491s
sys     0m6.340s

Here's the machine I haven't (for want of a better word) upgraded:
----------------------------------------------------------------------
Farm-2:/etc# >>> time apt-get update
Ign http://deb.debian.org jessie InRelease
[20 Hits snipped]
Hit http://security.debian.org jessie/updates/non-free Translation-en
Reading package lists... Done

real    0m16.548s
user    0m13.148s
sys     0m1.480s

Here's the 'problem' machine copying a 3GB file, SSD to RAMdisk:
----------------------------------------------------------------------
Farm-1:/home# >>> time rsync -avP F-2020.02.26.tgz  /dev/shm
sending incremental file list
F-2020.02.26.tgz
  3,230,393,252 100%    5.90MB/s    0:08:41 (xfr#1, to-chk=0/1)
sent 3,231,182,020 bytes  received 35 bytes  6,184,080.49 bytes/sec
total size is 3,230,393,252  speedup is 1.00
real    8m41.852s
user    5m1.431s
sys     1m37.735s

The 'good' machine (copying from a spinning disc!):
----------------------------------------------------------------------
Farm-2:/home/ged# >>> time rsync -avP F-2020.02.26.tgz /dev/shm
sending incremental file list
F-2020.02.26.tgz
  3,230,393,252 100%   40.10MB/s    0:01:16 (xfr#1, to-chk=0/1)
sent 3,231,182,030 bytes  received 35 bytes  41,692,671.81 bytes/sec
total size is 3,230,393,252  speedup is 1.00
real    1m16.913s
user    0m37.324s
sys     0m13.300s

The good machine removing the same file:
----------------------------------------------------------------------
Farm-2:/home# >>> time rm /dev/shm/F-2020.02.26.tgz

real    0m0.847s
user    0m0.000s
sys     0m0.444s

The problem machine, the same file, the same type and quantity of RAM:
----------------------------------------------------------------------
Farm-1:/home# >>> time rm /dev/shm/F-2020.02.26.tgz

real    0m9.637s
user    0m0.006s
sys     0m3.670s

The timing differences look more like what I described in the OP for
gzip performance rather than the much poorer performance of the GUI-
driven software experienced by the users (about an order of magnitude
rather than two and a half) but that aside it doesn't look so far like
things are back to normal.
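
To put numbers on that, the slowdown factors can be worked out from
the 'real' times in the transcripts above.  A small Python sketch
follows; the helper name and the labels are made up for illustration,
and the timings are copied from the output above.

```python
import re


def seconds(stamp):
    """Convert a 'time' style stamp such as '1m57.904s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time stamp: " + stamp)
    minutes, secs = match.groups()
    return int(minutes or 0) * 60 + float(secs)


# (problem machine, good machine) 'real' times from the transcripts.
timings = {
    "apt-get update": ("1m57.904s", "0m16.548s"),
    "rsync 3 GB to /dev/shm": ("8m41.852s", "1m16.913s"),
    "rm from /dev/shm": ("0m9.637s", "0m0.847s"),
}

for label, (slow, fast) in timings.items():
    ratio = seconds(slow) / seconds(fast)
    print(f"{label:24s} {ratio:5.1f}x slower")
```

That comes out at roughly 7x for 'apt-get update', just under 7x for
the rsync copy and a little over 11x for the rm.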

I'm now almost convinced it's a kernel-level problem, so next I'll try
booting with an older kernel if Buster will let me.  If anyone here has
run Buster with a 3.x kernel I'd be pleased to hear from you.

Thanks all once again, it's been very useful, and please keep those
suggestions coming.  I'll get back to the list with any developments.

--

73,
Ged.

