
Re: [SOLVED, sort of] System unusably slow after Debian upgrade.



Hello again all,

For those of you who haven't been watching, here's the problem, er, in
a nutshell:

After a Debian upgrade in two steps from Jessie to Stretch and then
from Stretch to Buster on an old Intel E3815-powered box, performance
was degraded by at least one, and usually more than two orders of
magnitude (around ten times slower at the command prompt, and hundreds
of times slower using a GUI), effectively making the box unusable.

The absurdly reduced demonstration was, from the bash command line, to
remove a big file from RAM disc - which took nearly ten seconds on the
box with Buster and under a second on an identical box running Jessie.
I'd already removed everything I could think of that might be causing
the problem (including AppArmor), and I even removed systemd.  No joy.
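
For anyone who wants to repeat that reduced test on their own box, a
minimal sketch might look like the one below.  The function name, the
64 MiB default and the scratch file name are illustrative assumptions,
not what was typed on the affected machine (the real test file was
about 3 GB).  It assumes GNU dd and GNU date (for %N).

```shell
#!/bin/bash
# Sketch: time the removal of a large file from a RAM-backed filesystem.
# time_rm_ramdisk and its defaults are hypothetical, for illustration only.
time_rm_ramdisk() {
    local dir="${1:-/dev/shm}"     # tmpfs mount point to test
    local size_mb="${2:-64}"       # size of the scratch file in MiB
    local file="$dir/rmtest.$$"    # scratch file, named after the shell PID
    local start end

    # Fill the scratch file with zeros so there is something to free.
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" 2>/dev/null || return 1

    # Wall-clock time around the rm, in nanoseconds (GNU date assumed).
    start=$(date +%s%N)
    rm -- "$file" || return 1
    end=$(date +%s%N)

    echo "removed ${size_mb} MiB from $dir in $(( (end - start) / 1000000 )) ms"
}
```

Calling `time_rm_ramdisk /dev/shm 3000` would be roughly comparable
to the test described here, provided the box has the RAM to spare.
Running it under each kernel gives two figures to compare.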

======================================================================
Thanks Nicholas and Thomas, good calls but (as it turns out) I didn't
need to get on my bike, and it definitely wasn't a network problem.

On Sat, 29 Feb 2020, Nicholas Geovanis wrote:
On Sat, Feb 29, 2020, Thomas Schmitt wrote:
> G.W. Haywood wrote:
> > It's just like the machine is
> > suddenly being powered by an 8080 instead of an E3815...
>
> Lacking any ideas what might be the problem, i'd try Live ISOs of
> Debian 9 and Debian 8 whether their systems show the same problem.

A good recommendation I think.

Not a bad idea, but - given that I know the box was fine for several
years when it was running Jessie - I'm not sure what it would tell me.

The main problem however is that it's remote, so I couldn't boot it
from an ISO without either going over there or calling the people and
trying to explain to them how to do it.  Neither option is attractive,
but if all else failed, booting from an image on USB would definitely
have, er, been on the cards.

On Sat, Feb 29, 2020, Thomas Schmitt wrote:

> In my experience these symptoms usually manifest as queries to other DNS
> servers which cannot be contacted. So the name query times out at each name
> server adding latency to the request. Beware of systemd "augmenting" the
> related config files at startup.

True enough, especially that last part, and I've seen enough problems
caused by faulty networking setups, but by getting down to the minimal
test case of removing a file from RAM at the bash prompt I eliminated
network operations.  This isn't a network or DNS or NSS issue.

======================================================================
On Sat, 29 Feb 2020, G.W. Haywood wrote:

> I'm now almost convinced it's a kernel-level problem, so next I'll
> try booting with an older kernel if Buster will let me.  If anyone
> here has run Buster with a 3.x kernel I'd be pleased to hear ...

So I modified the grub configuration to boot the existing 3.16 kernel
of Jessie vintage by default, rebooted, and immediately it was obvious
that the box was back to its normal performance - just logging in was
a lot quicker.  Copying my 3 Gigabyte test file to /dev/shm took only
48 seconds instead of nearly ten minutes, and the time taken to delete
it was similarly improved.  I copied with rsync because that gives me
an idea of how long it's going to take almost immediately.  I can stop
the copy if it's obviously going to be a waste of time; I guess '-avP'
is habitual, my fingers seem to just do that without being told:

----------------------------------------------------------------------
Farm-1:/home# >>> time rsync -avP F-2020.02.26.tgz  /dev/shm
sending incremental file list
F-2020.02.26.tgz
  3,230,393,252 100%   63.54MB/s    0:00:48 (xfr#1, to-chk=0/1)
sent 3,231,182,020 bytes  received 35 bytes  66,622,310.41 bytes/sec
total size is 3,230,393,252  speedup is 1.00
real    0m48.499s
user    0m36.108s
sys     0m12.376s
Farm-1:/home# >>> ls -l /dev/shm/F-2020.02.26.tgz
-rw-r--r-- 1 root root 3230393252 Feb 26 14:25 /dev/shm/F-2020.02.26.tgz
Farm-1:~# >>> uname -a
Linux Farm-1 3.16.0-10-amd64 #1 SMP Debian 3.16.81-1 (2020-01-17) x86_64 GNU/Linux
Farm-1:~# >>> time rm /dev/shm/F-2020.02.26.tgz
real    0m0.644s
user    0m0.000s
sys     0m0.632s

Running kernel 4.19, the delete operation took just over 9.6 seconds.
----------------------------------------------------------------------
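
The post doesn't say exactly how the grub configuration was changed.
One common way on Debian, sketched here under that assumption, is to
list the menu titles in the generated grub.cfg and name the wanted
one in GRUB_DEFAULT.  The helper name list_grub_entries is made up,
and the titles in the comments are examples that must be checked
against what the helper prints on the box in question.

```shell
#!/bin/bash
# Sketch: list GRUB menu titles so that one can be named in GRUB_DEFAULT.
# list_grub_entries is a hypothetical helper, not part of GRUB itself.
list_grub_entries() {
    local cfg="${1:-/boot/grub/grub.cfg}"   # Debian's usual location
    # Print the first quoted string of every 'menuentry' and 'submenu' line.
    sed -nE "s/^[[:space:]]*(menuentry|submenu) '([^']*)'.*/\2/p" "$cfg"
}

# Then in /etc/default/grub, a line of this shape (example titles only;
# the part before '>' is the submenu, the part after it the entry):
#   GRUB_DEFAULT="Advanced options for Debian GNU/Linux>Debian GNU/Linux, with Linux 3.16.0-10-amd64"
# and afterwards, as root, regenerate grub.cfg with:
#   update-grub
```

On a remote box it's worth making sure the old kernel package stays
installed and isn't swept away by a later autoremove.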

So the problem is eliminated, if not exactly solved.  A real solution
would require finding out what's wrong with the combination of the
4.19 kernel and this individual box, but I have more pressing issues
so that might never happen.  I don't care what was wrong, as long as
the users are once again happy users.  Well, as happy as they were
before. :/

Now I have what I'll euphemistically call more leisure time, I guess
I'll compile a variety of kernels for the box to see what happens.
There are some processor bugs listed in /proc/cpuinfo:

bugs            : cpu_meltdown spectre_v1 spectre_v2 mds msbds_only

but as the SPECTRE etc. mitigations are allegedly not in this kernel
about the only thing I can think of is that it might be something to
do with UEFI.
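
A rough sketch of how those two guesses could be checked under each
kernel follows.  The function names are made up for illustration.
The paths are the usual procfs/sysfs locations, but the
vulnerabilities directory is not present on every kernel, so the
sketch allows for it being absent.

```shell
#!/bin/bash
# Sketch: report CPU bug flags, mitigation status and firmware boot mode.
# All three function names are hypothetical helpers.

cpu_bugs() {
    # First 'bugs' line only; there is one per logical CPU.
    awk -F': *' '/^bugs/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

mitigation_status() {
    local dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    local f
    if [ ! -d "$dir" ]; then
        echo "no vulnerabilities directory at $dir"
        return 0
    fi
    # One file per known issue; its content is the kernel's own summary.
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            printf '%-20s %s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
    return 0
}

boot_mode() {
    # /sys/firmware/efi exists only when the kernel was booted via UEFI.
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "UEFI"
    else
        echo "BIOS/legacy"
    fi
}
```

Running `cpu_bugs; mitigation_status; boot_mode` under both 3.16 and
4.19 and comparing the output would at least show whether the two
kernels differ in which mitigations they report as active.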

Thanks, once again, to all who spent any time at all on this topic -
even if you only thought about it and didn't respond - but especially
to all those who came up with ideas and suggestions.  All were useful.
If there's any interest I can follow up with any significant findings
but otherwise I won't spam the list.

--

73,
Ged.

