
Bug#901941: marked as done (IO performance hit on kernel 4.9)



Your message dated Sun, 23 May 2021 12:34:05 -0700 (PDT)
with message-id <60aaae2d.1c69fb81.f8455.9085@mx.google.com>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #901941,
regarding IO performance hit on kernel 4.9
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
901941: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=901941
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-4.9.0-0.bpo.6-amd64
Version: 4.9.88-1+deb9u1~bpo8+1

We use Debian as our system of choice for running our proprietary software. This application
is used to store large amounts of data. Files range from 1 to 30 megabytes in size and
are served on demand over the network to other services. Data is stored on 81 disks
formatted with the XFS filesystem.
During peak hours the overall bandwidth used on one server can reach up to 15Gbps.

We would like to upgrade our systems from jessie to stretch, so we tried
installing the new 4.9 kernel from the jessie-backports repository. This resulted in
a huge performance drop for us.

At the time of the upgrade the overall bandwidth used was around 8Gbps,
15 min system load around 11, average CPU utilization at 60%, 27% in iowait,
22% in system. Response times of our application were below 90 milliseconds at the 90th percentile.
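
For anyone trying to reproduce figures like the iowait and system percentages above, here is a minimal sketch of how they can be derived from two samples of the aggregate "cpu" line in /proc/stat. The function names are made up for illustration, and the sample lines in the usage note are synthetic, not taken from the affected server.

```python
# Field order of the aggregate "cpu" line in /proc/stat (guest time is
# already included in "user", so the trailing guest fields are ignored).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("too few fields on the 'cpu' line")
    return dict(zip(FIELDS, values))


def cpu_shares(before, after):
    """Return the percentage of elapsed CPU time spent in each state."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the current aggregate 'cpu' line (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

Typical use is to call read_cpu_line() twice, some seconds apart, and pass both results to cpu_shares(); the "iowait" and "system" entries of the result correspond to the percentages quoted in this report.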

Being aware of Spectre and Meltdown, I booted into kernel 4.9 with the option pti=off.
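
To confirm that the option actually reached the running kernel, the boot command line can be inspected. The following is only a sketch: the helper names are hypothetical, and the simple whitespace split does not handle quoted parameter values that contain spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into {parameter: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def pti_disabled(params):
    """True if page table isolation was turned off via pti=off or nopti."""
    return params.get("pti") == "off" or "nopti" in params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as cmdline_file:
        return parse_cmdline(cmdline_file.read())
```

On the machine in question, pti_disabled(read_cmdline()) would be expected to return True after booting with pti=off.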

While the bandwidth stayed at around 8Gbps, system metrics got significantly worse.
The 15 min system load stabilized around 20, while the 1 min system load was jumping between 20
and 30. Average CPU utilization also increased; it was jumping between 60% and 70%,
and most of the increase seems to have been system time. Our response time metric rose
to 200 milliseconds at the 90th percentile.
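
To put the regression in relative terms, the before and after figures quoted in this report can be compared directly. This is a small sketch; it assumes the response time of 200 is in milliseconds, like the earlier 90 millisecond figure, and pct_change is an illustrative helper, not part of any tool we use.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (after - before) / before


# (old kernel, kernel 4.9) pairs as quoted in this report.
METRICS = {
    "bandwidth (Gbps)": (8, 8),
    "15 min load": (11, 20),
    "p90 response time (ms, unit assumed)": (90, 200),
}

if __name__ == "__main__":
    for name, (old, new) in METRICS.items():
        print(f"{name}: {old} -> {new} ({pct_change(old, new):+.1f}%)")
```

At unchanged bandwidth, this puts the 15 min load at roughly 1.8 times and the 90th percentile response time at roughly 2.2 times their previous values.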

Curiously, disk read bytes increased by about 20%, but that might have been caused by the
reboot clearing the filesystem cache.

Now some information about the hardware:
CPU: 2x AMD Opteron(tm) Processor 6128
RAM: 256GB

There are 81 data disks connected over 4 SCSI controllers:
LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
There is a JBOD system with 45 disks, and the other 36 disks are in the server itself;
the JBOD system is connected by external SAS2 cables.

I am unable to find out the exact type of the controllers, but I might
be able to request it if required.

I checked cpufreq to see whether something might be throttling the system,
but all CPUs appear to be set to the performance governor and running at max frequency.
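
The cpufreq check described above can be sketched as follows. It reads the standard cpufreq sysfs files (scaling_governor, scaling_cur_freq, cpuinfo_max_freq) for every CPU; the function names are illustrative, and treating any current frequency below the maximum as throttled is a simplifying assumption.

```python
import glob
import os


def read_cpufreq(root="/sys/devices/system/cpu"):
    """Return {cpu: (governor, current_khz, max_khz)} for each CPU exposing cpufreq."""
    status = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))

        def read(name, base=cpufreq_dir):
            with open(os.path.join(base, name)) as sysfs_file:
                return sysfs_file.read().strip()

        status[cpu] = (
            read("scaling_governor"),
            int(read("scaling_cur_freq")),
            int(read("cpuinfo_max_freq")),
        )
    return status


def suspect_cpus(status, governor="performance"):
    """List CPUs not using the expected governor or running below max frequency."""
    return [
        cpu
        for cpu, (gov, cur_khz, max_khz) in sorted(status.items())
        if gov != governor or cur_khz < max_khz
    ]
```

An empty list from suspect_cpus(read_cpufreq()) would match what I observed, namely every CPU on the performance governor at its maximum frequency.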

The installed system is Debian Jessie 8.10 with libc6 2.19-18+deb8u10

Thank you for any help you can provide.

--- End Message ---
--- Begin Message ---
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---
