
Re: Increased read IO wait times after Bullseye upgrade



Hi Gareth,

dmesg is "clean", the disks are not shared in any way, and there is no virtualization layer installed.
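For reference, this sort of thing can be double-checked with something like the following (a sketch, the exact output depends on the system):

$ systemd-detect-virt                  # prints "none" on bare metal
$ lsblk -o NAME,TYPE,SIZE,MODEL,TRAN   # how each disk is attached (sata/sas/nvme/...)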

On 2022. 11. 11. 17:34, Nicholas Geovanis wrote:
On Fri, Nov 11, 2022, 1:58 AM Vukovics Mihály <vm@informatik.hu> wrote:
Hi Gareth,

I have already tried to change the queue depth for the physical disks,
but that has almost no effect.
There is almost no load on the filesystem; here is a 10-second sample from atop:
1-2 write requests, but 30-50 ms of average I/O time.

DSK |          sdc | busy     27% | read       0 | write      2 | KiB/r      0 | KiB/w      0 | MBr/s    0.0 | MBw/s    0.0 | avq     1.83 | avio 38.0 ms |
DSK |          sdb | busy     18% | read       0 | write      1 | KiB/r      0 | KiB/w      1 | MBr/s    0.0 | MBw/s    0.0 | avq     1.63 | avio 52.0 ms |
DSK |          sde | busy     18% | read       0 | write      1 | KiB/r      0 | KiB/w      1 | MBr/s    0.0 | MBw/s    0.0 | avq     1.63 | avio 52.0 ms |
DSK |          sda | busy     17% | read       0 | write      1 | KiB/r      0 | KiB/w      1 | MBr/s    0.0 | MBw/s    0.0 | avq     1.60 | avio 48.0 ms |

Those numbers for percentage busy seem very high to me for such a low rate of IO initiation. Either the blocks being moved are very large (not necessarily wrong, maybe just poorly configured for the load) or there are other things going on with the drives.
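If it helps, iostat from the sysstat package can cross-check the atop figures and also shows the average request size directly (a rough sketch, assuming a reasonably recent sysstat is installed):

$ iostat -x 10 3    # extended per-device stats, 10-second intervals, 3 samples
# r_await/w_await (ms per request) roughly correspond to atop's avio,
# rareq-sz/wareq-sz give the average request size in KiB,
# %util corresponds to "busy".

If rareq-sz/wareq-sz stay small while the await values are high, very large blocks are probably not the explanation.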

Are the physical drives shared with any other systems? Are multiple VMs of whatever type running on the same hardware host?

Another possibility: the drives and/or filesystems are thrashing as they respond to hardware and/or filesystem problems. Anything interesting in dmesg or the logs?
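Something along these lines might turn them up (a sketch; smartctl needs the smartmontools package, and sdX is a placeholder):

$ dmesg -T | grep -iE 'error|fail|reset|timeout'   # kernel messages about the disks
$ journalctl -k -p warning -b                      # kernel log, warning or worse, current boot
$ sudo smartctl -a /dev/sdX                        # SMART health and error counters per drive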


On 2022. 11. 10. 14:32, Gareth Evans wrote:
> On Thu 10 Nov 2022, at 11:36, Gareth Evans <donotspam@fastmail.fm> wrote:
> [...]
>> This assumes the identification of the driver in [3] (below) is
>> anything to go by.
> I meant [1] not [3].
>
> Also potentially of interest:
>
> "Queue depth
>
> The queue depth is a number between 1 and ~128 that shows how many I/O requests are queued (in-flight) on average. Having a queue is beneficial as the requests in the queue can be submitted to the storage subsystem in an optimised manner and often in parallel. A queue improves performance at the cost of latency.
>
> If you have some kind of storage performance monitoring solution in place, a high queue depth could be an indication that the storage subsystem cannot handle the workload. You may also observe higher than normal latency figures. As long as latency figures are still within tolerable limits, there may be no problem."
>
> https://louwrentius.com/understanding-storage-performance-iops-and-latency.html
>
> See
>
> $ cat /sys/block/sdX/device/queue_depth
>
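For reference, the queue depth can also be changed at runtime through sysfs, if the driver allows it (a sketch; sdX is a placeholder, 32 is an arbitrary example value, and the change does not persist across reboots):

$ cat /sys/block/sdX/device/queue_depth                   # current device queue depth
$ echo 32 | sudo tee /sys/block/sdX/device/queue_depth    # temporary change
$ cat /sys/block/sdX/queue/nr_requests                    # block-layer queue size, a separate limit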

-- 
With thanks:
Vukovics Mihály
