Bug#1121006: linux: reported optimal_io_size from mpt3sas devices results in 4GB raid10 optimal_io_size
Hello Salvatore,
Thank you for the quick reply.
On Wed, Nov 19, 2025 at 05:59:48PM +0100, Salvatore Bonaccorso wrote:
[...]
> > Capabilities: [348] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
> > Capabilities: [380] Data Link Feature <?>
> > Kernel driver in use: mpt3sas
>
> This sounds like quite an interesting finding but probably hard to
> reproduce without the hardware if it comes to be specific to the
> controller type and driver.
That's a great point re: reproducibility, and it got me curious about
something I hadn't thought of testing, namely whether there's another angle
to this: does any block device with the same block I/O hints exhibit the
same problem? The answer turns out to be "yes".
I used qemu's 'scsi-hd' device to set the same values so I could test
locally. On an already-installed VM I added the following to present four
new devices:
-device virtio-scsi-pci,id=scsi0
-drive file=./workdir/disks/disk3.qcow2,format=qcow2,if=none,id=drive3
-device scsi-hd,bus=scsi0.0,drive=drive3,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
-drive file=./workdir/disks/disk4.qcow2,format=qcow2,if=none,id=drive4
-device scsi-hd,bus=scsi0.0,drive=drive4,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
-drive file=./workdir/disks/disk5.qcow2,format=qcow2,if=none,id=drive5
-device scsi-hd,bus=scsi0.0,drive=drive5,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
-drive file=./workdir/disks/disk6.qcow2,format=qcow2,if=none,id=drive6
-device scsi-hd,bus=scsi0.0,drive=drive6,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
I used 10G files created with 'qemu-img create -f qcow2 <file> 10G', though
the size doesn't affect anything in my testing.
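For completeness, something along these lines creates the four backing
files (matching the drive paths above; the loop is just how I scripted it,
not a requirement):
$ for i in 3 4 5 6; do qemu-img create -f qcow2 ./workdir/disks/disk$i.qcow2 10G; done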
Then in the VM:
# cat /sys/block/sd[cdef]/queue/optimal_io_size
16773120
16773120
16773120
16773120
# mdadm --create /dev/md1 --level 10 --bitmap none --raid-devices 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
# cat /sys/block/md1/queue/optimal_io_size
4293918720
I was able to reproduce the problem with src:linux 6.18~rc6-1~exp1 as well as 6.12.57-1.
Since it is easy to test this way, I tried a few different opt_io_size
values and was able to reproduce the problem only with 16773120 (i.e. 0xFFF000).
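In case it helps with triage: 4293918720 happens to be the least common
multiple of 1048576 and 16773120, so my (unverified) guess is that the md
layer advertises a 1 MiB optimal size of its own (two 512 KiB chunks for a
4-device near-2 raid10) and the limit stacking then takes the lcm with the
members' 16773120:
$ python3 -c 'import math; print(math.lcm(1048576, 16773120))'
4293918720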
> I would like to ask: Do you have the possibility to make an OS
> installation such that you can freely experiment with various kernels
> and then under them assemble the arrays? If so it would be great if you
> could start bisecting the changes to find where the behaviour changed.
>
> I.e. install the OS independently of the controller, find by bisecting
> Debian versions manually the kernels between bookworm and trixie
> (6.1.y -> 6.12.y to narrow down the upstream range).
Yes, I'm able to perform testing on this host; in fact I worked around the
problem for now by disabling LVM's md alignment auto-detection, so we have
an installed system.
For reference, that's "devices { data_alignment_detection = 0 }" in lvm's
config.
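i.e. roughly this stanza in /etc/lvm/lvm.conf (the placement shown here is
just illustrative):
devices {
    data_alignment_detection = 0
}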
> Then bisect the upstream changes to find the offending commits. Let me
> know if you need more specific instructions on the idea.
Pointers on the recommended way to build Debian kernels would be of great
help, thank you!
> Additionally it would be interesting to know if the issue persists in
> 6.17.8 or even 6.18~rc6-1~exp1 to be able to clearly indicate upstream
> that the issue persists in newer kernels.
>
> Ideally actually this goes asap to upstream once we are more confident
> on the subsystem to where to report the issue. If we are reasonably
> confident it is mpt3sas specific already then I would say to go
> already to:
Given the qemu-based reproducer above, maybe this issue is actually two
bugs: the raid10 behaviour shown above, and mpt3sas presenting 0xFFF000 as
optimal_io_size. While the latter might be suspicious, perhaps it is not
wrong per se?
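If it helps to separate the two, something like sg_vpd from sg3_utils
should show whether that value really is what the drives report in their
Block Limits VPD page (the device name below is just a placeholder):
# sg_vpd --page=bl /dev/sdX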
best,
Filippo