Bug#1121006: linux: reported optimal_io_size from mpt3sas devices results in 4GB raid10 optimal_io_size
Source: linux
Version: 6.12.57-1
Severity: important
Dear Maintainer,
At the Wikimedia Foundation we are running the Trixie debian-installer on Dell R450
hardware with an mpt3sas controller (HBA355i, PCI ID 1000:00e6) and SSDs
attached. While debian-installer finished successfully, grub was then unable to
boot the installed system.
Partman is instructed to assemble a raid10 over four devices with LVM on top.
Upon inspection, the LVM PV is created with a ~4GB metadata area, which tricks grub
into allocating the same amount of memory during LVM detection. grub-install
taking ~4GB of RAM "works" during installation, albeit quite slowly, but the
same allocation obviously fails when booting.
I tracked down the problem to md0 reporting an optimal_io_size of ~4GB; LVM
by default aligns metadata to that size, resulting in an abnormally large
PV metadata area.
The large md0 optimal_io_size seems to come from the component devices reporting
a 16MB optimal_io_size, as shown in the lsblk output below.
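
The two figures appear to be arithmetically related: 16773120 bytes is 4095
blocks of 4096 bytes, and its least common multiple with 1 MiB is exactly the
value md0 reports. The sketch below only checks that arithmetic. It assumes
(this is not confirmed anywhere in this report) that the md layer combines its
own hint with the component hint via an LCM, and that the raid10 hint is the
512 KiB md0 MIN-IO times two stripes. lcm_not_zero is a local helper written
for this illustration.

```python
from math import gcd


def lcm_not_zero(a: int, b: int) -> int:
    """Least common multiple, treating 0 as 'no hint' (the other value wins)."""
    if a == 0 or b == 0:
        return a or b
    return a * b // gcd(a, b)


COMPONENT_IO_OPT = 16773120   # OPT-IO of sda..sdd, from the lsblk output
MD0_IO_MIN = 524288           # MIN-IO of md0, from the lsblk output
ASSUMED_STRIPES = 2           # assumption: 4 devices with 2 near copies

# Assumption: the raid10 personality's own hint is chunk size * stripes (1 MiB).
assumed_raid_io_opt = MD0_IO_MIN * ASSUMED_STRIPES

stacked = lcm_not_zero(assumed_raid_io_opt, COMPONENT_IO_OPT)
blocks = COMPONENT_IO_OPT // 4096

print(f"component hint = {blocks} blocks of 4096 bytes")
print(f"stacked hint   = {stacked} bytes")
```

Under those assumptions the result matches the 4293918720 bytes shown for md0,
which would explain why a component hint that is not a power of two blows up
once stacked.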
This host was working fine with Bookworm, which makes me think something has
changed in mpt3sas.
My understanding is that the controller queries the devices via the Block Limits
VPD page for these values, and I'm attaching that output below. The original task
which spawned this work is https://phabricator.wikimedia.org/T407586
I'm happy to conduct further testing for bug fixes and/or investigation.
best,
Filippo
====
# uname -a
Linux cloudcontrol2010-dev 6.12.57+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.57-1 (2025-11-05) x86_64 GNU/Linux
# lsblk -t
NAME           ALIGNMENT MIN-IO     OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE      RA WSAME
sda                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sda1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sda2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sda3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdb                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdb1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdb2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdb3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdc                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdc1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdc2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdc3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdd                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdd1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdd2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdd3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
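
The MIN-IO and OPT-IO columns above correspond to the minimum_io_size and
optimal_io_size attributes under /sys/block/<dev>/queue. For anyone wanting to
check other hosts, here is a minimal sketch that reads those attributes and
flags implausibly large hints. The function names and the 1 GiB threshold are
arbitrary choices made for this illustration, not something any tool defines.

```python
from pathlib import Path


def read_io_hints(sys_block: Path = Path("/sys/block")) -> dict[str, tuple[int, int]]:
    """Return {device: (minimum_io_size, optimal_io_size)} in bytes."""
    hints: dict[str, tuple[int, int]] = {}
    if not sys_block.is_dir():
        return hints
    for dev in sorted(sys_block.iterdir()):
        queue = dev / "queue"
        try:
            io_min = int((queue / "minimum_io_size").read_text())
            io_opt = int((queue / "optimal_io_size").read_text())
        except (OSError, ValueError):
            continue  # no queue directory, or an unreadable attribute
        hints[dev.name] = (io_min, io_opt)
    return hints


def suspicious(hints: dict[str, tuple[int, int]], threshold: int = 1 << 30) -> list[str]:
    """Devices whose optimal_io_size is at or above the (arbitrary) threshold."""
    return [name for name, (_, io_opt) in hints.items() if io_opt >= threshold]


if __name__ == "__main__":
    found = read_io_hints()
    for name, (io_min, io_opt) in found.items():
        print(f"{name}: min={io_min} opt={io_opt}")
    print("suspicious:", suspicious(found))
```

On the host in this report, such a check would be expected to flag md0 and the
device-mapper volumes stacked on it, but not sda through sdd.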
# pvck --dump headers /dev/md0
label_header at 512
label_header.id LABELONE
label_header.sector 1
label_header.crc 0xbdf3a961
label_header.offset 32
label_header.type LVM2 001
pv_header at 544
pv_header.pv_uuid KDkSuWsrIico15Y0PenxiLzT8Ad2dGLa
pv_header.device_size 1919546294272
pv_header.disk_locn[0] at 584 # location of data area
pv_header.disk_locn[0].offset 4293918720
pv_header.disk_locn[0].size 0
pv_header.disk_locn[1] at 600 # location list end
pv_header.disk_locn[1].offset 0
pv_header.disk_locn[1].size 0
pv_header.disk_locn[2] at 616 # location of metadata area
pv_header.disk_locn[2].offset 4096
pv_header.disk_locn[2].size 4293914624
pv_header.disk_locn[3] at 632 # location list end
pv_header.disk_locn[3].offset 0
pv_header.disk_locn[3].size 0
pv_header_extension at 648
pv_header_extension.version 2
pv_header_extension.flags 1
pv_header_extension.disk_locn[0] at 656 # location list end
pv_header_extension.disk_locn[0].offset 0
pv_header_extension.disk_locn[0].size 0
mda_header_1 at 4096 # metadata area
mda_header_1.checksum 0x84d8039
mda_header_1.magic 0x204c564d3220785b35412572304e2a3e
mda_header_1.version 1
mda_header_1.start 4096
mda_header_1.size 4293914624
mda_header_1.raw_locn[0] at 4136 # commit
mda_header_1.raw_locn[0].offset 4608
mda_header_1.raw_locn[0].size 1724
mda_header_1.raw_locn[0].checksum 0xdd78f68b
mda_header_1.raw_locn[0].flags 0x0
mda_header_1.raw_locn[1] at 4160 # precommit
mda_header_1.raw_locn[1].offset 0
mda_header_1.raw_locn[1].size 0
mda_header_1.raw_locn[1].checksum 0x0
mda_header_1.raw_locn[1].flags 0x0
metadata text at 8704 crc 0xdd78f68b # vgname vg0 seqno 4
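
The numbers in the dump above line up with the md0 optimal_io_size: the
metadata area starts at 4096 and runs right up to the data area, which begins
at exactly 4293918720 bytes. A short sketch of that arithmetic, using only
values copied from the lsblk and pvck output in this report (the variable
names are mine):

```python
IO_OPT_MD0 = 4293918720        # OPT-IO of md0, from lsblk
DATA_AREA_OFFSET = 4293918720  # pv_header.disk_locn[0].offset
MDA_START = 4096               # mda_header_1.start
MDA_SIZE = 4293914624          # mda_header_1.size
METADATA_TEXT_SIZE = 1724      # mda_header_1.raw_locn[0].size

# The metadata area fills the gap between its start and the data area.
mda_end = MDA_START + MDA_SIZE
matches = mda_end == DATA_AREA_OFFSET == IO_OPT_MD0

gib = MDA_SIZE / 2**30
print(f"metadata area ends at {mda_end}, matches data offset and io_opt: {matches}")
print(f"metadata area is {gib:.3f} GiB for {METADATA_TEXT_SIZE} bytes of committed metadata")
```

So nearly 4 GiB is reserved to hold under 2 KiB of VG metadata, which is the
area grub ends up allocating memory for.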
# Devices are all reporting the same information
# sg_vpd -p bl /dev/sda
Block limits VPD page (SBC)
Write same non-zero (WSNZ): 1
Maximum compare and write length: 0 blocks [command not implemented]
Optimal transfer length granularity: 0 blocks [not reported]
Maximum transfer length: 0 blocks [not reported]
Optimal transfer length: 0 blocks [not reported]
Maximum prefetch length: 0 blocks [not reported]
Maximum unmap LBA count: 0x3ffff
Maximum unmap block descriptor count: 0x20
Optimal unmap granularity: 0x1
Unmap granularity alignment valid: false
Maximum write same length: 0xffff
Maximum atomic transfer length: 0 blocks [not reported]
Atomic alignment: 0 blocks [unaligned atomic writes permitted]
Atomic transfer length granularity: 0 blocks [no granularity requirement]
Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]
# sg_vpd -p ai /dev/sda
ATA information VPD page:
SAT Vendor identification: LSI
SAT Product identification: LSI SATL
SAT Product revision level: 0008
Device signature indicates SATA transport
Command code: 0xec
ATA command IDENTIFY DEVICE response summary:
model: MTFDDAK960TGA-1BC1ZABDA
serial number: XXX
firmware revision: D4DK003
c3:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
Subsystem: Dell HBA355i Front
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at e6800000 (64-bit, prefetchable) [size=1M]
Memory at e6900000 (64-bit, prefetchable) [size=1M]
Memory at e6a00000 (32-bit, non-prefetchable) [size=1M]
I/O ports at e000 [size=256]
Expansion ROM at <ignored> [disabled]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, IntMsgNum 0
Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Power Budgeting <?>
Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
Capabilities: [168] Secondary PCI Express
Capabilities: [188] Physical Layer 16.0 GT/s <?>
Capabilities: [1b0] Lane Margining at the Receiver
Capabilities: [218] Dynamic Power Allocation <?>
Capabilities: [248] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [348] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [380] Data Link Feature <?>
Kernel driver in use: mpt3sas