
Bug#1121006: linux: reported optimal_io_size from mpt3sas devices results in 4GB raid10 optimal_io_size



Source: linux
Version: 6.12.57-1
Severity: important

Dear Maintainer,
At the Wikimedia Foundation we are running the Trixie debian-installer on Dell
R450 hardware with an mpt3sas controller (HBA355i, PCI ID 1000:00e6) and SSDs
attached. debian-installer finished successfully, but grub was then unable to
boot the installed system.

Partman is instructed to assemble a raid10 over four devices with LVM on top.
Upon inspection, the LVM PV was created with a ~4GB metadata area, which tricks
grub into allocating the same amount of memory during LVM detection.
grub-install taking ~4GB of RAM "works" during installation, although it is
quite slow, but the same allocation fails when booting.

I tracked the problem down to md0 reporting an optimal_io_size of ~4GB. LVM
defaults to aligning metadata to that size, resulting in an abnormally large
PV metadata area.
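
To make the relationship concrete, here is a small arithmetic check in Python. It only uses numbers copied from the lsblk and pvck output in this report; the helper names are made up for illustration. It shows that the metadata area ends exactly where the data area starts, and that this offset equals the OPT-IO value of md0.

```python
# Values copied from the lsblk and pvck output in this report (bytes).
MD0_OPT_IO = 4293918720   # lsblk OPT-IO column for md0
DATA_OFFSET = 4293918720  # pv_header.disk_locn[0].offset (data area)
MDA_START = 4096          # pv_header.disk_locn[2].offset (metadata area)
MDA_SIZE = 4293914624     # pv_header.disk_locn[2].size


def mda_end(start: int, size: int) -> int:
    """Return the first byte after the metadata area (hypothetical helper)."""
    return start + size


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB for readability (hypothetical helper)."""
    return num_bytes / 2**30


end = mda_end(MDA_START, MDA_SIZE)
print("metadata area ends at byte:", end)
print("ends exactly at data area start:", end == DATA_OFFSET)
print("data area start equals md0 OPT-IO:", DATA_OFFSET == MD0_OPT_IO)
print(f"metadata area size: {to_gib(MDA_SIZE):.2f} GiB")
```

In other words, the metadata area fills everything between the 4096-byte header region and a data area that starts at one optimal_io_size.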

The large md0 optimal_io_size seems to come from the component devices each
reporting a ~16MB optimal_io_size (16773120 bytes), as shown in the lsblk
output below.
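
The numbers are consistent with a least-common-multiple combination, which the following Python sketch illustrates. Two things in it are assumptions on my side and not taken from the output: that md raid10 with four devices and two copies advertises its own optimal size as chunk * 2 (1 MiB for the 512 KiB chunk that lsblk shows as MIN-IO), and that the block layer stacks optimal_io_size of component devices by taking the lcm. The function name is hypothetical.

```python
import math

# From the lsblk output in this report (bytes).
COMPONENT_OPT_IO = 16773120  # OPT-IO for sda..sdd
MD0_MIN_IO = 524288          # MIN-IO for md0, i.e. the 512 KiB chunk

# ASSUMPTION: raid10 over 4 devices with 2 copies advertises
# chunk * (devices / copies) as its own optimal I/O size.
MD_OWN_OPT_IO = MD0_MIN_IO * (4 // 2)


def stack_opt_io(top: int, bottom: int) -> int:
    """Combine two optimal I/O sizes via lcm, treating 0 as 'not set'.

    ASSUMPTION: this mirrors how stacked block devices combine the limit.
    """
    if top == 0 or bottom == 0:
        return top or bottom
    return math.lcm(top, bottom)


stacked = MD_OWN_OPT_IO
for _ in range(4):  # four component devices, all reporting the same value
    stacked = stack_opt_io(stacked, COMPONENT_OPT_IO)

print("component OPT-IO factors as 4095 * 4096:", COMPONENT_OPT_IO == 4095 * 4096)
print("stacked optimal_io_size:", stacked)
```

Because 16773120 is 4095 * 4096 and 4095 is odd, its lcm with 1 MiB is 4095 MiB, which is exactly the 4293918720 reported for md0. A power-of-two component value (for example a true 16 MiB) would have stacked to 16 MiB instead.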

This host was working fine with Bookworm, which makes me think something has
changed in mpt3sas.

My understanding is that these values are obtained by querying the devices via
the Block Limits VPD page, and I'm attaching that output below. The original
task which spawned this work is https://phabricator.wikimedia.org/T407586
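
For anyone wanting to compare hosts or kernels quickly, the values the kernel ended up with can also be read from sysfs. Below is a minimal Python sketch that reads the queue limits for one device. The function names and the 1 GiB "suspicious" threshold are my own choices for illustration; the sysfs_root parameter exists only so the function can be pointed at a copy of the tree.

```python
from pathlib import Path

# Queue limit files exposed under /sys/block/<device>/queue/.
LIMIT_FILES = (
    "minimum_io_size",
    "optimal_io_size",
    "physical_block_size",
    "logical_block_size",
)


def read_queue_limits(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the queue limits of a block device as integers.

    Files that do not exist are reported as None.
    """
    queue = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in LIMIT_FILES:
        path = queue / name
        limits[name] = int(path.read_text().strip()) if path.exists() else None
    return limits


def is_suspicious(limits: dict, threshold: int = 1 << 30) -> bool:
    """Flag an optimal_io_size at or above a threshold.

    The 1 GiB default is an arbitrary, hypothetical cut-off.
    """
    opt = limits.get("optimal_io_size")
    return opt is not None and opt >= threshold
```

Calling read_queue_limits("md0") on the affected host should return the same MIN-IO and OPT-IO values that lsblk -t prints below.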

I'm happy to conduct further testing for bug fixes and/or investigation.

best,
Filippo

====

# uname -a
Linux cloudcontrol2010-dev 6.12.57+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.57-1 (2025-11-05) x86_64 GNU/Linux

# lsblk -t
NAME           ALIGNMENT MIN-IO     OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE      RA WSAME
sda                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sda1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sda2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sda3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdb                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdb1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdb2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdb3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdc                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdc1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdc2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdc3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B
sdd                    0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdd1                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
|-sdd2                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
`-sdd3                 0   4096   16773120    4096     512    0 mq-deadline     256   32760    0B
  `-md0                0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-swap         0 524288 4293918720    4096     512    0                     4192256    0B
    |-vg0-root         0 524288 4293918720    4096     512    0                     4192256    0B
    `-vg0-srv          0 524288 4293918720    4096     512    0                     4192256    0B


# pvck --dump headers /dev/md0
  label_header at 512
  label_header.id LABELONE
  label_header.sector 1
  label_header.crc 0xbdf3a961
  label_header.offset 32
  label_header.type LVM2 001
  pv_header at 544
  pv_header.pv_uuid KDkSuWsrIico15Y0PenxiLzT8Ad2dGLa
  pv_header.device_size 1919546294272
  pv_header.disk_locn[0] at 584 # location of data area
  pv_header.disk_locn[0].offset 4293918720
  pv_header.disk_locn[0].size 0
  pv_header.disk_locn[1] at 600 # location list end
  pv_header.disk_locn[1].offset 0
  pv_header.disk_locn[1].size 0
  pv_header.disk_locn[2] at 616 # location of metadata area
  pv_header.disk_locn[2].offset 4096
  pv_header.disk_locn[2].size 4293914624
  pv_header.disk_locn[3] at 632 # location list end
  pv_header.disk_locn[3].offset 0
  pv_header.disk_locn[3].size 0
  pv_header_extension at 648
  pv_header_extension.version 2
  pv_header_extension.flags 1
  pv_header_extension.disk_locn[0] at 656 # location list end
  pv_header_extension.disk_locn[0].offset 0
  pv_header_extension.disk_locn[0].size 0
  mda_header_1 at 4096 # metadata area
  mda_header_1.checksum 0x84d8039
  mda_header_1.magic 0x204c564d3220785b35412572304e2a3e
  mda_header_1.version 1
  mda_header_1.start 4096
  mda_header_1.size 4293914624
  mda_header_1.raw_locn[0] at 4136 # commit
  mda_header_1.raw_locn[0].offset 4608
  mda_header_1.raw_locn[0].size 1724
  mda_header_1.raw_locn[0].checksum 0xdd78f68b
  mda_header_1.raw_locn[0].flags 0x0
  mda_header_1.raw_locn[1] at 4160 # precommit
  mda_header_1.raw_locn[1].offset 0
  mda_header_1.raw_locn[1].size 0
  mda_header_1.raw_locn[1].checksum 0x0
  mda_header_1.raw_locn[1].flags 0x0
  metadata text at 8704 crc 0xdd78f68b # vgname vg0 seqno 4

# Devices are all reporting the same information

# sg_vpd -p bl /dev/sda
Block limits VPD page (SBC)
  Write same non-zero (WSNZ): 1
  Maximum compare and write length: 0 blocks [command not implemented]
  Optimal transfer length granularity: 0 blocks [not reported]
  Maximum transfer length: 0 blocks [not reported]
  Optimal transfer length: 0 blocks [not reported]
  Maximum prefetch length: 0 blocks [not reported]
  Maximum unmap LBA count: 0x3ffff
  Maximum unmap block descriptor count: 0x20
  Optimal unmap granularity: 0x1
  Unmap granularity alignment valid: false
  Maximum write same length: 0xffff
  Maximum atomic transfer length: 0 blocks [not reported]
  Atomic alignment: 0 blocks [unaligned atomic writes permitted]
  Atomic transfer length granularity: 0 blocks [no granularity requirement]
  Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
  Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]

# sg_vpd -p ai /dev/sda
ATA information VPD page:
  SAT Vendor identification: LSI
  SAT Product identification: LSI SATL
  SAT Product revision level: 0008
  Device signature indicates SATA transport
  Command code: 0xec
  ATA command IDENTIFY DEVICE response summary:
    model: MTFDDAK960TGA-1BC1ZABDA
    serial number:         XXX
    firmware revision:  D4DK003

c3:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
        Subsystem: Dell HBA355i Front
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at e6800000 (64-bit, prefetchable) [size=1M]
        Memory at e6900000 (64-bit, prefetchable) [size=1M]
        Memory at e6a00000 (32-bit, non-prefetchable) [size=1M]
        I/O ports at e000 [size=256]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] Express Endpoint, IntMsgNum 0
        Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Power Budgeting <?>
        Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [168] Secondary PCI Express
        Capabilities: [188] Physical Layer 16.0 GT/s <?>
        Capabilities: [1b0] Lane Margining at the Receiver
        Capabilities: [218] Dynamic Power Allocation <?>
        Capabilities: [248] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
        Capabilities: [348] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [380] Data Link Feature <?>
        Kernel driver in use: mpt3sas

