Re: New test kernel - second attempt
Hi,
On Wed, 2025-09-17 at 15:38 +0200, Riccardo Mottola wrote:
> Still fails, but there is more error reporting:
> [ 9.358051] mptsas 0000:07:00.0: Unable to change power state from
> D3cold to D0, device inaccessible
> [ 9.695479] NON-RESUMABLE ERROR: Reporting on cpu 24
> [ 9.695655] NON-RESUMABLE ERROR: TPC [0x0000000010168064]
> <MakeIocReady+0x10/0x294 [mptbase]>
> [ 9.695821] NON-RESUMABLE ERROR: RAW
> [1810000000000007:0000000c3414f764:0000000202000004:000000ea00300000
> [ 9.695913] NON-RESUMABLE ERROR:
> 0000000000180000:0000000000000000:0000000000000000:0000000000000000]
> [ 9.695998] NON-RESUMABLE ERROR: handle [0x1810000000000007] stick
> [0x0000000c3414f764]
> [ 9.696071] NON-RESUMABLE ERROR: type [precise nonresumable]
> [ 9.696133] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted
> priv >
> [ 9.696238] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> [ 9.697323] Kernel panic - not syncing: Non-resumable error.
> [ 9.697391] CPU: 24 UID: 0 PID: 296 Comm: (udev-worker) Not tainted
> 6.17.0-rc5+ #1 NONE
> [ 9.697476] Call Trace:
> [ 9.697517] [<0000000000436b54>] dump_stack+0x8/0x18
> [ 9.697595] [<00000000004294c4>] vpanic+0xdc/0x310
> [ 9.697663] [<000000000042971c>] panic+0x24/0x30
> [ 9.697726] [<000000000043aea0>] sun4v_nonresum_error+0x140/0x200
> [ 9.697812] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [ 9.697901] [<0000000010168064>] MakeIocReady+0x10/0x294 [mptbase]
> [ 9.697995] [<00000000101684e0>] mpt_do_ioc_recovery+0xa0/0x11b4
> [mptbase]
> [ 9.698101] [<0000000010167748>] mpt_attach+0xae8/0xca0 [mptbase]
> [ 9.698204] [<00000000101bc010>] mptsas_probe+0x10/0x440 [mptsas]
> [ 9.698333] [<0000000000b380d4>] local_pci_probe+0x34/0x80
> [ 9.698433] [<0000000000b39094>] pci_device_probe+0xb4/0x200
> [ 9.698520] [<0000000000c05e48>] really_probe+0xc8/0x420
> [ 9.698612] [<0000000000c0622c>] __driver_probe_device+0x8c/0x160
> [ 9.698706] [<0000000000c063e8>] driver_probe_device+0x28/0xe0
> [ 9.698799] [<0000000000c066c4>] __driver_attach+0xe4/0x1e0
> [ 9.698891] [<0000000000c03714>] bus_for_each_dev+0x54/0xc0
> [ 10.320382] Press Stop-A (L1-A) from sun keyboard or send break
> [ 10.320382] twice on console to return to the boot prom
> [ 10.320546] ---[ end Kernel panic - not syncing: Non-resumable error.
> ]---
>
>
> I retried and had the quickness to clear console screen to be sure of
> message and see this:
>
> Loading initial ramdisk .....
>
> [ 9.337432] mptsas 0000:07:00.0: Unable to change power state from
> D3cold to D0, device inaccessible
> [ 9.674834] NON-RESUMABLE ERROR: Reporting on cpu 11
> [ 9.675005] NON-RESUMABLE ERROR: TPC [0x0000000010102064]
> <MakeIocReady+0x10/0x294 [mptbase]>
> [ 9.675166] NON-RESUMABLE ERROR: RAW
> [0b10000000000007:0000000cde51497c:0000000202000004:000000ea00300000
> [ 9.675248] NON-RESUMABLE ERROR:
> 00000000000b0000:0000000000000000:0000000000000000:0000000000000000]
> [ 9.675327] NON-RESUMABLE ERROR: handle [0x0b10000000000007] stick
> [0x0000000cde51497c]
> [ 9.675399] NON-RESUMABLE ERROR: type [precise nonresumable]
> [ 9.675461] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted
> priv >
> [ 9.675565] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> [ 9.676654] Kernel panic - not syncing: Non-resumable error.
> [ 9.676722] CPU: 11 UID: 0 PID: 305 Comm: (udev-worker) Not tainted
> 6.17.0-rc5+ #1 NONE
> [ 9.676808] Call Trace:
> [ 9.676849] [<0000000000436b54>] dump_stack+0x8/0x18
> [ 9.676926] [<00000000004294c4>] vpanic+0xdc/0x310
> [ 9.676994] [<000000000042971c>] panic+0x24/0x30
> [ 9.677057] [<000000000043aea0>] sun4v_nonresum_error+0x140/0x200
> [ 9.677142] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [ 9.677232] [<0000000010102064>] MakeIocReady+0x10/0x294 [mptbase]
> [ 9.677325] [<00000000101024e0>] mpt_do_ioc_recovery+0xa0/0x11b4
> [mptbase]
> [ 9.677431] [<0000000010101748>] mpt_attach+0xae8/0xca0 [mptbase]
> [ 9.677534] [<000000001019c010>] mptsas_probe+0x10/0x440 [mptsas]
> [ 9.677664] [<0000000000b380d4>] local_pci_probe+0x34/0x80
> [ 9.677765] [<0000000000b39094>] pci_device_probe+0xb4/0x200
> [ 9.677851] [<0000000000c05e48>] really_probe+0xc8/0x420
> [ 9.677943] [<0000000000c0622c>] __driver_probe_device+0x8c/0x160
> [ 9.678036] [<0000000000c063e8>] driver_probe_device+0x28/0xe0
> [ 9.678129] [<0000000000c066c4>] __driver_attach+0xe4/0x1e0
> [ 9.678221] [<0000000000c03714>] bus_for_each_dev+0x54/0xc0
> [ 10.299883] Press Stop-A (L1-A) from sun keyboard or send break
> [ 10.299883] twice on console to return to the boot prom
> [ 10.300200] ---[ end Kernel panic - not syncing: Non-resumable error.
> ]---
>
>
> so still random CPU/cores. Does this help, though?
Looks like your machine has problems with power management.
Could be issues with the hardware. I would give Solaris 11.3 a try to verify
that the hardware is actually working properly.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913