Re: New Debian sparc64 test kernel for stack corruption issue
Hello Riccardo,
On Mon, 2025-10-20 at 20:05 +0200, Riccardo Mottola wrote:
> https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64-smp_6.16.12-2+sparc64.1_sparc64.deb
>
> On T2000 with Niagara
>
> [ 12.126130] mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [ 12.463473] NON-RESUMABLE ERROR: Reporting on cpu 31
> [ 12.463643] NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> [ 12.463810] NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> [ 12.463894] NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> [ 12.463975] NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> [ 12.464050] NON-RESUMABLE ERROR: type [precise nonresumable]
> [ 12.464113] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> [ 12.464221] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> [ 12.465352] Kernel panic - not syncing: Non-resumable error.
> [ 12.465422] CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> [ 12.465532] Call Trace:
> [ 12.465574] [<00000000004373c4>] dump_stack+0x8/0x18
> [ 12.465656] [<0000000000429540>] panic+0xf4/0x398
> [ 12.465727] [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [ 12.465817] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [ 12.465910] [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [ 12.466007] [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [ 12.466103] [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [ 12.466209] [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [ 12.466336] [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [ 12.466427] [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [ 12.466518] [<0000000000bfd348>] really_probe+0xc8/0x400
> [ 12.466612] [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [ 12.466709] [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [ 12.466805] [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [ 12.466900] [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [ 12.466992] [<0000000000bfcafc>] driver_attach+0x1c/0x40
> [ 13.088506] Press Stop-A (L1-A) from sun keyboard or send break
> [ 13.088506] twice on console to return to the boot prom
> [ 13.088811] ---[ end Kernel panic - not syncing: Non-resumable error. ]---
>
> It still crashes, but there is some "information"
I have not seen that crash on any of my machines, so I'm really wondering where it comes from.
> Take into consideration that on this system my sweet spot for the kernel is:
> 6.12.38+deb13-sparc64-smp
Would you be able to bisect this?
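In case it helps, here is a rough sketch of how such a bisect could be driven. It assumes a clone of the mainline Linux tree, and it assumes that v6.12 can serve as the good endpoint (because 6.12.38 boots for you) and v6.16 as the bad one (because 6.16.12 panics). The helper names bisect_begin and bisect_report are made up for illustration; only the git bisect subcommands themselves are real.

```shell
# Sketch only: source this inside a clone of the mainline Linux tree.
# GOOD and BAD are assumptions, not verified endpoints:
# v6.12 is taken as good because 6.12.38 boots, v6.16 as bad because 6.16.12 panics.
GOOD=v6.12
BAD=v6.16

# Hypothetical helper: mark the two endpoints and let git pick the first midpoint.
bisect_begin() {
    git bisect start
    git bisect bad "$BAD"
    git bisect good "$GOOD"
}

# Hypothetical helper: report the boot result of the kernel built from the
# commit that git checked out, after testing it on the T2000.
bisect_report() {
    case "$1" in
        boots)  git bisect good ;;
        panics) git bisect bad ;;
        skip)   git bisect skip ;;   # commit did not build or failed for an unrelated reason
        *)      echo "usage: bisect_report boots|panics|skip" >&2; return 2 ;;
    esac
}
```

After bisect_begin, each round would be: build and install the checked-out commit (for example with make bindeb-pkg, using the config of the known-good Debian kernel), boot it on the T2000, then call bisect_report with the outcome. When git names the first bad commit, the output of git bisect log is worth attaching to the reply.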
> Older stuff was crashy; IIRC all newer kernels (or most?) straight from
> Debian and most kernels you provided fail to boot.
>
> Instead on single-CPU I tested:
>
> https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64_6.16.12-2+sparc64.1_sparc64.deb
>
> On Netra T1 UltraSPARC IIe
>
> - boots fine
> - basic operations like apt-get work fine
> - some compilation - survives fine
>
> In dmesg I see no failures, errors... except this:
>
> [ 0.952369] pci_bus 0000:01: extended config space not accessible
> [ 0.957507] pci_bus 0000:02: extended config space not accessible
> [ 10.402607] This architecture does not have kernel memory protection.
> [ 21.640233] Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
> [ 33.087538] PM: Image not found (code -22)
> [ 35.720238] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
Out of curiosity, can you try reinstalling the openssh-server package?
# apt install --reinstall openssh-server
This causes crashes for me with certain kernels on UltraSPARC III.
> Then I tried the said kernel on Ultra1 with UltraSPARC I (literally
> transferring the same Hard Disk)
> As "expected" there were no improvements over other kernels. same boot
> failure:
> Invalid sbus slot number 31
> Invalid sbus slot number 31
> error: canonicalise devname failed.
> Can't read disk label.
> Can't open disk label package
> error: unable to open /sbus@1f,0/SUNW,fdtwo@f,1400000.
> Invalid SCSI target number fffe55d0
> error: unable to open /sbus@1f,0/SUNW,fas@e,8800000/sd.
>
> This is really the same failure I got even with 6.1 kernels. I was not able to
> boot this system into Debian at all.
We can look into this later; this is a bootloader problem, not a kernel problem.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913