Re: New Debian sparc64 test kernel for stack corruption issue



Hello Riccardo,

On Mon, 2025-10-20 at 20:05 +0200, Riccardo Mottola wrote:
> https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64-smp_6.16.12-2+sparc64.1_sparc64.deb
> 
> On T2000 with Niagara
> 
> [   12.126130] mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [   12.463473] NON-RESUMABLE ERROR: Reporting on cpu 31
> [   12.463643] NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> [   12.463810] NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> [   12.463894] NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> [   12.463975] NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> [   12.464050] NON-RESUMABLE ERROR: type [precise nonresumable]
> [   12.464113] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> [   12.464221] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> [   12.465352] Kernel panic - not syncing: Non-resumable error.
> [   12.465422] CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE  Debian 6.16.12-2+sparc64.1
> [   12.465532] Call Trace:
> [   12.465574] [<00000000004373c4>] dump_stack+0x8/0x18
> [   12.465656] [<0000000000429540>] panic+0xf4/0x398
> [   12.465727] [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [   12.465817] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [   12.465910] [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [   12.466007] [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [   12.466103] [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [   12.466209] [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [   12.466336] [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [   12.466427] [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [   12.466518] [<0000000000bfd348>] really_probe+0xc8/0x400
> [   12.466612] [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [   12.466709] [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [   12.466805] [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [   12.466900] [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [   12.466992] [<0000000000bfcafc>] driver_attach+0x1c/0x40
> [   13.088506] Press Stop-A (L1-A) from sun keyboard or send break
> [   13.088506] twice on console to return to the boot prom
> [   13.088811] ---[ end Kernel panic - not syncing: Non-resumable error. ]---
> 
> It still crashes, but now there is some "information".

I have not seen that crash on any of my machines, so I'm really wondering where it comes from.

> Take into consideration that on this system my sweet spot for the kernel is:
> 6.12.38+deb13-sparc64-smp

Would you be able to bisect this?
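
In case it helps, here is a rough sketch of how such a bisect could be started. It
assumes a clone of the linux-stable tree that contains the v6.12.38 and v6.16.12
tags; the helper name start_bisect is made up for illustration. The Debian kernels
carry extra patches, so a plain upstream build may not behave identically.

```shell
# Sketch only: run from the top of a linux-stable checkout (an assumption).
# start_bisect is a hypothetical helper, not part of git or the kernel tree.
start_bisect() {
    good="$1"   # last revision known to boot, e.g. v6.12.38
    bad="$2"    # revision known to panic, e.g. v6.16.12
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Usage, from inside the kernel tree:
#   start_bisect v6.12.38 v6.16.12
# Build and boot each revision that git checks out, then report the result:
#   git bisect good    # the machine booted
#   git bisect bad     # the non-resumable error panic appeared
# When git names the first bad commit, clean up with:
#   git bisect reset
```

Each step roughly halves the remaining range of commits, so even several thousand
commits between the two versions need only around a dozen test boots.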

> Older kernels were crashy; IIRC all (or most?) newer kernels straight from
> Debian, and most kernels you provided, fail to boot.
> 
> Instead on single-CPU I tested:
> 
> https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64_6.16.12-2+sparc64.1_sparc64.deb
> 
> On Netra T1 UltraSPARC IIe
> 
> - boots fine
> - basic operations like apt-get work fine
> - some compilation - survives fine
> 
> In dmesg I see no failures, errors... except this:
> 
> [    0.952369] pci_bus 0000:01: extended config space not accessible
> [    0.957507] pci_bus 0000:02: extended config space not accessible
> [   10.402607] This architecture does not have kernel memory protection.
> [   21.640233] Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
> [   33.087538] PM: Image not found (code -22)
> [   35.720238] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.

Out of curiosity, can you try reinstalling the openssh-server package?

# apt install --reinstall openssh-server

This causes crashes for me with certain kernels on UltraSPARC III.

> Then I tried the same kernel on an Ultra 1 with UltraSPARC I (literally
> transferring the same hard disk).
> As "expected", there were no improvements over other kernels. Same boot
> failure:
> Invalid sbus slot number 31
> Invalid sbus slot number 31
> error: canonicalise devname failed.
> Can't read disk label.
> Can't open disk label package
> error: unable to open /sbus@1f,0/SUNW,fdtwo@f,1400000.
> Invalid SCSI target number fffe55d0
> error: unable to open /sbus@1f,0/SUNW,fas@e,8800000/sd.
> 
> This is really the same failure I got even with 6.1 kernels. I have not been
> able to boot this system into Debian at all.

We can look into this later; this is a bootloader problem, not a kernel problem.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

