Re: New Debian sparc64 test kernel for stack corruption issue
Hi Adrian!
John Paul Adrian Glaubitz wrote:
> I have created UMP and SMP test kernel Debian packages to verify this:
> https://people.debian.org/~glaubitz/sparc64/
> Could someone test this kernel? It works for me in a SPARC T4 LDOM.
> FWIW, the kernel does *not* yet include the fixes for accurate exception reporting [1],
> so expect that the kernels may not be stable on older UltraSPARC systems. The fixes for
> that will be included in 6.17.3 or newer or 6.18 or newer.
I found two versions. I tested:
https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64-smp_6.16.12-2+sparc64.1_sparc64.deb
On T2000 with Niagara
[ 12.126130] mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 12.463473] NON-RESUMABLE ERROR: Reporting on cpu 31
[ 12.463643] NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
[ 12.463810] NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
[ 12.463894] NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
[ 12.463975] NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
[ 12.464050] NON-RESUMABLE ERROR: type [precise nonresumable]
[ 12.464113] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
[ 12.464221] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
[ 12.465352] Kernel panic - not syncing: Non-resumable error.
[ 12.465422] CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
[ 12.465532] Call Trace:
[ 12.465574] [<00000000004373c4>] dump_stack+0x8/0x18
[ 12.465656] [<0000000000429540>] panic+0xf4/0x398
[ 12.465727] [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[ 12.465817] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[ 12.465910] [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[ 12.466007] [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[ 12.466103] [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[ 12.466209] [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[ 12.466336] [<0000000000b3fab0>] local_pci_probe+0x30/0x80
[ 12.466427] [<0000000000b405d4>] pci_device_probe+0xb4/0x240
[ 12.466518] [<0000000000bfd348>] really_probe+0xc8/0x400
[ 12.466612] [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[ 12.466709] [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[ 12.466805] [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[ 12.466900] [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[ 12.466992] [<0000000000bfcafc>] driver_attach+0x1c/0x40
[ 13.088506] Press Stop-A (L1-A) from sun keyboard or send break
[ 13.088506] twice on console to return to the boot prom
[ 13.088811] ---[ end Kernel panic - not syncing: Non-resumable error. ]---
It still crashes, but at least there is some "information" now.
Take into consideration that on this system my sweet spot for the kernel is:
6.12.38+deb13-sparc64-smp
Older stuff was crashy; IIRC all newer kernels (or most?) straight from
Debian, and most kernels you provided, fail to boot.
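In case it helps when reading the RAW dump: a minimal Python sketch that splits the two RAW lines from the panic above into 64-bit words and compares them with the fields the kernel prints right after. The word-to-field mapping here is only inferred from the values matching in this one log (word 0 = handle, word 1 = stick, low half of word 2 = attrs, word 3 = raddr); it is an assumption, not taken from the kernel source. `split_raw` is a hypothetical helper name.

```python
# RAW words copied from the panic log (mail line wraps removed).
RAW = (
    "1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000 "
    "00000000001f0000:0000000000000000:0000000000000000:0000000000000000"
)


def split_raw(raw):
    """Return the colon/space separated hex dump as a list of integers."""
    return [int(word, 16) for word in raw.replace(" ", ":").split(":")]


words = split_raw(RAW)

# Assumed layout, inferred only from matching values in this log.
handle, stick, type_attrs, raddr = words[:4]
attrs = type_attrs & 0xFFFFFFFF  # low 32 bits of word 2

print(f"handle {handle:#018x}")
print(f"stick  {stick:#018x}")
print(f"attrs  {attrs:#010x}")
print(f"raddr  {raddr:#018x}")
```

The four printed values can be compared by eye with the "handle", "stick", "attrs" and "raddr" lines of the panic; the remaining words are left undecoded on purpose.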
On single-CPU I tested instead:
https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64_6.16.12-2+sparc64.1_sparc64.deb
On Netra T1 UltraSPARC IIe
- boots fine
- basic operations like apt-get work fine
- some compilation - survives fine
In dmesg I see no failures, errors... except this:
[ 0.952369] pci_bus 0000:01: extended config space not accessible
[ 0.957507] pci_bus 0000:02: extended config space not accessible
[ 10.402607] This architecture does not have kernel memory protection.
[ 21.640233] Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
[ 33.087538] PM: Image not found (code -22)
[ 35.720238] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
Then I tried the same kernel on an Ultra1 with UltraSPARC I (literally
transferring the same hard disk).
As "expected" there were no improvements over other kernels. Same boot
failure:
Invalid sbus slot number 31
Invalid sbus slot number 31
error: canonicalise devname failed.
Can't read disk label.
Can't open disk label package
error: unable to open /sbus@1f,0/SUNW,fdtwo@f,1400000.
Invalid SCSI target number fffe55d0
error: unable to open /sbus@1f,0/SUNW,fas@e,8800000/sd.
This is really the same failure I got even with 6.1 kernels. I was not able to
boot this system into Debian at all.
Riccardo