
Re: New Debian sparc64 test kernel for stack corruption issue



Hi Adrian!

John Paul Adrian Glaubitz wrote:
I have created UP and SMP test kernel Debian packages to verify this:

https://people.debian.org/~glaubitz/sparc64/

Could someone test this kernel? It works for me in a SPARC T4 LDOM.

FWIW, the kernel does *not* yet include the fixes for accurate exception reporting [1],
so expect that the kernels may not be stable on older UltraSPARC systems. The fixes for
that will be included in 6.17.3 or newer or 6.18 or newer.

I found two versions. I tested:

https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64-smp_6.16.12-2+sparc64.1_sparc64.deb

On a T2000 (Niagara):

[   12.126130] mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   12.463473] NON-RESUMABLE ERROR: Reporting on cpu 31
[   12.463643] NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
[   12.463810] NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
[   12.463894] NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
[   12.463975] NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
[   12.464050] NON-RESUMABLE ERROR: type [precise nonresumable]
[   12.464113] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
[   12.464221] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
[   12.465352] Kernel panic - not syncing: Non-resumable error.
[   12.465422] CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE  Debian 6.16.12-2+sparc64.1
[   12.465532] Call Trace:
[   12.465574] [<00000000004373c4>] dump_stack+0x8/0x18
[   12.465656] [<0000000000429540>] panic+0xf4/0x398
[   12.465727] [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[   12.465817] [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[   12.465910] [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[   12.466007] [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[   12.466103] [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[   12.466209] [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[   12.466336] [<0000000000b3fab0>] local_pci_probe+0x30/0x80
[   12.466427] [<0000000000b405d4>] pci_device_probe+0xb4/0x240
[   12.466518] [<0000000000bfd348>] really_probe+0xc8/0x400
[   12.466612] [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[   12.466709] [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[   12.466805] [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[   12.466900] [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[   12.466992] [<0000000000bfcafc>] driver_attach+0x1c/0x40
[   13.088506] Press Stop-A (L1-A) from sun keyboard or send break
[   13.088506] twice on console to return to the boot prom
[   13.088811] ---[ end Kernel panic - not syncing: Non-resumable error. ]---

It still crashes, but at least there is now some diagnostic "information".
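In case it helps when comparing this report with others, here is a minimal Python sketch that pulls the main fields (CPU, faulting TPC and symbol, error type, attributes, real address) out of the NON-RESUMABLE ERROR lines. The function name `summarize` and the dictionary keys are hypothetical, chosen only for illustration, and the regular expressions assume exactly the line format seen in the log above (with console line wrapping undone); other kernels may print it differently.

```python
import re

# Sample lines copied from the panic log above (console wrapping undone).
LOG = """\
[   12.463473] NON-RESUMABLE ERROR: Reporting on cpu 31
[   12.463643] NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
[   12.464050] NON-RESUMABLE ERROR: type [precise nonresumable]
[   12.464113] NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
[   12.464221] NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
"""

# Timestamp plus the fixed prefix; the remainder of the line is captured.
PREFIX = re.compile(r"^\[\s*\d+\.\d+\]\s+NON-RESUMABLE ERROR:\s+(.*)$")
CPU = re.compile(r"Reporting on cpu (\d+)")
TPC = re.compile(
    r"TPC \[(0x[0-9a-f]+)\] <([^+\s]+)\+(0x[0-9a-f]+)/0x[0-9a-f]+(?: \[(\w+)\])?>"
)
TYPE = re.compile(r"type \[(.+)\]")
ATTRS = re.compile(r"attrs \[(0x[0-9a-f]+)\] <\s*(.*?)\s*>")
RADDR = re.compile(r"raddr \[(0x[0-9a-f]+)\]")


def summarize(log: str) -> dict:
    """Collect the fields of one non-resumable error report (hypothetical helper)."""
    out = {}
    for line in log.splitlines():
        prefix = PREFIX.match(line)
        if not prefix:
            continue  # not part of the error report
        body = prefix.group(1)
        if m := CPU.match(body):
            out["cpu"] = int(m.group(1))
        elif m := TPC.match(body):
            out["tpc"] = int(m.group(1), 16)
            out["symbol"] = m.group(2)
            out["offset"] = int(m.group(3), 16)
            out["module"] = m.group(4)  # None for built-in kernel code
        elif m := TYPE.match(body):
            out["type"] = m.group(1)
        elif m := ATTRS.match(body):
            out["attrs"] = int(m.group(1), 16)
            out["flags"] = m.group(2).split()
        elif m := RADDR.match(body):
            out["raddr"] = int(m.group(1), 16)
    return out


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

For the log above this reports a precise non-resumable PIO error on cpu 31 in `MakeIocReady` (module `mptbase`), which matches the top of the call trace.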


Keep in mind that on this system my sweet spot for the kernel is:
6.12.38+deb13-sparc64-smp

Older kernels were crashy; IIRC all (or most?) newer kernels straight from Debian, and most of the kernels you provided, fail to boot.

On a single-CPU system, I instead tested:

https://people.debian.org/~glaubitz/sparc64/linux-image-6.16.12+3-sparc64_6.16.12-2+sparc64.1_sparc64.deb

On a Netra T1 (UltraSPARC IIe):

- boots fine
- basic operations like apt-get work fine
- some compilation survives fine

In dmesg I see no failures or errors, except for this:

[    0.952369] pci_bus 0000:01: extended config space not accessible
[    0.957507] pci_bus 0000:02: extended config space not accessible
[   10.402607] This architecture does not have kernel memory protection.
[   21.640233] Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
[   33.087538] PM: Image not found (code -22)
[   35.720238] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.


Then I tried the same kernel on an Ultra 1 with an UltraSPARC I (literally moving the same hard disk over). As "expected", there was no improvement over other kernels; I get the same boot failure:
Invalid sbus slot number 31
Invalid sbus slot number 31
error: canonicalise devname failed.
Can't read disk label.
Can't open disk label package
error: unable to open /sbus@1f,0/SUNW,fdtwo@f,1400000.
Invalid SCSI target number fffe55d0
error: unable to open /sbus@1f,0/SUNW,fas@e,8800000/sd.

This is exactly the same failure I got even with 6.1 kernels. I was not able to boot this system into Debian at all.


Riccardo


