
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]





On Mar 12, 2021, at 5:56 AM, Dennis Clarke <dclarke@blastwave.org> wrote:


I have seen this for a few months now. The old old netra machine will
run just fine endlessly but if I attempt to perform a package update
then I am always assured to see :



What kernel are you on? I do not have a Netra handy (but I have one in storage, like everyone ;p). I have an Ultra 5 here, so an UltraSPARC IIi CPU. It does not exhibit this behavior. Any chance the memory modules need to be reseated?

ceres# apt-get update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
Fetched 30.3 MB in 1min 24s (361 kB/s)

Reading package lists... Done
ceres#

Then try "upgrade" and the machine drops off the network :


I have unstable in the mix, but as a point of reference:

mgt@xray:~$ uname -a
Linux xray 5.10.0-3-sparc64 #1 Debian 5.10.13-1 (2021-02-06) sparc64 GNU/Linux
mgt@xray:~$ cat /etc/debian_version 
bullseye/sid
mgt@xray:~$ cat /proc/cpuinfo 
cpu : TI UltraSparc IIi (Sabre)
fpu : UltraSparc IIi integrated FPU
pmu : ultra12
prom : OBP 3.31.0 2001/07/25 20:36
type : sun4u
ncpus probed : 1
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 0000000013d92d40
cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type : Spitfire
MMU PGSZs : 8K,64K,512K,4MB
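
For anyone comparing machines: the Cpu0ClkTck field above is printed in hex. A minimal sketch for reading it, assuming the value is the CPU tick frequency in Hz (the helper name `clk_tck_mhz` is made up for illustration):

```python
def clk_tck_mhz(hex_ticks):
    """Convert a Cpu0ClkTck hex string to MHz, assuming the value is in Hz."""
    return int(hex_ticks, 16) / 1_000_000


# Value copied from the /proc/cpuinfo output above.
print(clk_tck_mhz("0000000013d92d40"))  # 333.0
```

Under that assumption, this Ultra 5 is running a 333 MHz part.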

root@xray:/home/users/mgt# apt update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports unreleased InRelease [56.6 kB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [9,069 kB]
Get:4 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.5 MB]
Fetched 30.7 MB in 1min 55s (266 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
111 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@xray:/home/users/mgt# apt list --upgradeable
Listing... Done
.
.

apt upgrade was then run, and all 111 packages upgraded without issue.

Setting up systemd (247.3-1) ...
Timeout, server 172.16.35.61 not responding.

On the serial console we see :

ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system
mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID
+ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[2968669.411163] systemd[1]: Detected architecture sparc64.
[2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
[systemd:1]
[2968696.794780] Modules linked in: drm(E)
drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E)
display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E)
autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E)
sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E)
crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E)
scsi_transport_spi(E) scsi_mod(E) sunhme(E)
[2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: G            E
5.10.0-1-sparc64 #1 Debian 5.10.5-1
[2968697.391074] TSTATE: 0000000011001604 TPC: 000000000094c4f0 TNPC:
000000000094c4f4 Y: 00000000    Tainted: G            E
[2968697.541033] TPC: <misc_open+0x50/0x180>
[2968697.593712] g0: fffff800065a1c80 g1: 0000000000000098 g2:
0000000000000000 g3: 0000000000000002
[2968697.710488] g4: fffff80004197020 g5: 0000000000e93214 g6:
fffff80004198000 g7: 0000000000500008
[2968697.827256] o0: 0000000000f24960 o1: fffff800049ab110 o2:
0000000000040000 o3: 0000000000000000
[2968697.944022] o4: 0000000000000000 o5: 0000000000000000 sp:
fffff8000419af81 ret_pc: 000000000094c4c0
[2968698.065369] RPC: <misc_open+0x20/0x180>
[2968698.118074] l0: 0000000000f24800 l1: fffff800041ce021 l2:
00000003e775fef2 l3: 00000003e775fef2
[2968698.234848] l4: 0000000000020000 l5: fffff8000419b8f0 l6:
0000000000e12000 l7: 0000000000000001
[2968698.351615] i0: fffff8000b791048 i1: fffff800049ab100 i2:
0000000000f24800 i3: 0000000000f24978
[2968698.468381] i4: 00000000000000eb i5: 0000000010040818 i6:
fffff8000419b031 i7: 0000000000665838
[2968698.585168] I7: <chrdev_open+0x98/0x1e0>
[2968698.638996] Call Trace:
[2968698.673323] [<0000000000665838>] chrdev_open+0x98/0x1e0
[2968698.744355] [<000000000065ae30>] do_dentry_open+0x170/0x420
[2968698.819928] [<000000000065ca68>] vfs_open+0x28/0x40
[2968698.886379] [<0000000000671348>] path_openat+0x988/0x1100
[2968698.959682] [<0000000000673dd0>] do_filp_open+0x50/0x100
[2968699.031837] [<000000000065cd30>] do_sys_openat2+0x70/0x180
[2968699.106284] [<000000000065d268>] sys_openat+0x48/0xc0
[2968699.175027] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
~
Type  'go' to resume
ok ~
[EOT]

This is pretty consistent behavior. If someone has any ideas that would
be great. I realize that the old old Netra X1 or Netra T1 is well past
its prime but it does run very stable.  I would love to fire up a big
Oracle M4000 unit to try but I have not heard from anyone anywhere that
knows if that can work at all. So for now these old netra units are all
that I can test with.


--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional

The Netras have a few different devices; I wonder if there is a bug in one of those drivers?
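
One way to narrow that down might be to pull the module list out of the lockup report and compare it with `lsmod` on a machine that does not hang. This is only a rough sketch, assuming the console output has been saved as text; `linked_modules` is a made-up helper name, and the sample is a shortened copy of the report quoted above.

```python
import re


def linked_modules(log_text):
    """Return module names from the 'Modules linked in:' part of a kernel report."""
    # The list may be wrapped over several lines; it ends at the next
    # timestamped line (one starting with '[') or at the end of the text.
    match = re.search(r"Modules linked in:(.*?)(?=\n\[|\Z)", log_text, re.S)
    if match is None:
        return []
    # Entries look like "sunhme(E)"; drop the trailing flags in parentheses.
    return [re.sub(r"\(.*?\)$", "", tok) for tok in match.group(1).split()]


sample = """[2968696.794780] Modules linked in: drm(E)
drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E)
display7seg(E) flash(E) fuse(E)
[2968697.265208] CPU: 0 PID: 1 Comm: systemd"""

print(linked_modules(sample))
```

Anything that shows up on the Netra but not on the Ultra 5 would be a candidate to blacklist for a test run.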

-Mike

