
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]



ADDED: Based on the console output, I wonder whether it is systemd specifically that triggers this for you.

I have this in dmesg, which matches the start of your output.

[Mar13 09:26] systemd[1]: systemd 247.3-3 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[  +0.000928] systemd[1]: Detected architecture sparc64.
[  +2.019373] systemd[1]: /lib/systemd/system/plymouth-start.service:16: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[  +5.426590] systemd-journald[181]: Received client request to relinquish /var/log/journal/5cbf90a5ea124196a208a4297d97ce25 access.
[  +0.099645] systemd-journald[181]: Received SIGTERM from PID 1 (systemd).
[  +0.016174] systemd[1]: Stopping Journal Service...
[  +0.017883] systemd[1]: systemd-journald.service: Succeeded.
[  +0.010165] systemd[1]: Stopped Journal Service.
[  +0.000494] systemd[1]: systemd-journald.service: Consumed 2.275s CPU time.
[  +0.088359] systemd[1]: Starting Journal Service...
[  +0.181377] systemd[1]: Started Journal Service.
[  +0.359133] systemd-journald[2434]: Received client request to flush runtime journal.


In /var/log/apt/term.log we can see that the systemd upgrade lands at about the right time...

Log started: 2021-03-13  09:25:35
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../dash_0.5.11+git20210120+802ebd4-1_sparc64.deb ...
Unpacking dash (0.5.11+git20210120+802ebd4-1) over (0.5.11+git20200708+dd9ef66-5) ...
Setting up dash (0.5.11+git20210120+802ebd4-1) ...
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../gzip_1.10-4_sparc64.deb ...
Unpacking gzip (1.10-4) over (1.10-3) ...
Setting up gzip (1.10-4) ...
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../liblzma5_5.2.5-2_sparc64.deb ...
Unpacking liblzma5:sparc64 (5.2.5-2) over (5.2.5-1.0) ...
Setting up liblzma5:sparc64 (5.2.5-2) ...
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_247.3-3_sparc64.deb ...
Unpacking libnss-systemd:sparc64 (247.3-3) over (247.3-1) ...
Preparing to unpack .../libsystemd0_247.3-3_sparc64.deb ...
Unpacking libsystemd0:sparc64 (247.3-3) over (247.3-1) ...
Setting up libsystemd0:sparc64 (247.3-3) ...
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../systemd-timesyncd_247.3-3_sparc64.deb ...
Unpacking systemd-timesyncd (247.3-3) over (247.3-1) ...
Preparing to unpack .../libpam-systemd_247.3-3_sparc64.deb ...
Unpacking libpam-systemd:sparc64 (247.3-3) over (247.3-1) ...
Preparing to unpack .../systemd_247.3-3_sparc64.deb ...
Unpacking systemd (247.3-3) over (247.3-1) ...
Preparing to unpack .../udev_247.3-3_sparc64.deb ...
Unpacking udev (247.3-3) over (247.3-1) ...
Preparing to unpack .../libudev1_247.3-3_sparc64.deb ...
Unpacking libudev1:sparc64 (247.3-3) over (247.3-1) ...
Setting up libudev1:sparc64 (247.3-3) ...
Setting up systemd-timesyncd (247.3-3) ...
Setting up systemd (247.3-3) ...
(Reading database ... 150966 files and directories currently installed.)
Preparing to unpack .../systemd-sysv_247.3-3_sparc64.deb ...
Unpacking systemd-sysv (247.3-3) over (247.3-1) ...

-Mike


On Mar 13, 2021, at 9:29 AM, Mike Tremaine <mgt@stellarcore.net> wrote:



On Mar 12, 2021, at 5:56 AM, Dennis Clarke <dclarke@blastwave.org> wrote:


I have seen this for a few months now. The old old Netra machine will
run just fine endlessly, but if I attempt to perform a package update
then I always see:



What kernel are you on? I do not have a Netra handy (but I have one in storage, like everyone ;p ). I have an Ultra 5 here, so an UltraSparc IIi CPU. It does not exhibit this behavior. Any chance the memory modules need to be reseated?

ceres# apt-get update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
Fetched 30.3 MB in 1min 24s (361 kB/s)

Reading package lists... Done
ceres#

Then try "upgrade" and the machine drops off the network :


I have unstable in the mix, but as a point of reference…

mgt@xray:~$ uname -a
Linux xray 5.10.0-3-sparc64 #1 Debian 5.10.13-1 (2021-02-06) sparc64 GNU/Linux
mgt@xray:~$ cat /etc/debian_version 
bullseye/sid
mgt@xray:~$ cat /proc/cpuinfo 
cpu : TI UltraSparc IIi (Sabre)
fpu : UltraSparc IIi integrated FPU
pmu : ultra12
prom : OBP 3.31.0 2001/07/25 20:36
type : sun4u
ncpus probed : 1
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 0000000013d92d40
cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type : Spitfire
MMU PGSZs : 8K,64K,512K,4MB

root@xray:/home/users/mgt# apt update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports unreleased InRelease [56.6 kB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [9,069 kB]                                                                              
Get:4 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.5 MB]                                                                           
Fetched 30.7 MB in 1min 55s (266 kB/s)                                                                                                                 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
111 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@xray:/home/users/mgt# apt list --upgradeable
Listing… Done
.
.

I then ran apt upgrade, and all 111 packages upgraded without issue.

Setting up systemd (247.3-1) ...
Timeout, server 172.16.35.61 not responding.

On the serial console we see :

ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[2968669.411163] systemd[1]: Detected architecture sparc64.
[2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
[2968696.794780] Modules linked in: drm(E) drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
[2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: G            E     5.10.0-1-sparc64 #1 Debian 5.10.5-1
[2968697.391074] TSTATE: 0000000011001604 TPC: 000000000094c4f0 TNPC: 000000000094c4f4 Y: 00000000    Tainted: G            E
[2968697.541033] TPC: <misc_open+0x50/0x180>
[2968697.593712] g0: fffff800065a1c80 g1: 0000000000000098 g2: 0000000000000000 g3: 0000000000000002
[2968697.710488] g4: fffff80004197020 g5: 0000000000e93214 g6: fffff80004198000 g7: 0000000000500008
[2968697.827256] o0: 0000000000f24960 o1: fffff800049ab110 o2: 0000000000040000 o3: 0000000000000000
[2968697.944022] o4: 0000000000000000 o5: 0000000000000000 sp: fffff8000419af81 ret_pc: 000000000094c4c0
[2968698.065369] RPC: <misc_open+0x20/0x180>
[2968698.118074] l0: 0000000000f24800 l1: fffff800041ce021 l2: 00000003e775fef2 l3: 00000003e775fef2
[2968698.234848] l4: 0000000000020000 l5: fffff8000419b8f0 l6: 0000000000e12000 l7: 0000000000000001
[2968698.351615] i0: fffff8000b791048 i1: fffff800049ab100 i2: 0000000000f24800 i3: 0000000000f24978
[2968698.468381] i4: 00000000000000eb i5: 0000000010040818 i6: fffff8000419b031 i7: 0000000000665838
[2968698.585168] I7: <chrdev_open+0x98/0x1e0>
[2968698.638996] Call Trace:
[2968698.673323] [<0000000000665838>] chrdev_open+0x98/0x1e0
[2968698.744355] [<000000000065ae30>] do_dentry_open+0x170/0x420
[2968698.819928] [<000000000065ca68>] vfs_open+0x28/0x40
[2968698.886379] [<0000000000671348>] path_openat+0x988/0x1100
[2968698.959682] [<0000000000673dd0>] do_filp_open+0x50/0x100
[2968699.031837] [<000000000065cd30>] do_sys_openat2+0x70/0x180
[2968699.106284] [<000000000065d268>] sys_openat+0x48/0xc0
[2968699.175027] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
~
Type  'go' to resume
ok ~
[EOT]

This is pretty consistent behavior. If someone has any ideas, that would
be great. I realize that the old old Netra X1 or Netra T1 is well past
its prime, but it does run very stably. I would love to fire up a big
Oracle M4000 unit to try, but I have not heard from anyone anywhere who
knows whether that can work at all. So for now these old Netra units are
all that I can test with.


-- 
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional

The Netras have a few different devices (envctrl, display7seg and flash all show up in the module list above); I wonder if there is a bug in one of those drivers?

-Mike

