
Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load



Hi Salvatore,

I have removed the xorg.conf that referenced the Nvidia graphics driver, as well as all nvidia-related *.conf files in /etc/modprobe.d/, and rebooted the laptop. The following output should show that only the default nouveau driver is loaded:

# lsmod | grep nvidia

# lsmod | grep nouveau
nouveau              2179072  0
ttm                   131072  1 nouveau
i2c_algo_bit           16384  2 i915,nouveau
drm_kms_helper        208896  2 i915,nouveau
mxm_wmi                16384  1 nouveau
drm                   495616  12 drm_kms_helper,i915,ttm,nouveau
wmi                    28672  6 dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau
video                  45056  4 dell_wmi,dell_laptop,i915,nouveau
button                 16384  1 nouveau

# lspci -k | egrep 'VGA|3D' -A2
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
        Subsystem: Dell HD Graphics 530
        Kernel driver in use: i915
--
01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
        Subsystem: Dell GM107GLM [Quadro M1000M]
        Kernel driver in use: nouveau

# dmesg | grep -i nvidia
[    4.282530] nouveau 0000:01:00.0: NVIDIA GM107 (117310a2)
[    4.547712] audit: type=1400 audit(1596389563.639:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=543 comm="apparmor_parser"
[    4.547714] audit: type=1400 audit(1596389563.639:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=543 comm="apparmor_parser"
[    5.944911] nvidia: loading out-of-tree module taints kernel.
[    5.944918] nvidia: module license 'NVIDIA' taints kernel.
[    5.949482] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    5.962949] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    5.963181] NVRM: The NVIDIA probe routine was not called for 1 device(s).
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
               NVRM: driver(s)), then try loading the NVIDIA kernel module
[    5.963182] NVRM: No NVIDIA graphics adapter probed!
[    6.005267] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[    6.075128] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    6.075448] NVRM: The NVIDIA probe routine was not called for 1 device(s).
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
               NVRM: driver(s)), then try loading the NVIDIA kernel module
[    6.075449] NVRM: No NVIDIA graphics adapter probed!
[    6.097310] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241

Apparently, the nvidia module was still loaded during boot and tainted the kernel, but it could not probe the device because nouveau had already claimed it.

Here is the "top" result, again with a permanent high CPU load for a kworker process:

# top
top - 19:50:57 up 18 min,  4 users,  load average: 1,26, 1,22, 0,93
Tasks: 198 total,   2 running, 196 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us, 11,3 sy,  0,0 ni, 87,1 id,  0,0 wa,  0,0 hi, 1,6 si,  0,0 st
MiB Mem :  15889,5 total,  13903,9 free,    808,5 used,   1177,0 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  14617,1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
   72 root      20   0       0      0      0 R  86,7   0,0 15:23.97 kworker/7:1+pm
   47 root      20   0       0      0      0 S  13,3   0,0 2:52.21 ksoftirqd/7
  684 root      20   0  505356 126896 102732 S   6,7   0,8 0:20.77 Xorg
    1 root      20   0  169624  10312   7880 S   0,0   0,1 0:01.34 systemd
    2 root      20   0       0      0      0 S   0,0   0,0 0:00.00 kthreadd

Here is the stack of PID 72:

# cat /proc/72/stack
[<0>] 0xffffffffffffffff

A trace covering a few seconds, truncated after line 5000 and compressed, is attached as "out-no-nvidia.txt.gz".
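
For readers who want a quick overview of such a trace, one option is to count how often each work function is queued. This is only a minimal sketch: it assumes the trace was recorded with the workqueue:workqueue_queue_work event, so that every event line carries a "function=<name>" field, and the helper name top_work_functions is purely illustrative.

```shell
# Hypothetical helper: count how often each work function appears in a
# trace recorded from the workqueue:workqueue_queue_work event.
# Assumption: each event line contains a "function=<name>" field.
top_work_functions() {
    # $1: path to the trace file, plain text or gzip-compressed
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac |
        grep -o 'function=[^ ]*' |   # keep only the function field
        sort | uniq -c |             # count identical function names
        sort -rn | head -n 10        # most frequent first, top ten
}
```

It could be called as `top_work_functions out-no-nvidia.txt.gz`; a function that dominates the list would be a candidate for the work item that keeps the kworker busy.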

Please let me know whether my way of not loading the nvidia driver was sufficient. If a really untainted system requires uninstalling the Nvidia driver completely, I will do so, but I would need more time for this.
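
Whether the running kernel is tainted can also be read directly from /proc/sys/kernel/tainted, where 0 means untainted. The following is a rough sketch that decodes only the three taint bits matching the dmesg messages above (proprietary module, out-of-tree module, unsigned module); the function name describe_taint is an illustrative assumption, and other taint bits are deliberately ignored.

```shell
# Hypothetical helper: decode the module-related bits of the kernel
# taint value. Only bits 0, 12 and 13 are handled in this sketch.
describe_taint() {
    # $1: taint value; defaults to the value of the running kernel
    value="${1:-$(cat /proc/sys/kernel/tainted)}"
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    [ $((value & 1)) -ne 0 ]    && echo "proprietary module loaded (P)"
    [ $((value & 4096)) -ne 0 ] && echo "out-of-tree module loaded (O)"
    [ $((value & 8192)) -ne 0 ] && echo "unsigned module loaded (E)"
    return 0
}
```

Calling `describe_taint` without an argument inspects the running system; passing a number decodes a value copied from elsewhere.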

Regards,

Dirk.

On 02.08.20 at 18:22, Salvatore Bonaccorso wrote:

Hi Dirk,

On Sun, Aug 02, 2020 at 03:44:09PM +0200, Salvatore Bonaccorso wrote:
Control: tags -1 + moreinfo

Hi Dirk

On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:
Package: src:linux
Version: 4.19.132-1
Severity: normal

Dear Maintainer,

after booting the kernel 4.19.0-10-amd64, there is a kworker process running
with a permanent high CPU load of almost 90% as reported by the "top"
command:

$ top
top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
    64 root      20   0       0      0      0 R  86.7   0.0 0:47.41 kworker/0:2+pm
     9 root      20   0       0      0      0 S  20.0   0.0 0:08.84 ksoftirqd/0
   364 root     -51   0       0      0      0 S   6.7   0.0 0:00.50 irq/126-nvidia
  1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8 0:02.23 kwin_x11
     1 root      20   0  169652  10280   7740 S   0.0   0.1 0:01.56 systemd
     2 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kthreadd
...

The expected result after booting the kernel 4.19.0-10-amd64 is a kworker
process with a CPU load close to 0%.

As a control, booting the previous kernel 4.19.0-9-amd64 does not show a
high CPU load for the kworker process. Instead, the kworker CPU load
reported by the "top" command is 0.0%.

Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.

Neither "dmesg" nor "journalctl -b" shows any messages containing "kworker".

I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64
2.28-10.

If you need more information, I would be happy to provide it.

To find out what could be the cause, could you have a look at
https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging
This could help isolate why the kworker goes crazy.
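
The capture steps described in that debugging section can be sketched roughly as below. The event name and the tracing files come from the kernel documentation; the function name capture_workqueue_trace and the TRACING_DIR override are illustrative assumptions, and on a real system the function has to be run as root with debugfs mounted.

```shell
# Sketch of the trace capture from the kernel workqueue documentation.
# The function name and the TRACING_DIR override are illustrative only.
capture_workqueue_trace() {
    # $1: seconds to record, $2: output file
    tracing_dir="${TRACING_DIR:-/sys/kernel/debug/tracing}"
    # Enable the event that fires whenever a work item is queued
    echo workqueue:workqueue_queue_work > "$tracing_dir/set_event" || return 1
    # trace_pipe blocks while waiting for events, so stop after $1 seconds;
    # the timeout exit status is expected and therefore ignored
    timeout "$1" cat "$tracing_dir/trace_pipe" > "$2" || true
    # Writing an empty line disables all events again
    echo > "$tracing_dir/set_event"
}
```

A call such as `capture_workqueue_trace 5 out.txt` would record about five seconds of events into out.txt.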

In addition to the above, one more thing: can you reproduce
the issue when the kernel does not get tainted, that is, without loading the
proprietary, out-of-tree modules?

This is particularly important if the issue can be tracked down, found
in upstream and needs to be reported upstream.

Regards,
Salvatore

Attachment: out-no-nvidia.txt.gz
Description: application/gzip

