
Trouble with nvidia drivers in Debian 12 Bookworm



Hi,
I'm trying to get a Tesla T4 working under Debian 12.

So far I've tried two approaches:
1. Using the Debian-provided drivers, per
https://wiki.debian.org/NvidiaGraphicsDrivers
2. Using the NVIDIA-provided drivers installed via runfile, per
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html

For option 1 (installing the drivers from the Debian non-free repository),
everything seems to install fine, but the drivers don't load properly.
systemctl reports the following:

$ systemctl status systemd-modules-load
× systemd-modules-load.service - Load Kernel Modules
     Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static)
     Active: failed (Result: exit-code) since Thu 2023-07-13 21:05:08 UTC; 18min ago
       Docs: man:systemd-modules-load.service(8)
             man:modules-load.d(5)
    Process: 220 ExecStart=/lib/systemd/systemd-modules-load (code=exited, status=1/FAILURE)
   Main PID: 220 (code=exited, status=1/FAILURE)
        CPU: 29ms

Jul 13 21:05:08 localhost systemd-modules-load[226]: modprobe: ERROR: could not insert 'nvidia': Invalid argument
Jul 13 21:05:08 localhost systemd-modules-load[230]: modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/6.1.0-10-cloud-amd64
Jul 13 21:05:08 localhost systemd-modules-load[223]: modprobe: ERROR: ../libkmod/libkmod-module.c:1047 command_do() Error running install command 'modprobe nvidia ; modprobe -i nvidia-current-modeset ' for m>
Jul 13 21:05:08 localhost systemd-modules-load[223]: modprobe: ERROR: could not insert 'nvidia_modeset': Invalid argument
Jul 13 21:05:08 localhost systemd-modules-load[232]: modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/6.1.0-10-cloud-amd64
Jul 13 21:05:08 localhost systemd-modules-load[220]: Error running install command 'modprobe nvidia-modeset ; modprobe -i nvidia-current-drm ' for module nvidia_drm: retcode 1
Jul 13 21:05:08 localhost systemd-modules-load[220]: Failed to insert module 'nvidia_drm': Invalid argument
Jul 13 21:05:08 localhost systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
Jul 13 21:05:08 localhost systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
Jul 13 21:05:08 localhost systemd[1]: Failed to start systemd-modules-load.service - Load Kernel Modules.

When I try option 2 with the runfile (specifically, this file:
https://us.download.nvidia.com/tesla/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run),
the installer is unable to use the kernel headers I have installed. If I
don't specify a location, it says it can't find them; no matter which
location I do specify, it complains that something about what it finds
there is unexpected.
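A minimal sketch of a pre-flight check for the header problem described above, assuming Debian's usual layout in which the linux-headers-<version> package provides /lib/modules/<version>/build pointing at the installed headers. `check_headers` is a hypothetical helper name, not part of Debian or the NVIDIA installer.

```shell
#!/bin/sh
# Hypothetical helper: report whether kernel headers are installed for a
# given kernel version. Assumes Debian's layout, where the linux-headers
# package provides /lib/modules/<version>/build pointing at the headers.
check_headers() {
    kver="${1:-$(uname -r)}"        # default: the running kernel
    modroot="${2:-/lib/modules}"    # module root, overridable for testing
    build="$modroot/$kver/build"
    # -d follows symlinks, so a build link to /usr/src/... counts as present
    if [ -d "$build" ]; then
        echo "found: $build"
        return 0
    fi
    echo "missing: $build (package linux-headers-$kver may not be installed)"
    return 1
}
```

Called with no arguments, `check_headers` checks the running kernel (6.1.0-10-cloud-amd64 in the log above). If it reports a path, that path is one candidate for the runfile installer's `--kernel-source-path` option; this is an assumption and has not been confirmed to avoid the error described above.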

Any help is appreciated!

PS: Secure Boot is disabled; mokutil reports:
$ mokutil --sb-state
SecureBoot disabled

