Re: Trouble with nvidia drivers in Debian 12 Bookworm
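Solved my own problem: I had to run `apt install linux-headers-cloud-amd64`
instead of `apt install linux-headers-amd64`. The running kernel is the cloud
flavor (6.1.0-10-cloud-amd64, per the log quoted below), so the headers have
to match it.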
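
For anyone who lands here with the same symptoms, here is a rough sketch of how to work out which headers package goes with the running kernel. It assumes Debian's usual `<version>-<ABI>-<flavor>` kernel naming; the helper name `headers_metapackage` is made up for illustration, and the script only prints package names, it does not install anything.

```shell
#!/bin/sh
# Sketch: derive the Debian headers package names for the running kernel.
# Assumption: the kernel release looks like "<version>-<ABI>-<flavor>",
# e.g. "6.1.0-10-cloud-amd64". headers_metapackage is a hypothetical helper.

headers_metapackage() {
    release=$1
    # Drop the shortest leading "<version>-<ABI>-" prefix, leaving the flavor.
    flavor=${release#*-*-}
    printf 'linux-headers-%s\n' "$flavor"
}

running=$(uname -r)
echo "Running kernel:        $running"
echo "Exact headers package: linux-headers-$running"
echo "Flavor metapackage:    $(headers_metapackage "$running")"
```

On the machine from this thread, the last line should name `linux-headers-cloud-amd64`. Installing the flavor metapackage rather than the exact versioned package means the headers follow later kernel upgrades of the same flavor.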
On Thu, Jul 13, 2023 at 2:28 PM Sam Clearman <sam@samclearman.com> wrote:
>
> Hi,
> I'm trying to get a Tesla T4 working under Debian 12.
>
> So far I've tried two approaches:
> 1. Using the Debian-provided drivers, per
> https://wiki.debian.org/NvidiaGraphicsDrivers
> 2. Using the NVIDIA-provided drivers installed via runfile, per
> https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
>
> For 1 (installing the drivers from the Debian non-free repository),
> everything seems to install fine, but the drivers don't load properly.
> systemctl reports the following:
>
> $ systemctl status systemd-modules-load
> × systemd-modules-load.service - Load Kernel Modules
> Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static)
> Active: failed (Result: exit-code) since Thu 2023-07-13 21:05:08 UTC; 18min ago
> Docs: man:systemd-modules-load.service(8)
> man:modules-load.d(5)
> Process: 220 ExecStart=/lib/systemd/systemd-modules-load (code=exited, status=1/FAILURE)
> Main PID: 220 (code=exited, status=1/FAILURE)
> CPU: 29ms
>
> Jul 13 21:05:08 localhost systemd-modules-load[226]: modprobe: ERROR: could not insert 'nvidia': Invalid argument
> Jul 13 21:05:08 localhost systemd-modules-load[230]: modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/6.1.0-10-cloud-amd64
> Jul 13 21:05:08 localhost systemd-modules-load[223]: modprobe: ERROR: ../libkmod/libkmod-module.c:1047 command_do() Error running install command 'modprobe nvidia ; modprobe -i nvidia-current-modeset ' for m>
> Jul 13 21:05:08 localhost systemd-modules-load[223]: modprobe: ERROR: could not insert 'nvidia_modeset': Invalid argument
> Jul 13 21:05:08 localhost systemd-modules-load[232]: modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/6.1.0-10-cloud-amd64
> Jul 13 21:05:08 localhost systemd-modules-load[220]: Error running install command 'modprobe nvidia-modeset ; modprobe -i nvidia-current-drm ' for module nvidia_drm: retcode 1
> Jul 13 21:05:08 localhost systemd-modules-load[220]: Failed to insert module 'nvidia_drm': Invalid argument
> Jul 13 21:05:08 localhost systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
> Jul 13 21:05:08 localhost systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
> Jul 13 21:05:08 localhost systemd[1]: Failed to start systemd-modules-load.service - Load Kernel Modules.
>
> When I try to use the runfile (specifically, this file:
> https://us.download.nvidia.com/tesla/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run)
> it is unable to read the kernel headers that I have installed. If I
> don't specify a location, it says it can't find them; whichever
> location I do specify, it finds something unexpected about what's there.
>
> Any help is appreciated!
>
> PS: Secure Boot is disabled; I get the following from mokutil:
> $ mokutil --sb-state
> SecureBoot disabled