
Bug#813122: linux-image-4.4.0-trunk-amd64: nouveau driver randomly crashes with message "nouveau 0000:02:00.0: fifo: CHSW_ERROR 00000002"



On Fri, 2016-01-29 at 10:11 -0600, Matt Zagrabelny wrote:
> Package: src:linux
> Version: 4.4-1~exp1
> Severity: important
> 
> Dear Maintainer,
> 
> I recently acquired a new video card:
> 
> PNY NVIDIA Quadro NVS 510 2GB GDDR3 4-Mini DisplayPort Low Profile
> PCI-Express Video Card VCNVS510DVI-PB
> 
> Both with the 4.3 and 4.4 kernels, the nouveau driver crashes at random times with
> messages like:
> 
> nouveau 0000:02:00.0: fifo: CHSW_ERROR 00000002
> 
> flooding the journal. The average rate of messages being written to the
> journal is 951 per second.

Please look for this bug upstream, or open a new bug report, following
the instructions at <https://wiki.freedesktop.org/nouveau/Bugs/>.

> The processes with the most CPU usage are (unsurprisingly):
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>  8354 root      20   0       0      0      0 D 100.0  0.0  12:56.34 Xorg
> 19530 root      20   0   95844  23112   6220 R  99.7  0.2   1035:58 syslog-ng
> 
> I have been able to recover the system via ssh:
> 
> systemctl stop lightdm.service (this step can take a while and here are
> some messages that correspond with the command)
> 
> Jan 29 09:44:55 achilles kernel: nouveau 0000:02:00.0: fifo: CHSW_ERROR 00000002
> Jan 29 09:44:55 achilles kernel: nouveau 0000:02:00.0: timeout at /build/linux-tEELBQ/linux-4.4/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!
> Jan 29 09:44:55 achilles kernel: nouveau 0000:02:00.0: fifo: channel 4 [Xorg[8354]] kick timeout
> Jan 29 09:44:55 achilles kernel: nouveau: Xorg[8354]:00000000:0000a06f: detach gr failed, -16
> Jan 29 09:44:55 achilles kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0d []
> 
> rmmod -f nouveau (this step can also take a while, once it is finished
> the monitors report that there is no signal - which is expected, I
> suppose. There is also a kernel issue with the unloading of the module.)
[...]

It is not surprising that 'rmmod -f' results in a crash or other bad
behaviour.  It is generally better to reboot than to do that.

Ben.

-- 
Ben Hutchings
Q.  Which is the greater problem in the world today, ignorance or apathy?
A.  I don't know and I couldn't care less.
