
Re: Screen unresponsive



On Tue, 2012-03-13 at 17:04 +0000, Darac Marjal wrote:
On Mon, Mar 05, 2012 at 01:32:17AM -0500, KS wrote:
> On Mon, Mar 5, 2012, at 12:51 AM, KS wrote:
> > Hi all,
> > 
> > The last few days I have noticed that when I return to my machine
> > (always ON), the screen doesn't respond. Keyboard (caps lock, num lock)
> > works. I can also ssh to the machine and have noticed that Xorg takes
> > 100% CPU. I couldn't find anything in the Xorg log or syslog files.
> > 
> > Today however, the screen stopped responding after a beep while I was
> > using the machine. Below is what I found on sys log:
> > 
> > Mar  5 00:32:28 gurh kernel: [17901.730462] NVRM: GPU at 0000:01:00.0
> > has fallen off the bus.

This doesn't sound particularly good. It would suggest to me that your
graphics card (the GPU) is no longer attached to the PCI bus. Probably
the best case scenario is that this is a physical problem: Open up your
computer, pull out the card and push it back in, making sure it's fully
seated.

If the problem persists, then it may be that the card is locking up
completely such that the PCI bus THINKS you've pulled it out. You may
find monitoring the output of "nvclock -T" useful.
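
A minimal sketch of one way to do that monitoring unattended, so that the
last readings before a lock-up end up on disk. It assumes nvclock is
installed and that "nvclock -T" prints the temperature as suggested above;
the function name poll, the log path and the interval are illustrative
choices, not anything from the original report.

```python
import subprocess
import time
from datetime import datetime


def poll(cmd, logfile, interval=60.0, count=None):
    """Run cmd every `interval` seconds, appending timestamped output to logfile.

    With count=None the loop runs until interrupted; otherwise it stops
    after `count` samples. Returns the number of samples written.
    """
    n = 0
    while count is None or n < count:
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=30
            )
            output = (result.stdout + result.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of stopping: a hung GPU may make
            # the tool itself fail or time out.
            output = f"failed to run {cmd!r}: {exc}"
        # Reopen the file for every sample so each reading is written out
        # promptly rather than sitting in a long-lived buffer.
        with open(logfile, "a") as fh:
            fh.write(f"{datetime.now().isoformat()} {output}\n")
        n += 1
        if count is None or n < count:
            time.sleep(interval)
    return n
```

For example, poll(["nvclock", "-T"], "/var/tmp/gpu-temp.log") would take one
reading a minute until interrupted (the path is a hypothetical example).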

> 
> Syslog gave the warning again as above!
> 
> So is this just a kernel issue?
> 
> Thanks,
> KS
> 
Hi Darac,

I don't think this is a hardware issue: I have been experiencing the same thing for some time on two different machines. All I have to go on is the following:
root@laptop:~# head -20 /var/log/syslog
May 31 22:28:59 laptop syslog-ng[1860]: Configuration reload request received, reloading configuration;
May 31 22:28:59 laptop syslog-ng[1860]: EOF on control channel, closing connection;
May 31 22:29:00 laptop anacron[11394]: Job `cron.daily' terminated
May 31 22:29:00 laptop anacron[11394]: Normal exit (1 job run)
May 31 22:49:00 laptop -- MARK --
May 31 23:05:40 laptop kernel: [32915.745040] sdc: detected capacity change from 8019509248 to 0
May 31 23:05:52 laptop kernel: [32927.622139] usb 2-1: USB disconnect, device number 8
May 31 23:08:11 laptop kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384102] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384120] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384124] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384176] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384179] NVRM: os_pci_init_handle: invalid context!
May 31 23:13:06 laptop kernel: [    0.000000] Initializing cgroup subsys cpuset
May 31 23:13:06 laptop kernel: [    0.000000] Initializing cgroup subsys cpu
May 31 23:13:06 laptop kernel: [    0.000000] Linux version 3.2.0-2-686-pae (Debian 3.2.18-1) (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-5) ) #1 SMP Mon May 21 18:24:12 UTC 2012
May 31 23:13:06 laptop kernel: [    0.000000] BIOS-provided physical RAM map:
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe5a800 (usable)
Then, on the Xorg side, I have this:
[ 30399.257] (II) config/udev: Adding input device ELECOM ELECOM USB mouse with wheel  (/dev/input/mouse2)
[ 30399.257] (II) No input driver specified, ignoring this device.
[ 30399.257] (II) This device may have been added with another device file.
[ 33119.907] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
[ 33119.907] 
[ 33119.907] Backtrace:
[ 33120.497] 0: /usr/bin/Xorg (xorg_backtrace+0x49) [0xb7778099]
[ 33120.497] 1: /usr/bin/Xorg (mieqEnqueue+0x22b) [0xb77569ab]
[ 33120.497] 2: /usr/bin/Xorg (0xb75fb000+0x51265) [0xb764c265]
[ 33120.497] 3: /usr/bin/Xorg (xf86PostMotionEventM+0xf9) [0xb7686119]
[ 33120.497] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x35ad) [0xb42585ad]
[ 33120.497] 5: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x4a2c) [0xb4259a2c]
[ 33120.497] 6: /usr/bin/Xorg (0xb75fb000+0x7a8e1) [0xb76758e1]
[ 33120.497] 7: /usr/bin/Xorg (0xb75fb000+0xa050a) [0xb769b50a]
[ 33120.497] 8: (vdso) (__kernel_sigreturn+0x0) [0xb75dd400]
[ 33120.497] 9: (vdso) (__kernel_vsyscall+0x10) [0xb75dd424]
[ 33120.497] 10: /lib/i386-linux-gnu/i686/cmov/libc.so.6 (__gettimeofday+0x16) [0xb7309916]
[ 33120.497] 11: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xb486f000+0x62e0d) [0xb48d1e0d]
[ 33120.497] 
[ 33120.497] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 33120.497] [mi] mieq is *NOT* the cause.  It is a victim.
[ 33120.983] (WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x00009354, 0x00009354)
[ 33120.983] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33120.983] [mi] EQ processing has resumed after 31 dropped events.
[ 33120.983] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[ 33123.984] (WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000098ec, 0x000098ec)
[ 33126.986] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33133.986] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33136.987] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33143.987] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33146.988] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33153.988] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33156.989] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33163.989] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33166.993] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33173.993] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33176.997] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33183.997] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33186.999] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33193.999] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33197.000] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33204.000] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33207.005] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33214.005] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33220.157] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33227.157] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33230.158] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33237.158] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33240.159] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33247.159] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33250.160] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33257.160] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33260.161] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33267.161] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33270.162] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33277.162] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33280.163] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33287.163] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33290.164] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000dab8)
[ 33297.164] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000dab8)
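
A rough sketch for lining up the two logs: it pulls the kernel timestamps of
the "fallen off the bus" messages out of syslog and the timestamp of the
first "EQ overflowing" message out of the Xorg log. The function names are
made up for illustration, and comparing the two numbers assumes that both
logs count seconds on roughly the same since-boot clock, which the values
above suggest but which is not verified here.

```python
import re

# Matches syslog lines such as:
#   ... kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
BUS_DROP = re.compile(
    r"kernel: \[\s*(\d+\.\d+)\] NVRM: GPU at (\S+) has fallen off the bus"
)

# Matches Xorg log lines such as:
#   [ 33119.907] [mi] EQ overflowing.  Additional events will be discarded ...
EQ_OVERFLOW = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[mi\] EQ overflowing")


def bus_drop_times(syslog_lines):
    """Return (kernel_timestamp, pci_address) for each bus-drop message."""
    hits = []
    for line in syslog_lines:
        m = BUS_DROP.search(line)
        if m:
            hits.append((float(m.group(1)), m.group(2)))
    return hits


def first_eq_overflow(xorg_lines):
    """Return the timestamp of the first EQ overflow, or None if absent."""
    for line in xorg_lines:
        m = EQ_OVERFLOW.match(line)
        if m:
            return float(m.group(1))
    return None
```

Fed the excerpts above, this gives a first bus drop at 33066.384097 and a
first EQ overflow at 33119.907, i.e. the X server starts dropping input
events roughly 53.5 seconds after the kernel reports losing the GPU.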

I'm sure this is either a kernel or an Xorg issue, as a hardware failure would not happen on multiple workstations at the same time. Here is the hardware on this laptop:
root@laptop:~# lspci 
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [Quadro NVS 135M] (rev a1)
03:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
03:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755M Gigabit Ethernet PCI Express (rev 02)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

I've also seen this happen more and more often when the system is loaded, especially when gnome-shell runs its special effects after I hit the "Activities" button.

PS: Please keep me in CC as I'm not subscribed to this list.

Cheers,
