
tracking down a kernel oops



Hello. I'm responsible for a number of Debian machines, workstations
and servers, in an academic/research environment. They coexist with
OS X machines. Some months ago we got some new hardware and had wheezy
with GNOME 3 installed on the new machines. It has been difficult to
keep these machines working from day to day: printing just stops
working, there are apparently random X freezes, and so on; they seem
generally unstable. Overall this has been a frustrating and
disappointing experience.

The worst problem I'm having at present is with a workstation on which
there are regular kernel failures when there is even a moderately
heavy desktop load -- typically when a browser is opened. The screen
goes crazy and the machine becomes unresponsive to input. I can still
ssh in, so this is probably, at bottom, an X server problem. I call
it a kernel failure, though, because the logs show a kernel oops when
this happens. It's hard to know what the ultimate cause of these
events might be, but gdbus seems to be implicated often. I'll attach
the relevant logs below from two of the most recent such events.

The kernel is: 3.2.0-4-686-pae #1 SMP Debian 3.2.51-1 i686 GNU/Linux

I'm guessing (but truly it is only a guess) that the problem is
graphics related. The GPU is:

  [AMD] nee ATI RS880 [Radeon HD 4250]

and we're using the radeon driver, not the proprietary driver:

  xserver-xorg-video-radeon: Installed: 1:6.14.4-8

from stable.  I could upgrade to a newer version of that driver or to
a newer kernel and hope for the best, but I'd really like to have more
of an idea of what problem it is I'm trying to solve.

Is there anyone here who has seen similar problems or who has a
suggestion about how the specific problem might be tracked down? I've
installed xserver-xorg-video-radeon-dbg in the hope of getting more
information at the next unpleasant event, but in the meantime here is
the log.

Thanks to all,

Jim

--------------------------------------------------------------------------------
Oct 21 14:07:01 chung kernel: [1737005.190329] BUG: unable to handle kernel NULL pointer dereference at 00000008
Oct 21 14:07:01 chung kernel: [1737005.190445] IP: [<c10b3f48>] vma_address+0x25/0x4c
Oct 21 14:07:01 chung kernel: [1737005.190521] *pdpt = 0000000000000000 *pde = f000eef3f000eef3
Oct 21 14:07:01 chung kernel: [1737005.190607] Oops: 0000 [#2] SMP
Oct 21 14:07:01 chung kernel: [1737005.190663] Modules linked in:
 btrfs crc32c libcrc32c zlib_deflate ufs qnx4 hfsplus hfs minix ntfs
 vfat msdos fat jfs xfs reiserfs ext3 jbd ext2 efivars dm_mod cpuid ppdev
 lp rfcomm bnep cpufreq_userspace cpufreq_stats bluetooth rfkill
 cpufreq_conservative cpufreq_powersave binfmt_misc fuse nfsd nfs nfs_acl
 auth_rpcgss fscache lockd sunrpc loop snd_hda_codec_realtek
 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq
 snd_seq_device snd_timer radeon snd parport_pc parport ttm
 drm_kms_helper drm power_supply i2c_algo_bit soundcore sp5100_tco k10temp i2c_piix4
 i2c_core evdev powernow_k8 pcspkr mperf shpchp wmi processor button
 thermal_sys ext4 crc16 jbd2 mbcache microcode usbhid hid sg sd_mod
 crc_t10dif ata_generic ohci_hcd r8169 pata_atiixp mii ahci libahci
 ehci_hcd libata usbcore scsi_mod usb_common [last unloaded: scsi_wait_scan]
Oct 21 14:07:01 chung kernel: [1737005.192010]
Oct 21 14:07:01 chung kernel: [1737005.192010] Pid: 30, comm: kswapd0 Tainted: G      D      3.2.0-4-686-pae #1 Debian 3.2.46-1+deb7u1 MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/880GM-E43 (MS-7596)
Oct 21 14:07:01 chung kernel: [1737005.192010] EIP: 0060:[<c10b3f48>] EFLAGS: 00010246 CPU: 1
Oct 21 14:07:01 chung kernel: [1737005.192010] EIP is at vma_address+0x25/0x4c
Oct 21 14:07:01 chung kernel: [1737005.192010] EAX: 00000000 EBX: cb1158b8 ECX: 00000009 EDX: 00000000
Oct 21 14:07:01 chung kernel: [1737005.192010] ESI: 00000001 EDI: f52abf28 EBP: 00000000 ESP: f502fe7c
Oct 21 14:07:01 chung kernel: [1737005.192010]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Oct 21 14:07:01 chung kernel: [1737005.192010] Process kswapd0 (pid: 30, ti=f502e000 task=f594f200 task.ti=f502e000)
Oct 21 14:07:01 chung kernel: [1737005.192010] Stack:
Oct 21 14:07:01 chung kernel: [1737005.192010]  f6617c00 00000000 c10b52e5 0000534b eaf3af98 cb1158b8 00000000 f52abf44
Oct 21 14:07:01 chung kernel: [1737005.192010]  00000000 00000020 f502fef4 ffffffff c1411000 00000002 c10a746a 00000001
Oct 21 14:07:01 chung kernel: [1737005.192010]  f6617c14 c1410dc0 f502ff64 00000000 c10a1780 f502ff10 f502fef4 00000020
Oct 21 14:07:01 chung kernel: [1737005.192010] Call Trace:
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c10b52e5>] ? page_referenced+0xcb/0x204
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c10a746a>] ? zone_page_state_add+0x12/0x1f
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c10a1780>] ? shrink_active_list.isra.53+0x19d/0x24f
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c10a2ad4>] ? kswapd+0x347/0x679
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c104d954>] ? add_wait_queue+0x30/0x30
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c10a278d>] ? shrink_zone+0x46d/0x46d
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c104d4a7>] ? kthread+0x63/0x68
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c104d444>] ? kthread_worker_fn+0x101/0x101
Oct 21 14:07:01 chung kernel: [1737005.192010]  [<c12c853e>] ? kernel_thread_helper+0x6/0x10
Oct 21 14:07:01 chung kernel: [1737005.192010] Code: 04 e9 f9 ed 20 00 56 53 89 d3 8b 70 08 f6 42 1e 40 74 1d 8b 10 31 c9 80 e6 40 74 03 8b 48 38 b8 00 10 00 00 d3 e0 e8 a5 8a 00 00 <8b> 48 08 d3 e6 2b 73 4c c1 e6 0c 89 f0 03 43 04 72 0d 3b 43 08
Oct 21 14:07:01 chung kernel: [1737005.192010] EIP: [<c10b3f48>] vma_address+0x25/0x4c SS:ESP 0068:f502fe7c
Oct 21 14:07:01 chung kernel: [1737005.192010] CR2: 0000000000000008
Oct 21 14:07:01 chung kernel: [1737005.274654] ---[ end trace da2f3742f13e2d05 ]---

Oct 21 14:16:24 chung kernel: [1737568.380186] BUG: unable to handle kernel paging request at 006c006d
Oct 21 14:16:24 chung kernel: [1737568.385051] IP: [<c106881a>] acct_collect+0x3e/0x12a
Oct 21 14:16:24 chung kernel: [1737568.392609] *pdpt = 000000000b06c001 *pde = 0000000000000000
Oct 21 14:16:24 chung kernel: [1737568.407993] Oops: 0000 [#3] SMP
Oct 21 14:16:24 chung kernel: [1737568.407993] Modules linked in:
 btrfs crc32c libcrc32c zlib_deflate ufs qnx4 hfsplus hfs minix ntfs
 vfat msdos fat jfs xfs reiserfs ext3 jbd ext2 efivars dm_mod cpuid ppdev
 lp rfcomm bnep cpufreq_userspace cpufreq_stats bluetooth rfkill
 cpufreq_conservative cpufreq_powersave binfmt_misc fuse nfsd nfs nfs_acl
 auth_rpcgss fscache lockd sunrpc loop snd_hda_codec_realtek
 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq
 snd_seq_device snd_timer radeon snd parport_pc parport ttm
 drm_kms_helper drm power_supply i2c_algo_bit soundcore sp5100_tco k10temp i2c_piix4
 i2c_core evdev powernow_k8 pcspkr mperf shpchp wmi processor button
 thermal_sys ext4 crc16 jbd2 mbcache microcode usbhid hid sg sd_mod
 crc_t10dif ata_generic ohci_hcd r8169 pata_atiixp mii ahci libahci
 ehci_hcd libata usbcore scsi_mod usb_common [last unloaded: scsi_wait_scan]
Oct 21 14:16:24 chung kernel: [1737568.419207]
Oct 21 14:16:24 chung kernel: [1737568.419207] Pid: 24400, comm: gdbus Tainted: G      D      3.2.0-4-686-pae #1 Debian 3.2.46-1+deb7u1 MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/880GM-E43 (MS-7596)
Oct 21 14:16:24 chung kernel: [1737568.419207] EIP: 0060:[<c106881a>] EFLAGS: 00210206 CPU: 0
Oct 21 14:16:24 chung kernel: [1737568.419207] EIP is at acct_collect+0x3e/0x12a
Oct 21 14:16:24 chung kernel: [1737568.419207] EAX: 006c0065 EBX: 00000100 ECX: 00001c61 EDX: 00000000
Oct 21 14:16:24 chung kernel: [1737568.419207] ESI: f5868a80 EDI: 076e400f EBP: ed4fc0c0 ESP: cb02be34
Oct 21 14:16:24 chung kernel: [1737568.419207]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Oct 21 14:16:24 chung kernel: [1737568.419207] Process gdbus (pid: 24400, ti=cb02a000 task=ed4fc0c0 task.ti=cb02a000)
Oct 21 14:16:24 chung kernel: [1737568.419207] Stack:
Oct 21 14:16:24 chung kernel: [1737568.419207]  00000001 ed4fc0c0 f70f7580 00000100 f5868a80 c103b19b 00001c61 00000001
Oct 21 14:16:24 chung kernel: [1737568.419207]  00000b9c 00062c4f ed4fc3d4 c1043a54 cb02bef4 ed4fc0c0 ed4fc378 f5868a80
Oct 21 14:16:24 chung kernel: [1737568.419207]  ed4fc0c0 00000100 cb02a000 c103b7ec 00000008 cb02bef4 ed4fc0c0 cb02a000
Oct 21 14:16:24 chung kernel: [1737568.419207] Call Trace:
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c103b19b>] ? do_exit+0x1bd/0x60c
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c1043a54>] ? __dequeue_signal+0xf/0xce
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c103b7ec>] ? do_group_exit+0x5c/0x7f
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c104564c>] ? get_signal_to_deliver+0x431/0x44d
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c100b3eb>] ? do_signal+0x2f/0x4c2
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c10d9084>] ? pollwake+0x4e/0x57
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c12c36e2>] ? _raw_spin_lock_irq+0x9/0x12
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c10f7a50>] ? eventfd_ctx_read+0x13d/0x147
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c1032278>] ? try_to_wake_up+0x155/0x155
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c10f7a5a>] ? eventfd_ctx_read+0x147/0x147
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c10cc78b>] ? fsnotify_access+0x48/0x4f
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c10cd171>] ? vfs_read+0xa1/0xd1
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c100ba08>] ? do_notify_resume+0x1e/0x65
Oct 21 14:16:24 chung kernel: [1737568.419207]  [<c12c3990>] ? work_notifysig+0x13/0x1b
Oct 21 14:16:24 chung kernel: [1737568.419207] Code: b5 b0 02 00 00 89 14 24 74 42 83 bd f0 00 00 00 00 74 39 8b 85 f0 00 00 00 83 c0 38 8 22 a9 25 00 8b 85 f0 00 00 00 8b 00 eb 09 <03> 78 08 2b 78 04 8b 40 0c 85 c0 75 f3 64 a1 0c 9f 47 c1 8b 80
Oct 21 14:16:24 chung kernel: [1737568.419207] EIP: [<c106881a>] acct_collect+0x3e/0x12a SS:ESP 0068:cb02be34
Oct 21 14:16:24 chung kernel: [1737568.419207] CR2: 00000000006c006d
Oct 21 14:16:24 chung kernel: [1737568.739397] ---[ end trace da2f3742f13e2d06 ]---
Oct 21 14:16:24 chung kernel: [1737568.739403] Fixing recursive fault but reboot is needed!

--------------------------------------------------------------------------------
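
When several such reports pile up in the logs, it may help to boil each one
down to a few fields (fault type, faulting address, instruction pointer,
oops counter, process) so that a claim like "gdbus seems to be implicated
often" can be checked by counting. What follows is only a rough sketch in
Python and not part of the original report: the regular expressions assume
the 3.2-era i686 oops layout shown above, the names summarize_oopses and
count_by_process are made up for illustration, and SAMPLE is a shortened
excerpt of the log above.

```python
import re
from collections import Counter

# Patterns assume the oops layout seen in the log above (kernel 3.2, i686);
# other kernel versions may print these fields differently.
BUG_RE = re.compile(r"BUG: unable to handle kernel (.+?) at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\]\s+(\S+)")
OOPS_RE = re.compile(r"Oops: [0-9a-f]+ \[#(\d+)\]")
COMM_RE = re.compile(r"Pid: (\d+), comm: (\S+)")

# Shortened excerpt of the two reports, including the mail line wrapping.
SAMPLE = """\
Oct 21 14:07:01 chung kernel: [1737005.190329] BUG: unable to handle
kernel NULL pointer dereference at 00000008
Oct 21 14:07:01 chung kernel: [1737005.190445] IP: [<c10b3f48>]
vma_address+0x25/0x4c
Oct 21 14:07:01 chung kernel: [1737005.190607] Oops: 0000 [#2] SMP
Oct 21 14:07:01 chung kernel: [1737005.192010] Pid: 30, comm: kswapd0
Tainted: G      D      3.2.0-4-686-pae #1 Debian 3.2.46-1+deb7u1
Oct 21 14:16:24 chung kernel: [1737568.380186] BUG: unable to handle kernel paging request at 006c006d
Oct 21 14:16:24 chung kernel: [1737568.385051] IP: [<c106881a>] acct_collect+0x3e/0x12a
Oct 21 14:16:24 chung kernel: [1737568.407993] Oops: 0000 [#3] SMP
Oct 21 14:16:24 chung kernel: [1737568.419207] Pid: 24400, comm: gdbus
Tainted: G      D      3.2.0-4-686-pae #1 Debian 3.2.46-1+deb7u1
"""


def summarize_oopses(text):
    """Return one dict per oops report found in text."""
    # Long log lines get wrapped in mail, so collapse all whitespace first.
    flat = " ".join(text.split())
    reports = []
    # Treat every "BUG: " marker as the start of a new report.
    for chunk in flat.split("BUG: ")[1:]:
        chunk = "BUG: " + chunk
        bug = BUG_RE.search(chunk)
        ip = IP_RE.search(chunk)
        oops = OOPS_RE.search(chunk)
        comm = COMM_RE.search(chunk)
        reports.append({
            "kind": bug.group(1) if bug else None,
            "address": bug.group(2) if bug else None,
            "ip": ip.group(1) if ip else None,
            "oops_number": int(oops.group(1)) if oops else None,
            "pid": int(comm.group(1)) if comm else None,
            "comm": comm.group(2) if comm else None,
        })
    return reports


def count_by_process(reports):
    """Count how many reports name each process in their comm field."""
    return Counter(r["comm"] for r in reports)


if __name__ == "__main__":
    for report in summarize_oopses(SAMPLE):
        print(report)
    print(count_by_process(summarize_oopses(SAMPLE)))
```

To run it over a real log rather than the excerpt, the text of a file such
as /var/log/kern.log could be read and passed to summarize_oopses in place
of SAMPLE; that path is an assumption about where these messages end up on
this system.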

