Bug#665413: BUG: unable to handle kernel paging request in mark_files_ro
On Fri, 23 Mar 2012 19:19:58 -0500, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Daniel Kahn Gillmor wrote:
> > [  574.852044] BUG: unable to handle kernel paging request at b4777dbf
> > [  574.856011] IP: [<c109520e>] mark_files_ro+0x27/0x6f
> > [  574.856011] *pde = 00000000 
> > [  574.856011] Oops: 0002 [#1] 
> > [  574.856011] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
> > [  574.856011] Modules linked in: ext3 jbd mbcache raid1 md_mod dm_crypt dm_mod pl2303 usbserial sd_mod crc_t10dif ata_generic i915 tg3 3c59x drm_kms_helper tulip mii libphy uhci_hcd drm i2c_algo_bit snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer i2c_i801 ata_piix snd soundcore shpchp parport_pc button processor thermal parport libata i2c_core ehci_hcd rng_core snd_page_alloc evdev psmouse serio_raw pcspkr scsi_mod pci_hotplug usbcore nls_base video thermal_sys output
> > [  574.856011] 
> > [  574.856011] Pid: 6349, comm: dpkg-deb Not tainted (2.6.32-5-486 #1) HP d530 SFF(DG784A)
> [...]
> > [  574.856011] Call Trace:
> > [  574.856011]  [<c105b8ef>] ? __rcu_process_callbacks+0x292/0x352
> > [  574.856011]  [<c105b9be>] ? rcu_process_callbacks+0xf/0x1f
> > [  574.856011]  [<c1026dfa>] ? __do_softirq+0x8e/0x135
> > [  574.856011]  [<c1026ed1>] ? do_softirq+0x30/0x3b
> > [  574.856011]  [<c1026f94>] ? irq_exit+0x25/0x53
> > [  574.856011]  [<c100e963>] ? smp_apic_timer_interrupt+0x60/0x68
> > [  574.856011]  [<c10037f1>] ? apic_timer_interrupt+0x31/0x40
> > [  574.856011] Code: f0 00 00 c3 57 56 89 c6 53 8d 78 74 8b 56 74 eb 54 8b 42 0c 8b 40 0c 0f b7 40 6e 25 00 f0 00 00 3d 00 80 00 00 75 3c 8b 42 14 85 <c0> 74 35 8b 42 1c a8 02 74 2e 8b 5a 08 83 e0 fd 89 42 1c 85 db 
> 
> Is this reproducible?  Is the IP and backtrace the same each time?
Alas, no, it's not strictly reproducible.  With this same kernel, i've
also gotten machine freezes (no console output, hard reset required),
and also this CPU lockup:
[ 5297.844002] BUG: soft lockup - CPU#0 stuck for 61s! [apt-get:914]
[ 5297.844002] Modules linked in: ext3 jbd mbcache raid1 md_mod dm_crypt dm_mod sd_mod crc_t10dif pl2303 usbserial ata_generic i915 drm_kms_helper snd_intel8x0 drm snd_ac97_codec ac97_bus i2c_algo_bit tg3 3c59x snd_pcm mii libphy tulip snd_timer snd i2c_i801 soundcore ata_piix uhci_hcd ehci_hcd shpchp parport_pc video floppy parport pcspkr processor snd_page_alloc thermal button i2c_core libata evdev psmouse serio_raw scsi_mod rng_core pci_hotplug usbcore nls_base thermal_sys output
[ 5297.844002]
[ 5297.844002] Pid: 914, comm: apt-get Not tainted (2.6.32-5-486 #1) HP d530 SFF(DG784A)
[ 5297.844002] EIP: 0060:[<f9123aa2>] EFLAGS: 00000246 CPU: 0
[ 5297.844002] EIP is at walk_page_buffers+0x1a/0x65 [ext3]
[ 5297.844002] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: f3c2a7c0
[ 5297.844002] ESI: f3c2a7c0 EDI: 00000000 EBP: f3c2a7c0 ESP: c59a3e14
[ 5297.844002]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 5297.844002] CR0: 8005003b CR2: b7696cbb CR3: 31372000 CR4: 00000690
[ 5297.844002] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 5297.844002] DR6: ffff0ff0 DR7: 00000400
[ 5297.844002] Call Trace:
[ 5297.844002]  [<f9124e49>] ? ext3_ordered_writepage+0x74/0x13c [ext3]
[ 5297.844002]  [<f9123aff>] ? buffer_unmapped+0x0/0xc [ext3]
[ 5297.844002]  [<c10723cb>] ? __writepage+0x8/0x20
[ 5297.844002]  [<c1072957>] ? write_cache_pages+0x1b2/0x2a2
[ 5297.844002]  [<c10723c3>] ? __writepage+0x0/0x20
[ 5297.844002]  [<c1072a61>] ? generic_writepages+0x1a/0x21
[ 5297.844002]  [<c106e549>] ? __filemap_fdatawrite_range+0x63/0x6e
[ 5297.844002]  [<c106e585>] ? filemap_write_and_wait_range+0x31/0x67
[ 5297.844002]  [<c10ac0e8>] ? vfs_fsync_range+0x4b/0x85
[ 5297.844002]  [<c10ac189>] ? vfs_fsync+0x11/0x15
[ 5297.844002]  [<c1085259>] ? sys_msync+0x101/0x164
[ 5297.844002]  [<c1003043>] ? sysenter_do_call+0x12/0x28
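The faulting location differs between the two reports above
(mark_files_ro in the oops, walk_page_buffers in the soft lockup).
To compare further crashes as they come in, the symbol and offset can be
pulled out of each log.  A minimal sketch in Python, not something that
was run as part of this report; the helper name fault_sites and the
regular expression are illustrative assumptions that cover only the two
line formats seen here:

```python
import re

# Matches the two faulting-IP line styles seen in the logs above:
#   "IP: [<c109520e>] mark_files_ro+0x27/0x6f"
#   "EIP is at walk_page_buffers+0x1a/0x65 [ext3]"
# The register-dump line "EIP: 0060:[<...>] EFLAGS: ..." is not matched,
# because it carries no symbol+offset.
SITE_RE = re.compile(
    r"\bE?IP(?: is at|:)\s+(?:\[<[0-9a-f]+>\]\s+)?(\w+)\+(0x[0-9a-f]+)/0x[0-9a-f]+"
)


def fault_sites(log_text):
    """Return (symbol, offset) pairs, one per faulting-IP line in log_text."""
    return [(m.group(1), m.group(2)) for m in SITE_RE.finditer(log_text)]


sample = """\
[  574.856011] IP: [<c109520e>] mark_files_ro+0x27/0x6f
[ 5297.844002] EIP: 0060:[<f9123aa2>] EFLAGS: 00000246 CPU: 0
[ 5297.844002] EIP is at walk_page_buffers+0x1a/0x65 [ext3]
"""

for symbol, offset in fault_sites(sample):
    print(symbol, offset)
```

If the same symbol and offset showed up across crashes, that would point
at one code path; different sites each time, as here, fit the "not
strictly reproducible" description.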
Unfortunately, the machine is in a remote location, so performing the
hard reset is difficult.  My ultimate goal is to use this machine
with xen, since it has been running the lenny (and etch before that,
iirc) xen kernel and hypervisor for years with no problem.
Booting the machine into the squeeze xen hypervisor (4.0) and the
squeeze xen kernel causes a separate series of errors (not yet reported
because i haven't had a chance to formulate them cleanly).
Here's example output from running memtest86+ on the same machine, in
case a demonstration that the RAM isn't faulty would be useful (or if you
can glean more useful info from it than i can):
========================================================================================
      Memtest86+ v4.10      | Pass 40% ###############                          
Pentium 4 (0.13) 2660 MHz   | Test 59% #######################                  
L1 Cache:    8K  20001 MB/s | Test #5  [Block move, 80 moves]                   
L2 Cache:  512K  17387 MB/s | Testing:  188K - 2048M 3808M                      
L3 Cache:       None        | Pattern:                                          
Memory  : 3808M   1645 MB/s |-------------------------------------------------  
Chipset : Intel i848/i865 (ECC : Disabled) - FSB : 133 MHz - PAT : Enabled      
Settings: RAM : 133 MHz (DDR266) / CAS : 2.5-3-3-6 / Dual Channel (128 bits)    
                                                                                
 WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors ECC Errs  
 ---------  ------  -------  --------  -----  ---  ----  ----  ------ --------  
   3:13:03   3808M       0K    e820      on   off   Std     2       0           
 -----------------------------------------------------------------------------  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
              *****Pass complete, no errors, press Esc to exit*****             
(ESC)Reboot  (c)configuration  (SP)scroll_lock  (CR)scroll_unlock               
========================================================================================
I'm about to try to reboot it again to see if i can get it back to
stability under the lenny hypervisor and kernel, but i'll need to do
that with the rescue 2.6.32-5-486 image as well, so it's possible that
i'll have another backtrace or crash to follow up with in a little bit.
     --dkg