[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#638231: linux-image-2.6.32-5-686-bigmem: instability when using ext4



Package: linux-image-2.6.32-5-686-bigmem
Version: 2.6.32-35
Severity: important
Tags: squeeze

I have two machines that I upgraded to squeeze and migrated their ext3
filesystems to ext4 due to very high i/o and deep directory hierarchy. These two
machines have been crashing regularly since the ext4 upgrade. The other machines
that I have that are running the squeeze kernel and ext3 are not crashing at
all. 

When I upgraded the two crashy machines to the backports kernel, the crashes
stopped. The crashes were happening at least 2x a week, sometimes much more
frequently. Since the upgrade to the BPO kernel, the machines haven't crashed
once in two months.

Both machines were showing console logs when they crashed that were similar:
either they had nothing on them at all, or they had the following (in some cases
magic-sysrq worked, sometimes it didn't). 

It seems pretty clear to me that there are some instability issues with ext4
in the squeeze kernel. After discussion with Ted Tso on the subject, he indicated
that there were a number of ext4 fixes that have been done that have not been
backported to the squeeze kernel.

What follows are a few of the different things we saw on the console when the
machine hung:

1. 
hoopoe login: [51589.926858] Uniform Multi-Platform E-IDE driver
[51589.943819] ide-cd driver 5.00
[51589.982978] ide-gd driver 1.18
[51590.039277] st: Version 20081215, fixed bufsize 32768, s/g segs 256
[51590.262980] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[51590.269224] EDD information not available.
[137993.853645] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[137993.860140] EDD information not available.
[138361.949699] INFO: task rdiff-backup:28337 blocked for more than 120 seconds.
[138361.957345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[138361.965791] rdiff-backup  D f6967700     0 28337  28335 0x00000000
[138361.972772]  e8df2640 00200086 c5808e20 f6967700 f696772c c143de20 c143de20 c1439354
[138361.999573]  e8df27fc c5808e20 00000000 c143de20 ea15f800 c5808e20 ea15f800 c127eb36
[138362.008374]  c5804354 e8df27fc 020e12de c143de20 c143de20 00000000 00000000 00000000
[138362.034480] Call Trace:
[138362.037276]  [<c127eb36>] ? schedule+0x78f/0x7dc
[138362.042407]  [<c127f28f>] ? __mutex_lock_common+0xe8/0x13b
[138362.048233]  [<c127f2f1>] ? __mutex_lock_slowpath+0xf/0x11
[138362.054239]  [<c127f382>] ? mutex_lock+0x17/0x24
[138362.059403]  [<c127f382>] ? mutex_lock+0x17/0x24
[138362.081061]  [<c10d40c5>] ? sync_filesystems+0xf/0xbb
[138362.104668]  [<c10d41a3>] ? sys_sync+0xe/0x29
[138362.109660]  [<c100813b>] ? sysenter_do_call+0x12/0x28

2. 
hoopoe login: [17163.173748] Uniform Multi-Platform E-IDE driver
[17163.187023] ide-cd driver 5.00
[17163.216390] ide-gd driver 1.18
[17163.269327] st: Version 20081215, fixed bufsize 32768, s/g segs 256
[17163.425212] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[17163.431412] EDD information not available.
[32426.998664] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[32427.005956] ata4.00: failed command: WRITE FPDMA QUEUED
[32427.011225] ata4.00: cmd 61/08:00:57:a2:33/00:00:3a:00:00/40 tag 0 ncq 4096 out
[32427.011227]          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[32427.026096] ata4.00: status: { DRDY }
[32427.029818] ata4: hard resetting link
[32427.509533] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[32427.553582] ata4.00: configured for UDMA/133
[32427.557906] ata4: EH complete


3.
[16003.244636] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 
0x6 frozen
[16003.252347] ata3.00: edma_err_cause=00000020 pp_flags=00000000, 
SError=00180000
[16003.259766] ata3: SError: { 10B8B Dispar }
[16003.263910] ata3.00: failed command: SMART
[16003.268082] ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 
512 in
[16003.268083]          res c0/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x12 
(ATA bus error)
[16003.283450] ata3.00: status: { Busy }
[16008.800013] ata3: link is slow to respond, please be patient (ready=0)
[67924.593021] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 
0x6 frozen
[67924.600848] ata3.00: edma_err_cause=00000020 pp_flags=00000000, 
SError=00180000
[67924.608344] ata3: SError: { 10B8B Dispar }
[67924.612512] ata3.00: failed command: SMART
[67924.616783] ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 
512 in
[67924.616786]          res c0/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x12 
(ATA bus error)
[67924.632299] ata3.00: status: { Busy }
[67930.208513] ata3: link is slow to respond, please be patient (ready=0)

4.
 ] Call Trace:
 [33338.612001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33338.612001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33338.612001]  [<c105969e>] ? smp_call_function+0x19/0x1f
 [33338.612001]  [<c103b677>] ? on_each_cpu+0xc/0x24
 [33338.612001]  [<c1092a1f>] ? __alloc_pages_nodemask+0x36e/0x4d9
 [33338.612001]  [<c1092b96>] ? __get_free_pages+0xc/0x17
 [33338.612001]  [<c1035180>] ? copy_process+0xb7/0xf28
 [33338.612001]  [<c10264db>] ? check_preempt_curr+0x24/0x44
 [33338.612001]  [<c103612b>] ? do_fork+0x13a/0x2bc
 [33338.612001]  [<c10518aa>] ? getnstimeofday+0x4d/0xc9
 [33338.612001]  [<c1006dee>] ? sys_clone+0x21/0x27
 [33338.612001]  [<c100821c>] ? syscall_call+0x7/0xb
 [33404.112001] BUG: soft lockup - CPU#0 stuck for 61s! [clamd:30031]
 [33404.112001] Modules linked in: joydev st ide_gd_mod ide_cd_mod ide_core act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat
  nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext4 jbd2 crc16 snd_pcm snd_timer snd soundcore intel_rng snd_page_alloc rng_core i2c_i801 e7xxx_edac container evdev processor button edac_core i2c_core shpchp pci_hotplug pcspkr serio_raw ext3 jbd mbcache serpent xts gf128mul dm_crypt dm_mod raid1 md_mod sd_mod crc_t10dif sg sr_mod cdrom ata_generic uhci_hcd ehci_hcd sata_mv ata_piix aic79xx libata e1000 scsi_transport_spi thermal floppy usbcore scsi_mod thermal_sys nls_base [last unloaded: scsi_wait_scan]
 [33404.112001] 
 [33404.112001] Pid: 30031, comm: clamd Not tainted (2.6.32-5-686-bigmem #1) X5DPR
 [33404.112001] EIP: 0060:[<c1059677>] EFLAGS: 00000202 CPU: 0
 [33404.112001] EIP is at smp_call_function_many+0x19a/0x1a8
 [33404.112001] EAX: 0000000e EBX: c78046e8 ECX: 00000004 EDX: 00000020
 [33404.112001] ESI: 00000003 EDI: 00000000 EBP: c7804700 ESP: cbb01e6c
 [33404.112001]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 [33404.112001] CR0: 8005003b CR2: af8f6024 CR3: 36ae0000 CR4: 000006f0
 [33404.112001] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 [33404.112001] DR6: ffff0ff0 DR7: 00000400
 [33404.112001] Call Trace:
 [33404.112001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33404.112001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33404.112001]  [<c105969e>] ? smp_call_function+0x19/0x1f
 [33404.112001]  [<c103b677>] ? on_each_cpu+0xc/0x24
 [33404.112001]  [<c1092a1f>] ? __alloc_pages_nodemask+0x36e/0x4d9
 [33404.112001]  [<c1092b96>] ? __get_free_pages+0xc/0x17
 [33404.112001]  [<c1035180>] ? copy_process+0xb7/0xf28
 [33404.112001]  [<c10264db>] ? check_preempt_curr+0x24/0x44
 [33404.112001]  [<c103612b>] ? do_fork+0x13a/0x2bc
 [33404.112001]  [<c10518aa>] ? getnstimeofday+0x4d/0xc9
 [33404.112001]  [<c1006dee>] ? sys_clone+0x21/0x27
 [33404.112001]  [<c100821c>] ? syscall_call+0x7/0xb
 [33469.608001] BUG: soft lockup - CPU#0 stuck for 61s! [clamd:30031]
 [33469.608001] Modules linked in: joydev st ide_gd_mod ide_cd_mod ide_core act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat
  nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext4 jbd2 crc16 snd_pcm snd_timer snd soundcore intel_rng snd_page_alloc rng_core i2c_i801 e7xxx_edac container evdev processor button edac_core i2c_core shpchp pci_hotplug pcspkr serio_raw ext3 jbd mbcache serpent xts gf128mul dm_crypt dm_mod raid1 md_mod sd_mod crc_t10dif sg sr_mod cdrom ata_generic uhci_hcd ehci_hcd sata_mv ata_piix aic79xx libata e1000 scsi_transport_spi thermal floppy usbcore scsi_mod thermal_sys nls_base [last unloaded: scsi_wait_scan]
 [33469.608001] 
 [33469.608001] Pid: 30031, comm: clamd Not tainted (2.6.32-5-686-bigmem #1) X5DPR
 [33469.608001] EIP: 0060:[<c1059677>] EFLAGS: 00000202 CPU: 0
 [33469.608001] EIP is at smp_call_function_many+0x19a/0x1a8
 [33469.608001] EAX: 0000000e EBX: c78046e8 ECX: 00000004 EDX: 00000020
 [33469.608001] ESI: 00000003 EDI: 00000000 EBP: c7804700 ESP: cbb01e6c
 [33469.608001]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 [33469.608001] CR0: 8005003b CR2: af8f6024 CR3: 36ae0000 CR4: 000006f0
 [33469.608001] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 [33469.608001] DR6: ffff0ff0 DR7: 00000400
 [33469.608001] Call Trace:
 [33469.608001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33469.608001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33469.608001]  [<c105969e>] ? smp_call_function+0x19/0x1f
 [33469.608001]  [<c103b677>] ? on_each_cpu+0xc/0x24
 [33469.608001]  [<c1092a1f>] ? __alloc_pages_nodemask+0x36e/0x4d9
 [33469.608001]  [<c1092b96>] ? __get_free_pages+0xc/0x17
 [33469.608001]  [<c1035180>] ? copy_process+0xb7/0xf28
 [33469.608001]  [<c10264db>] ? check_preempt_curr+0x24/0x44
 [33469.608001]  [<c103612b>] ? do_fork+0x13a/0x2bc
 [33469.608001]  [<c10518aa>] ? getnstimeofday+0x4d/0xc9
 [33469.608001]  [<c1006dee>] ? sys_clone+0x21/0x27
 [33469.608001]  [<c100821c>] ? syscall_call+0x7/0xb
 [33535.108001] BUG: soft lockup - CPU#0 stuck for 61s! [clamd:30031]
 [33535.108001] Modules linked in: joydev st ide_gd_mod ide_cd_mod ide_core act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat
  nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext4 jbd2 crc16 snd_pcm snd_timer snd soundcore intel_rng snd_page_alloc rng_core i2c_i801 e7xxx_edac container evdev processor button edac_core i2c_core shpchp pci_hotplug pcspkr serio_raw ext3 jbd mbcache serpent xts gf128mul dm_crypt dm_mod raid1 md_mod sd_mod crc_t10dif sg sr_mod cdrom ata_generic uhci_hcd ehci_hcd sata_mv ata_piix aic79xx libata e1000 scsi_transport_spi thermal floppy usbcore scsi_mod thermal_sys nls_base [last unloaded: scsi_wait_scan]
 [33535.108001] 
 [33535.108001] Pid: 30031, comm: clamd Not tainted (2.6.32-5-686-bigmem #1) X5DPR
 [33535.108001] EIP: 0060:[<c1059677>] EFLAGS: 00000202 CPU: 0
 [33535.108001] EIP is at smp_call_function_many+0x19a/0x1a8
 [33535.108001] EAX: 0000000e EBX: c78046e8 ECX: 00000004 EDX: 00000020
 [33535.108001] ESI: 00000003 EDI: 00000000 EBP: c7804700 ESP: cbb01e6c
 [33535.108001]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 [33535.108001] CR0: 8005003b CR2: af8f6024 CR3: 36ae0000 CR4: 000006f0
 [33535.108001] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 [33535.108001] DR6: ffff0ff0 DR7: 00000400
 [33535.108001] Call Trace:
 [33535.108001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33535.108001]  [<c1091433>] ? drain_local_pages+0x0/0xb
 [33535.108001]  [<c105969e>] ? smp_call_function+0x19/0x1f
 [33535.108001]  [<c103b677>] ? on_each_cpu+0xc/0x24
 [33535.108001]  [<c1092a1f>] ? __alloc_pages_nodemask+0x36e/0x4d9
 [33535.108001]  [<c1092b96>] ? __get_free_pages+0xc/0x17
 [33535.108001]  [<c1035180>] ? copy_process+0xb7/0xf28
 [33535.108001]  [<c10264db>] ? check_preempt_curr+0x24/0x44
 [33535.108001]  [<c103612b>] ? do_fork+0x13a/0x2bc
 [33535.108001]  [<c10518aa>] ? getnstimeofday+0x4d/0xc9
 [33535.108001]  [<c1006dee>] ? sys_clone+0x21/0x27
 [33535.108001]  [<c100821c>] ? syscall_call+0x7/0xb



-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.0.0-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash



Reply to: