
Bug#644550: linux-image-2.6.32-5-amd64: indefinite soft lockup on rm



> > Package: linux-2.6
> > Version: 2.6.32-38
> > Severity: important
> > Tags: upstream
> > 
> > The lockup happened while unlinking a large file (several GB) on XFS on
> > fibre channel storage (qla2xxx module).  We have experienced very similar
> > trouble on this machine, though, using other kernels (from stable,
> > stable-backports and sid) and other fs/storage combinations (ext4 on a
> > local Dell PERC controller).
> 
> How long did the soft lockup state persist before you rebooted?

14 minutes

> > The full boot-time dmesg output may be found on
> > 
> > http://joni.heaven-industries.com/~egon/tornado-dmesg.txt
> > 
> > The bug is not too easy to trigger, so I hope there's another way to look
> > for it than bisecting 5 years of kernel commits...  On the other hand, I'm
> > of course ready to spend some time on it if it helps to make the kernel better
> > :).
> [...]
> 
> When you say 'not too easy', do you mean that you don't know what
> specific circumstances it occurs in, or that it occurs as part of a long
> or complex sequence of operations?

The former.
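
For completeness: the original instance hit during a plain unlink of a
multi-gigabyte file, so when we try to provoke it we do essentially nothing
more than the following (a minimal sketch in C; the path and size are made
up, and on our machine the file sits on XFS over the qla2xxx storage):

/* Sketch of the triggering operation: write a multi-GB file, fsync it,
 * then unlink it.  Path and size are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/xfs/bigfile";          /* hypothetical */
    const off_t size = 8L * 1024 * 1024 * 1024;     /* "several GB" */
    static char buf[1 << 20];
    off_t written = 0;
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    memset(buf, 0xab, sizeof(buf));
    while (written < size) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) { perror("write"); return 1; }
        written += n;
    }
    if (fsync(fd) < 0)          /* make sure it reaches the storage */
        perror("fsync");
    close(fd);

    /* the unlink is where the soft lockup was observed */
    if (unlink(path) < 0) { perror("unlink"); return 1; }
    return 0;
}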

> What sort of applications or services is this machine running?
> 
> Given that you can reproduce this with the kernel version in sid, maybe
> we should take this straight to the upstream developers - but it's hard
> to guess which component or maintainer might be responsible.

I'm sorry for taking so long to respond--we tried to reproduce this with the
sid kernel, and it did not crash again until today (although the kernel from
squeeze-backports did crash here once in between).  I had therefore hoped
the BIOS upgrade might have solved it and didn't want to bother you in that
case.  Today's crash may not even be related to the original one, so perhaps
I should open a new bug report...  Anyway, here it is (edited from
netconsole output):

[313919.501641] BUG: unable to handle kernel paging request at ffff88007f873133
[313919.501703] IP: [<ffffffff810ed6aa>] free_block+0x50/0xfc
 PGD 1604063 PUD 7f335067 PMD 0
 Oops: 0002 [#1] SMP
[313919.501827] CPU 11
[313919.501836] Modules linked in:
 dm_snapshot des_generic ecb md4 hmac nls_utf8 cifs nfs lockd fscache
 auth_rpcgss nfs_acl sunrpc ext4 jbd2 crc16 xfs dm_round_robin
 scsi_dh_emc dm_multipath scsi_dh netconsole configfs snd_pcm
 snd_timer snd soundcore snd_page_alloc joydev tpm_tis pcspkr tpm
 tpm_bios dcdbas i7core_edac evdev psmouse ghes serio_raw edac_core
 acpi_power_meter hed button processor thermal_sys ext3 jbd mbcache
 dm_mod sg sr_mod cdrom sd_mod ata_generic ses crc_t10dif enclosure
 usbhid hid uhci_hcd qla2xxx scsi_transport_fc scsi_tgt ata_piix
 libata ehci_hcd usbcore megaraid_sas scsi_mod bnx2 [last unloaded:
 scsi_wait_scan]
[313919.502423] Pid: 78, comm: kswapd1 Not tainted 3.0.0-2-amd64 #1 Dell Inc. PowerEdge R810/0TT6JF
[313919.502486] RIP: 0010:[<ffffffff810ed6aa>] [<ffffffff810ed6aa>] free_block+0x50/0xfc 
[313919.502540] RSP: 0018:ffff88085e35f9d0  EFLAGS: 00010082 
[313919.502569] RAX: ffff880004013000 RBX: ffff88107f04a340 RCX: ffff880004018000 
[313919.502616] RDX: ffff88087f4761c0 RSI: ffff88007f87312b RDI: ffff880004013200 
[313919.502663] RBP: 000000000000000c R08: dead000000200200 R09: dead000000100100 
[313919.502710] R10: dead000000200200 R11: dead000000100100 R12: 0000000000000003 
[313919.502757] R13: ffff88107f01d5f0 R14: 0000000000000000 R15: ffff880004013200 
[313919.502804] FS:  0000000000000000(0000) GS:ffff88107f4a0000(0000) knlGS:0000000000000000 
[313919.502853] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
[313919.502883] CR2: ffff88007f873133 CR3: 0000000001603000 CR4: 00000000000006e0 
[313919.502930] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[313919.502977] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[313919.503024] Process kswapd1 (pid: 78, threadinfo ffff88085e35e000, task ffff88085e35ae60) 
[313919.503072] Stack: 
[313919.503094]  0000000000000082 ffff88107f01d5c0 ffff88087f4761c0 ffff88107f04a340
[313919.503160]  0000000000000000 0000000000000001 0000000000000000 ffffffff810ed7ea
[313919.503226]  ffff8802f1a7c280 ffff88107f01d5c0 ffff88107f04a340 ffff880004716c90
[313919.503290] Call Trace: 
[313919.503318]  [<ffffffff810ed7ea>] ? __drain_alien_cache+0x94/0xa9 
[313919.503351]  [<ffffffff810ed486>] ? __cache_free.clone.30+0x8f/0x196 
[313919.503383]  [<ffffffff810ed5b2>] ? kmem_cache_free+0x25/0x69 
[313919.503420]  [<ffffffff8111d252>] ? free_buffer_head+0x1f/0x30 
[313919.503452]  [<ffffffff8111d4d8>] ? try_to_free_buffers+0x94/0xa6 
[313919.508629]  [<ffffffffa031ae19>] ? xfs_vm_releasepage+0x82/0x8b [xfs] 
[313919.508668]  [<ffffffff810c3b66>] ? shrink_page_list+0x2ef/0x496 
[313919.508701]  [<ffffffff810c412b>] ? shrink_inactive_list+0x23a/0x391 
[313919.508739]  [<ffffffff811a79ae>] ? radix_tree_gang_lookup_tag+0x7e/0xd2 
[313919.508773]  [<ffffffff810c4932>] ? shrink_zone+0x3ad/0x4ac 
[313919.508810]  [<ffffffff81335b15>] ? _cond_resched+0x9/0x20 
[313919.508841]  [<ffffffff810c55f9>] ? balance_pgdat+0x28e/0x514 
[313919.508873]  [<ffffffff810c5b33>] ? kswapd+0x2b4/0x2cd 
[313919.508906]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23 
[313919.508937]  [<ffffffff810c587f>] ? balance_pgdat+0x514/0x514 
[313919.508968]  [<ffffffff810c587f>] ? balance_pgdat+0x514/0x514 
[313919.508999]  [<ffffffff8105fdc7>] ? kthread+0x7a/0x82 
[313919.509032]  [<ffffffff8133d1e4>] ? kernel_thread_helper+0x4/0x10 
[313919.509064]  [<ffffffff8105fd4d>] ? kthread_worker_fn+0x149/0x149 
[313919.509096]  [<ffffffff8133d1e0>] ? gs_change+0x13/0x13 
[313919.509125] Code: 
 4c 89 ff e8 03 e8 ff ff 48 8b 48 08 48 8b 30 49 b9 00 01 10 00 00 00
 ad de 49 b8 00 02 20 00 00 00 ad de 4a 8b 94 f3 68 10 00 00
 89 4e 08 48 89 31 4c 89 08 4c 89 40 08 44 2b 78 18 8b 8b 10
[313919.509486] RIP [<ffffffff810ed6aa>] free_block+0x50/0xfc
 RSP <ffff88085e35f9d0>
 CR2: ffff88007f873133
[313919.509902] ---[ end trace 4e4c1859e166ba0c ]--- 

It was followed by a (possibly irrelevant) backtrace from another CPU:

[313936.280974] ------------[ cut here ]------------ 
[313936.281068] WARNING: at /build/buildd-linux-2.6_3.0.0-5-amd64-UzNnAD/linux-2.6-3.0.0/debian/build/source_amd64_none/kernel/watchdog.c:240 watchdog_overflow_callback+0x96/0xa4() 
[313936.281193] Hardware name: PowerEdge R810 
[313936.281259] Watchdog detected hard LOCKUP on cpu 7 
[313936.281314] Modules linked in:
 dm_snapshot des_generic ecb md4 hmac nls_utf8 cifs nfs lockd fscache
 auth_rpcgss nfs_acl sunrpc ext4 jbd2 crc16 xfs dm_round_robin
 scsi_dh_emc dm_multipath scsi_dh netconsole configfs snd_pcm
 snd_timer snd soundcore snd_page_alloc joydev tpm_tis pcspkr tpm
 tpm_bios dcdbas i7core_edac evdev psmouse ghes serio_raw edac_core
 acpi_power_meter hed button processor thermal_sys ext3 jbd mbcache
 dm_mod sg sr_mod cdrom sd_mod ata_generic ses crc_t10dif enclosure
 usbhid hid uhci_hcd qla2xxx scsi_transport_fc scsi_tgt ata_piix
 libata ehci_hcd usbcore megaraid_sas scsi_mod bnx2 [last unloaded:
 scsi_wait_scan]
[313936.284925] Pid: 38550, comm: sky-db Tainted: G      D      3.0.0-2-amd64 #1 
[313936.285001] Call Trace: 
[313936.285066]  <NMI> [<ffffffff81046591>] ? warn_slowpath_common+0x78/0x8c 
[313936.285198]  [<ffffffff81046646>] ? warn_slowpath_fmt+0x45/0x4a 
[313936.285276]  [<ffffffff8100e28d>] ? paravirt_read_tsc+0x5/0x8 
[313936.285347]  [<ffffffff81091b60>] ? watchdog_overflow_callback+0x96/0xa4 
[313936.285424]  [<ffffffff810b1a5c>] ? __perf_event_overflow+0x101/0x198 
[313936.285500]  [<ffffffff8101429b>] ? paravirt_read_msr+0x7/0xa 
[313936.285571]  [<ffffffff810142a9>] ? paravirt_write_msr+0xb/0xe 
[313936.285646]  [<ffffffff81017d48>] ? intel_pmu_handle_irq+0x35d/0x3c0 
[313936.285726]  [<ffffffff813383fa>] ? perf_event_nmi_handler+0x3e/0x87 
[313936.285800]  [<ffffffff81339ee3>] ? notifier_call_chain+0x2e/0x5b 
[313936.285874]  [<ffffffff81339f5f>] ? notify_die+0x2d/0x32 
[313936.285945]  [<ffffffff81337a3c>] ? do_nmi+0x61/0x202 
[313936.286015]  [<ffffffff81337750>] ? nmi+0x20/0x30 
[313936.286089]  [<ffffffff81072135>] ? do_raw_spin_lock+0x15/0x1b 
[313936.286159]  <<EOE>> [<ffffffff810ed471>] ? __cache_free.clone.30+0x7a/0x196 
[313936.286299]  [<ffffffff810ed5b2>] ? kmem_cache_free+0x25/0x69 
[313936.286376]  [<ffffffff8111d252>] ? free_buffer_head+0x1f/0x30 
[313936.286448]  [<ffffffff8111d4d8>] ? try_to_free_buffers+0x94/0xa6 
[313936.286544]  [<ffffffffa031ae19>] ? xfs_vm_releasepage+0x82/0x8b [xfs] 
[313936.286623]  [<ffffffff810c3b66>] ? shrink_page_list+0x2ef/0x496 
[313936.286695]  [<ffffffff810c412b>] ? shrink_inactive_list+0x23a/0x391 
[313936.286767]  [<ffffffff810c4932>] ? shrink_zone+0x3ad/0x4ac 
[313936.286839]  [<ffffffff810c4da0>] ? do_try_to_free_pages+0x106/0x315 
[313936.286918]  [<ffffffff810bc6bf>] ? get_page_from_freelist+0x651/0x697 
[313936.286990]  [<ffffffff810c5235>] ? try_to_free_pages+0xb6/0xf6 
[313936.287062]  [<ffffffff810bcd41>] ? __alloc_pages_nodemask+0x4d9/0x765 
[313936.287137]  [<ffffffff810beef7>] ? page_zone.clone.4+0x21/0x21 
[313936.287214]  [<ffffffff810e7acd>] ? alloc_pages_vma+0xdc/0xe1 
[313936.287292]  [<ffffffff810d263c>] ? handle_pte_fault+0x162/0x79b 
[313936.287364]  [<ffffffff810cfc31>] ? set_pmd+0x5/0x8 
[313936.287434]  [<ffffffff810cfd4c>] ? pte_offset_kernel+0x16/0x36 
[313936.287506]  [<ffffffff810d2ff1>] ? handle_mm_fault+0x1ea/0x22c 
[313936.287578]  [<ffffffff81339e90>] ? do_page_fault+0x2e9/0x30e 
[313936.287651]  [<ffffffff810d7c2e>] ? mmap_region+0x336/0x431 
[313936.287722]  [<ffffffff810d5c0e>] ? get_unmapped_area+0xe4/0x13e 
[313936.287797]  [<ffffffff81337495>] ? page_fault+0x25/0x30 
[313936.287869] ---[ end trace 4e4c1859e166ba0d ]--- 

Here's the last atop sample (taken 2 minutes before the crash):

ATOP - tornado            2011/10/25  03:45:02              600 seconds elapsed
PRC | sys  80.95s | user  67m20s | #proc    201 | #zombie    0 | #exit   4773 |
CPU | sys     14% | user    675% | irq       1% | idle    903% | wait      8% |
cpu | sys      1% | user     71% | irq       0% | idle     27% | cpu002 w  1% |
cpu | sys      1% | user     71% | irq       0% | idle     28% | cpu004 w  0% |
cpu | sys      1% | user     64% | irq       0% | idle     35% | cpu007 w  0% |
cpu | sys      1% | user     59% | irq       0% | idle     39% | cpu006 w  0% |
cpu | sys      1% | user     59% | irq       0% | idle     40% | cpu005 w  0% |
cpu | sys      1% | user     57% | irq       0% | idle     42% | cpu003 w  0% |
cpu | sys      1% | user     52% | irq       0% | idle     45% | cpu001 w  2% |
cpu | sys      1% | user     50% | irq       0% | idle     43% | cpu000 w  5% |
cpu | sys      1% | user     30% | irq       0% | idle     69% | cpu008 w  0% |
cpu | sys      1% | user     30% | irq       0% | idle     70% | cpu011 w  0% |
cpu | sys      0% | user     29% | irq       0% | idle     70% | cpu015 w  0% |
cpu | sys      1% | user     28% | irq       0% | idle     72% | cpu009 w  0% |
cpu | sys      1% | user     24% | irq       0% | idle     75% | cpu012 w  0% |
cpu | sys      1% | user     20% | irq       0% | idle     79% | cpu013 w  0% |
cpu | sys      0% | user     19% | irq       0% | idle     81% | cpu010 w  0% |
cpu | sys      0% | user     11% | irq       0% | idle     89% | cpu014 w  0% |
CPL | avg1   2.57 | avg5    1.94 | avg15   2.15 | csw  9919970 | intr 6679518 |
MEM | tot   63.0G | free    1.3G | cache  52.0G | buff    2.0M | slab    1.2G |
SWP | tot    9.3G | free    9.3G |              | vmcom  13.7G | vmlim  40.8G |
PAG | scan 454328 | stall      0 |              | swin       0 | swout    256 |
DSK |         sda | busy      8% | read    9391 | write  17782 | avio    1 ms |
NET | transport   | tcpi     738 | tcpo     684 | udpi      45 | udpo      45 |
NET | network     | ipi     2033 | ipo      722 | ipfrw      0 | deliv   2032 |
NET | eth2     0% | pcki    3185 | pcko     601 | si    4 Kbps | so    2 Kbps |
NET | lo     ---- | pcki     198 | pcko     198 | si    0 Kbps | so    0 Kbps |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/217 
38629   8.36s   8m29s 74264K 73832K 11916K 301.7M  --   - R  86% sky-db
39009  14.48s   8m22s  -0.1G  -0.1G   556K 337.9M  --   - R  86% sky-db
38674   8.55s   8m26s -13.3M -13.7M  9220K 215.9M  --   - R  86% sky-db
38652   9.33s   8m25s 25112K 25016K  5184K 136.9M  --   - R  86% sky-db
38563   8.28s   8m25s 20124K 19924K  8568K 68804K  --   - R  86% sky-db
38957   8.55s   8m25s 34588K 34112K  7316K 52356K  --   - R  86% sky-db
38576   8.75s   8m24s  5912K  5696K  1172K 14784K  --   - R  85% sky-db
38550  10.55s   8m19s -10.0M -8288K 646.9M 555.2M  --   - R  85% sky-db
 3752   0.10s   0.42s  -640K  -528K  1616K     8K  --   - S   0% sky-hps
 4491   0.02s   0.48s  -640K  -528K  1196K    28K  --   - S   0% sky-hps
 3836   0.06s   0.42s     0K     0K     0K     0K  --   - S   0% sky-hps
 4075   0.45s   0.00s     0K     0K     0K     0K  --   - S   0% flush-254:1
 3910   0.10s   0.34s     0K   -40K     0K     0K  --   - S   0% skytrick
 3828   0.22s   0.22s     0K     0K     0K     0K  --   - S   0% sky-hps
59708   0.14s   0.30s     0K  -120K     0K     0K  --   - S   0% proxy
 4438   0.09s   0.34s     0K  -244K     0K     0K  --   - S   0% sky-db
37660   0.35s   0.07s     0K   -40K     0K     0K  --   - S   0% skytrick
 4445   0.08s   0.34s     0K  -244K     0K     0K  --   - S   0% sky-db
 4459   0.09s   0.33s     0K  -244K     0K     0K  --   - S   0% sky-db
 4184   0.07s   0.35s     0K -1268K     0K     0K  --   - S   0% sky-db
 4452   0.09s   0.32s     0K  -244K     0K     0K  --   - S   0% sky-db
 4474   0.07s   0.34s     0K  -244K     0K     0K  --   - S   0% sky-db
 4481   0.07s   0.30s     0K  -244K     0K     0K  --   - S   0% sky-db

All the sky-db processes were working (through the libdb API) on the same
big (57 GB) file on the XFS filesystem (on an LVM2 logical volume over a
local HW RAID).  The application is built locally against the stock squeeze
libdb4.8-dev package.
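
In case the access pattern matters: each sky-db worker talks to that file
through the plain Berkeley DB 4.8 C API, roughly as sketched below (the
database path, key and record contents here are made up; link against
libdb):

/* Rough sketch of a sky-db worker's use of the Berkeley DB 4.8 C API.
 * File name, key and value are hypothetical; the real file is a single
 * ~57 GB database on the XFS volume, shared by many workers. */
#include <db.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DB *dbp;
    DBT key, data;
    char kbuf[32], vbuf[256];
    int ret;

    if ((ret = db_create(&dbp, NULL, 0)) != 0) {
        fprintf(stderr, "db_create: %s\n", db_strerror(ret));
        return 1;
    }
    if ((ret = dbp->open(dbp, NULL, "/mnt/xfs/sky.db", NULL,
                         DB_BTREE, DB_CREATE, 0644)) != 0) {
        dbp->err(dbp, ret, "open");
        return 1;
    }

    /* the workers read and rewrite records like this */
    snprintf(kbuf, sizeof(kbuf), "record-%d", 42);
    memset(vbuf, 0xab, sizeof(vbuf));
    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = kbuf;
    key.size = strlen(kbuf) + 1;
    data.data = vbuf;
    data.size = sizeof(vbuf);
    if ((ret = dbp->put(dbp, NULL, &key, &data, 0)) != 0)
        dbp->err(dbp, ret, "put");

    memset(&data, 0, sizeof(data));     /* let DB supply the memory */
    if ((ret = dbp->get(dbp, NULL, &key, &data, 0)) != 0)
        dbp->err(dbp, ret, "get");

    dbp->close(dbp, 0);
    return 0;
}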

I noticed a dramatic drop in the "buff" value (2.0M, see above) about an
hour before the crash--until then it had been steadily above 200 megabytes
for many hours.  According to the atop log, it stayed at only 1.8-2.0 MB
for the whole last hour.
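
For the next occurrence we can sample that value more often than atop's
600-second interval; atop's "buff" is the "Buffers:" line of /proc/meminfo,
so a trivial watcher like this (just a sketch) is enough:

/* Trivial watcher: print the "Buffers:" line of /proc/meminfo (the
 * source of atop's "buff" figure) once a minute. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[128];

    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "Buffers:", 8) == 0) {
                fputs(line, stdout);
                fflush(stdout);
                break;
            }
        }
        fclose(f);
        sleep(60);
    }
}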

The machine had 4 days of uptime; the corresponding boot-time dmesg output is at

http://joni.heaven-industries.com/~egon/tornado-dmesg-111021.txt

It turns out we have another very similar machine here, running CentOS 5.x
(kernel 2.6.18), which also crashed in recent days while still on the
outdated BIOS (no crash since the upgrade, but that was only 4 days ago).

To clarify once again: the crash described above happened with a recent BIOS.

Thanks!

-- 
Egon Eckert, Heaven Industries, s.r.o.
E-mail: egon.eckert@heaven-industries.com

         EARTH
     smog  |   bricks
 AIR  --  mud  -- FIRE
soda water |   tequila
         WATER
 -- with thanks to fortune


