Re: Heavy IO hangs [was: Re: Debian/SPARC[64]]



On 12/01/2015 15:09, Hermann Lauer wrote:
On a Sun Enterprise 250 UP machine I found my heavy IO issue (dd if=/dev/zero of=/dev/sd<x> hangs the machine sooner or later) fixed in the vanilla 3.17 kernel; see my experiences at http://www.spinics.net/lists/sparclinux/msg13158.html. Later on, however, a SCSI issue stopped me with 3.18 kernels: http://comments.gmane.org/gmane.linux.ports.sparc/20270
Sadly, as I struggle to reproduce this issue locally, it's almost
impossible for me to say whether this is something that affects all
SPARC64 kernels or just a specific hardware combination, but I'd be very
interested to hear back as to whether you also experience any lockups
during high I/O during your testing with more recent kernels
(particularly virtio seems to help trigger the issue under emulation).
I never saw any output during those heavy-IO-related hangs on the
machines where I log the console, nor was there anything in the log files
after a reboot. With squeeze I haven't noticed lockups during long runtimes.

No lockups on Sun SMP machines running wheezy with 3.17.x kernels so far.

Hope that helps,
  greetings
    Hermann

Hello, I have found both machines I own (V240) to be prone to lockups. I captured the freeze dump on the ALOM console.
It's related to the perf counters and has plagued the SPARC kernel since 2.6.32.

Cheers
Seb

Here it is:

Kernel log 1:
[189130.208596] BUG: NMI Watchdog detected LOCKUP on CPU1, ip 0042f6e8, registers:
[189130.304716] CPU: 1 PID: 9149 Comm: cc1plus Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[189130.417946] task: fffffc1038675b60 ti: fffffc1028140000 task.ti: fffffc1028140000
[189130.517448] TSTATE: 0000009911e01604 TPC: 000000000042f6e8 TNPC: 000000000042f140 Y: 00000000    Not tainted
[189130.647846] TPC: <__delay+0x28/0x60>
[189130.695869] g0: fffffc103fe5f278 g1: 000000000042f140 g2: 0000000000000001 g3: 0000000000000000
[189130.811388] g4: fffffc1038675b60 g5: fffffc103bf94000 g6: fffffc1028140000 g7: 0000000000000003
[189130.926907] o0: 0000000000000007 o1: fffffc1028140400 o2: 000000000043da50 o3: 0000000000000000
[189131.042423] o4: 0000000000a88800 o5: fffffc103fe6af81 sp: fffffc103fe6b0e1 ret_pc: 000000000042f6e4
[189131.162518] RPC: <__delay+0x24/0x60>
[189131.210551] l0: 0000000000001000 l1: 0000004411001603 l2: 000000000043da4c l3: 0000000000000400
[189131.326071] l4: 000000000000000e l5: 0000000000000001 l6: 0000000000000000 l7: 0000000000000008
[189131.441588] i0: 0000000000000018 i1: 0000000000000001 i2: 0000000000000000 i3: 0000000000000000
[189131.557107] i4: 00000000009c6510 i5: 000002106cfec67d i6: fffffc103fe6b191 i7: 000000000043da58
[189131.672638] I7: <cheetah_xcall_deliver+0x1d8/0x2c0>
[189131.737816] Call Trace:
[189131.770987]  [000000000043da58] cheetah_xcall_deliver+0x1d8/0x2c0
[189131.852194]  [000000000043d5c4] xcall_deliver+0x124/0x140
[189131.924251]  [0000000000491738] try_to_wake_up+0x2b8/0x300
[189131.997448]  [000000000049e070] autoremove_wake_function+0x10/0x60
[189132.079797]  [000000000049da54] __wake_up_common+0x34/0x80
[189132.152994]  [000000000049dc60] __wake_up+0x20/0x40
[189132.218195]  [00000000004b7608] rcu_process_callbacks+0x448/0x5c0
[189132.299395]  [0000000000467b78] __do_softirq+0xb8/0x260
[189132.369161]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[189132.446935]  [0000000000468090] irq_exit+0x90/0xa0
[189132.510986]  [000000000042fa0c] timer_interrupt+0xac/0xe0
[189132.583044]  [00000000004209d4] tl0_irq14+0x14/0x20
[189132.648234] CPU: 1 PID: 9149 Comm: cc1plus Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[189132.761467] Call Trace:
[189132.794639]  [0000000000873f70] perfctr_irq+0x2f0/0x3a0
[189132.864400]  [00000000004209f4] tl0_irq15+0x14/0x20
[189132.929593]  [000000000042f6e8] __delay+0x28/0x60
[189132.992499]  [000000000043da58] cheetah_xcall_deliver+0x1d8/0x2c0
[189133.073706]  [000000000043d5c4] xcall_deliver+0x124/0x140
[189133.145760]  [0000000000491738] try_to_wake_up+0x2b8/0x300
[189133.218960]  [000000000049e070] autoremove_wake_function+0x10/0x60
[189133.301308]  [000000000049da54] __wake_up_common+0x34/0x80
[189133.374508]  [000000000049dc60] __wake_up+0x20/0x40
[189133.439702]  [00000000004b7608] rcu_process_callbacks+0x448/0x5c0
[189133.520906]  [0000000000467b78] __do_softirq+0xb8/0x260
[189133.590676]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[189133.668449]  [0000000000468090] irq_exit+0x90/0xa0
[189133.732497]  [000000000042fa0c] timer_interrupt+0xac/0xe0
[189133.804554]  [00000000004209d4] tl0_irq14+0x14/0x20
[189133.869785] BUG: soft lockup - CPU#1 stuck for 34s! [cc1plus:9149]
[189133.952198] Modules linked in: fuse btrfs raid6_pq zlib_deflate xor ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs crc32c libcrc32c autofs4 target_core_mod configfs binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop flash ext4 crc16 mbcache jbd2 dm_mod md_mod sg sd_mod crc_t10dif crct10dif_common ata_generic pata_ali sym53c8xx scsi_transport_spi libata ohci_pci ohci_hcd ehci_hcd usbcore usb_common tg3 scsi_mod ptp pps_core libphy
[189134.496617] CPU: 1 PID: 9149 Comm: cc1plus Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[189134.609854] task: fffffc1038675b60 ti: fffffc1028140000 task.ti: fffffc1028140000
[189134.709359] TSTATE: 0000000080001606 TPC: 0000000000873fc4 TNPC: 0000000000873fc8 Y: 00000000    Not tainted
[189134.839746] TPC: <perfctr_irq+0x344/0x3a0>
[189134.894639] g0: fffffc1028143f40 g1: ffffffffffffffff g2: 0000000100000100 g3: 00000000ffef0000
[189135.010162] g4: fffffc1038675b60 g5: fffffc103bf94000 g6: fffffc1028140000 g7: 0000000000000001
[189135.125680] o0: 0000000000929000 o1: fffffc1028143ea0 o2: 0000000000a82800 o3: 0000000000003f60
[189135.241197] o4: 0000000000003f40 o5: fffffc103ca012a8 sp: fffffc103fe67741 ret_pc: 0000000000873f9c
[189135.361290] RPC: <perfctr_irq+0x31c/0x3a0>
[189135.416183] l0: 0000000000000001 l1: 0000000000933638 l2: 0000000000000000 l3: 0000000000000000
[189135.531705] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000000000
[189135.647225] i0: 0000000000008000 i1: fffffc103fe6b840 i2: 0000000000000000 i3: 0000000000008001
[189135.762742] i4: 0000000000929000 i5: fffffc103fe6aed1 i6: fffffc103fe6af81 i7: 00000000004209f4
[189135.878259] I7: <tl0_irq15+0x14/0x20>
[189135.927434] Call Trace:
[189135.960604]  [00000000004209f4] tl0_irq15+0x14/0x20
[189136.025799]  [000000000042f6e8] __delay+0x28/0x60
[189136.088704]  [000000000043da58] cheetah_xcall_deliver+0x1d8/0x2c0
[189136.169910]  [000000000043d5c4] xcall_deliver+0x124/0x140
[189136.241965]  [0000000000491738] try_to_wake_up+0x2b8/0x300
[189136.315163]  [000000000049e070] autoremove_wake_function+0x10/0x60
[189136.397512]  [000000000049da54] __wake_up_common+0x34/0x80
[189136.470711]  [000000000049dc60] __wake_up+0x20/0x40
[189136.535908]  [00000000004b7608] rcu_process_callbacks+0x448/0x5c0
[189136.617111]  [0000000000467b78] __do_softirq+0xb8/0x260
[189136.686878]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[189136.764654]  [0000000000468090] irq_exit+0x90/0xa0
[189136.828702]  [000000000042fa0c] timer_interrupt+0xac/0xe0
[189136.900758]  [00000000004209d4] tl0_irq14+0x14/0x20
[189136.965977] Kernel panic - not syncing: Aiee, killing interrupt handler!
[189137.055164] CPU: 1 PID: 9149 Comm: cc1plus Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[189137.168394] Call Trace:
[189137.201561]  [0000000000862520] panic+0xb0/0x214
[189137.263331]  [0000000000465a18] do_exit+0x918/0xa00
[189137.328516]  [0000000000873fc4] perfctr_irq+0x344/0x3a0
[189137.398283]  [00000000004209f4] tl0_irq15+0x14/0x20
[189137.463479]  [000000000042f6e8] __delay+0x28/0x60
[189137.526383]  [000000000043da58] cheetah_xcall_deliver+0x1d8/0x2c0
[189137.607590]  [000000000043d5c4] xcall_deliver+0x124/0x140
[189137.679644]  [0000000000491738] try_to_wake_up+0x2b8/0x300
[189137.752843]  [000000000049e070] autoremove_wake_function+0x10/0x60
[189137.835191]  [000000000049da54] __wake_up_common+0x34/0x80
[189137.908391]  [000000000049dc60] __wake_up+0x20/0x40
[189137.973585]  [00000000004b7608] rcu_process_callbacks+0x448/0x5c0
[189138.054791]  [0000000000467b78] __do_softirq+0xb8/0x260
[189138.124558]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[189138.202331]  [0000000000468090] irq_exit+0x90/0xa0
[189138.266381]  [000000000042fa0c] timer_interrupt+0xac/0xe0
[189138.338437] Press Stop-A (L1-A) to return to the boot prom

Kernel log 2:
[1920691.646156] BUG: NMI Watchdog detected LOCKUP on CPU0, ip 004ba2d0, registers:
[1920691.743450] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[1920691.856677] task: fffffc103b916de0 ti: fffffc103b91c000 task.ti: fffffc103b91c000
[1920691.957323] TSTATE: 0000000880001607 TPC: 00000000004ba2d0 TNPC: 00000000004ba2d4 Y: 00000000    Not tainted
[1920692.088865] TPC: <__getnstimeofday+0x90/0xe0>
[1920692.148325] g0: 00000000000010f2 g1: 4d94426ee27066c7 g2: 00000002fe4fcda3 g3: 000000003b9ac9ff
[1920692.264990] g4: fffffc103b916de0 g5: fffffc103bd94000 g6: fffffc103b91c000 g7: ffffffffc4653600
[1920692.381650] o0: 000014f64983636d o1: 000000000000012c o2: 0000000000000040 o3: 0000000000000000
[1920692.498312] o4: fffffc103c800e50 o5: fffffc10382ac940 sp: fffffc103fe73081 ret_pc: 00000000004ba274
[1920692.619548] RPC: <__getnstimeofday+0x34/0xe0>
[1920692.679022] l0: 00000000009ae080 l1: 0000000000a6d010 l2: 0000000000000000 l3: 000000000000000a
[1920692.795684] l4: 0000000000a29400 l5: 0000000000a29400 l6: 0000000000a6c140 l7: 00000000009e1000
[1920692.912346] i0: fffffc103fe73a90 i1: 000ea06b2c792980 i2: 0000000005210572 i3: 00000000010d4118
[1920693.029007] i4: 0000000000000018 i5: 00000000537d0a7d i6: fffffc103fe73131 i7: 00000000004ba9c4
[1920693.145677] I7: <getnstimeofday+0x4/0x40>
[1920693.200568] Call Trace:
[1920693.234880]  [00000000004ba9c4] getnstimeofday+0x4/0x40
[1920693.305795]  [00000000004baa44] ktime_get_real+0x4/0x60
[1920693.376710]  [000000000077a408] netif_receive_skb+0x48/0xa0
[1920693.452192]  [000000000077ae54] napi_gro_receive+0x74/0xc0
[1920693.526588]  [0000000010064e9c] tg3_poll_work+0x5dc/0xe60 [tg3]
[1920693.606709]  [0000000010076d20] tg3_poll+0x80/0x3e0 [tg3]
[1920693.679901]  [000000000077a764] net_rx_action+0x104/0x200
[1920693.753103]  [0000000000467b78] __do_softirq+0xb8/0x260
[1920693.824018]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[1920693.902928]  [0000000000468090] irq_exit+0x90/0xa0
[1920693.968122]  [000000000042b940] handler_irq+0xc0/0x100
[1920694.037895]  [00000000004208b4] tl0_irq5+0x14/0x20
[1920694.103083]  [000000000042c0e8] arch_cpu_idle+0x88/0xa0
[1920694.173997]  [00000000004ae6d0] cpu_startup_entry+0x1f0/0x260
[1920694.251771]  [00000000009da498] 0x9da498
[1920694.305524]  [0000000040000000] 0x40000000
[1920694.361568] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[1920694.474800] Call Trace:
[1920694.509119]  [0000000000873f70] perfctr_irq+0x2f0/0x3a0
[1920694.580023]  [00000000004209f4] tl0_irq15+0x14/0x20
[1920694.646359]  [00000000004ba2d0] __getnstimeofday+0x90/0xe0
[1920694.720701]  [00000000004ba9c4] getnstimeofday+0x4/0x40
[1920694.791613]  [00000000004baa44] ktime_get_real+0x4/0x60
[1920694.862526]  [000000000077a408] netif_receive_skb+0x48/0xa0
[1920694.938013]  [000000000077ae54] napi_gro_receive+0x74/0xc0
[1920695.012362]  [0000000010064e9c] tg3_poll_work+0x5dc/0xe60 [tg3]
[1920695.092423]  [0000000010076d20] tg3_poll+0x80/0x3e0 [tg3]
[1920695.165617]  [000000000077a764] net_rx_action+0x104/0x200
[1920695.238816]  [0000000000467b78] __do_softirq+0xb8/0x260
[1920695.309729]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[1920695.388644]  [0000000000468090] irq_exit+0x90/0xa0
[1920695.453837]  [000000000042b940] handler_irq+0xc0/0x100
[1920695.523607]  [00000000004208b4] tl0_irq5+0x14/0x20
[1920695.588799]  [000000000042c0e8] arch_cpu_idle+0x88/0xa0
[1920695.659733] Kernel panic - not syncing: Aiee, killing interrupt handler!
[1920695.750069] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13-1-sparc64-smp #1 Debian 3.13.10-1
[1920695.863300] Call Trace:
[1920695.897608]  [0000000000862520] panic+0xb0/0x214
[1920695.960526]  [0000000000465a18] do_exit+0x918/0xa00
[1920696.026854]  [0000000000873fc4] perfctr_irq+0x344/0x3a0
[1920696.097765]  [00000000004209f4] tl0_irq15+0x14/0x20
[1920696.164101]  [00000000004ba2d0] __getnstimeofday+0x90/0xe0
[1920696.238443]  [00000000004ba9c4] getnstimeofday+0x4/0x40
[1920696.309356]  [00000000004baa44] ktime_get_real+0x4/0x60
[1920696.380268]  [000000000077a408] netif_receive_skb+0x48/0xa0
[1920696.455753]  [000000000077ae54] napi_gro_receive+0x74/0xc0
[1920696.530102]  [0000000010064e9c] tg3_poll_work+0x5dc/0xe60 [tg3]
[1920696.610163]  [0000000010076d20] tg3_poll+0x80/0x3e0 [tg3]
[1920696.683359]  [000000000077a764] net_rx_action+0x104/0x200
[1920696.756558]  [0000000000467b78] __do_softirq+0xb8/0x260
[1920696.827470]  [000000000042b9ac] do_softirq_own_stack+0x2c/0x40
[1920696.906388]  [0000000000468090] irq_exit+0x90/0xa0
[1920696.971581]  [000000000042b940] handler_irq+0xc0/0x100
[1920697.041351] Press Stop-A (L1-A) to return to the boot prom
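
For anyone who wants to compare their own console captures with the two logs above, here is a minimal sketch in Python. It only looks for the "NMI Watchdog detected LOCKUP" line and collects the function names from the call-trace frames, so you can quickly see whether perfctr_irq shows up. The names summarize, LOCKUP_RE and FRAME_RE are made up for this example, and the patterns assume the same line format as the logs quoted in this mail; it does not diagnose anything by itself.

```python
import re

# Both patterns are assumptions based on the log lines quoted in this thread.
# Matches e.g. "BUG: NMI Watchdog detected LOCKUP on CPU1, ip 0042f6e8, registers:"
LOCKUP_RE = re.compile(
    r"BUG: NMI Watchdog detected LOCKUP on CPU(\d+), ip ([0-9a-f]+)"
)
# Matches e.g. "[189132.794639]  [0000000000873f70] perfctr_irq+0x2f0/0x3a0"
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[([0-9a-f]{16})\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(log_text):
    """Return (lockups, symbols).

    lockups: list of (cpu, ip) tuples, one per watchdog report.
    symbols: set of function names seen in call-trace frames.
    """
    lockups = [(int(m.group(1)), m.group(2)) for m in LOCKUP_RE.finditer(log_text)]
    symbols = {m.group(2) for m in FRAME_RE.finditer(log_text)}
    return lockups, symbols


if __name__ == "__main__":
    import sys

    lockups, symbols = summarize(sys.stdin.read())
    for cpu, ip in lockups:
        print(f"watchdog lockup on CPU{cpu}, ip {ip}")
    print("perfctr_irq in trace:", "perfctr_irq" in symbols)
```

If the script is saved as, say, lockup_summary.py (a hypothetical file name), a saved console capture can be piped into it on standard input.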


