Bug#538158: BUG: soft lockup - CPU#5 stuck for 62s with 2.6.26-2-686-bigmem kernel



Package: linux-image-2.6.26-2-686-bigmem
Version: 2.6.26-17
Severity: important

This problem is repeatable on two of our Sun X2200 servers (two quad-core
Opteron 2376 CPUs and 28GB of RAM each).  I found a couple of similar bug
reports (#496917 and #536236), but they are filed against amd64 kernels.
Ours is the stock x86 bigmem kernel out of Lenny, so I figured I'd file
a separate report.

This is unlikely to be a hardware issue, because it shows up on two different 
systems.  Each of them had memtest86+ running for several days before 
deployment.  Right now the machines are running vanilla 2.6.30.1
kernels from kernel.org, compiled with lenny's config-2.6.26-2-686-bigmem,
and the problem is gone.

The problem is that random CPUs intermittently get locked up, with the 
following kernel messages showing repeatedly:

...
[48420.342829] BUG: soft lockup - CPU#5 stuck for 62s! [swapper:0]
[48420.342829] Modules linked in: tcp_diag inet_diag binfmt_misc nfsd
lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 serio_raw shpchp
psmouse pci_hotplug i2c_nforce2 pcspkr joydev button i2c_core evdev
ext3 jbd mbcache sd_mod usbhid hid ff_memless ide_pci_generic amd74xx
ide_core sata_nv ata_generic tg3 libata scsi_mod ehci_hcd ohci_hcd
dock usbcore thermal processor fan thermal_sys
[48420.342829]
[48420.342829] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
[48420.342829] EIP: 0060:[<c011a124>] EFLAGS: 00000246 CPU: 5
[48420.342829] EIP is at native_safe_halt+0x2/0x3
[48420.342829] EAX: f74be000 EBX: c0107656 ECX: 0f07b000 EDX: 00524d4b
[48420.342829] ESI: 00000005 EDI: 00000000 EBP: 00000000 ESP: f74bffa8
[48420.342829]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[48420.342829] CR0: 8005003b CR2: 080f2c58 CR3: 37585000 CR4: 000006f0
[48420.342829] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[48420.342829] DR6: ffff0ff0 DR7: 00000400
[48420.342829]  [<c0107683>] default_idle+0x2d/0x53
[48420.342829]  [<c01075ce>] cpu_idle+0xab/0xcb
[48420.342829]  =======================
...

The CPU#N part of the error message can be anything from 0 to 7, and
the process name in square brackets can be anything from a system
process to a user-run script.
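
To check whether the lockups favour a particular CPU or process, the
messages can be tallied from a saved kernel log.  The following is a
minimal Python sketch; the function name tally_lockups and the regular
expression are illustrative assumptions based only on the message format
quoted above, not an existing tool.

```python
import re
from collections import Counter

# Matches lines of the form quoted in this report, e.g.
# "[48420.342829] BUG: soft lockup - CPU#5 stuck for 62s! [swapper:0]"
# (pattern is an assumption derived from that single example).
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def tally_lockups(lines):
    """Count soft-lockup messages per CPU number and per process name."""
    by_cpu = Counter()
    by_comm = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft-lockup header line (register dump, trace, ...)
        by_cpu[int(match.group("cpu"))] += 1
        by_comm[match.group("comm")] += 1
    return by_cpu, by_comm
```

Passing it the lines of a saved log (for example, dmesg output redirected
to a file) returns two counters, one keyed by CPU number and one by
process name.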

The machines are pretty much stock Sun X2200 servers with two quad-core 
Opteron 2376 CPUs, 28GB of RAM, and one SATA disk.  Below is the output
of lspci.  Please let me know if you require more information.

trunko:~# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000
05:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev b5)
06:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)
06:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)

trunko:~# free
             total       used       free     shared    buffers     cached
Mem:      29120968     954532   28166436          0     304752     274220
-/+ buffers/cache:     375560   28745408
Swap:      2048248          0    2048248

trunko:~# fdisk -l

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          13      104391   83  Linux
/dev/sda2              14       19457   156183930    5  Extended
/dev/sda5   *          14          64      409626   83  Linux
/dev/sda6              65         319     2048256   82  Linux swap / Solaris
/dev/sda7             320        1212     7172991   83  Linux
/dev/sda8            1213        1340     1028128+  83  Linux
/dev/sda9            1341        1468     1028128+  83  Linux
/dev/sda10           1469        2361     7172991   83  Linux
/dev/sda11           2362       19457   137323588+  83  Linux

trunko:~# lsmod
Module                  Size  Used by
binfmt_misc             7020  1 
nfsd                  207552  9 
exportfs                3712  1 nfsd
nfs                   220756  10 
lockd                  57984  2 nfsd,nfs
nfs_acl                 2632  2 nfsd,nfs
auth_rpcgss            32752  2 nfsd,nfs
sunrpc                164612  34 nfsd,nfs,lockd,nfs_acl,auth_rpcgss
ipv6                  232800  38 
ipmi_si                34828  0 
ipmi_msghandler        30676  1 ipmi_si
i2c_nforce2             6248  0 
joydev                  8800  0 
serio_raw               4696  0 
psmouse                37468  0 
shpchp                 27108  0 
pci_hotplug            24628  1 shpchp
button                  5120  0 
processor              34600  0 
i2c_core               20880  1 i2c_nforce2
pcspkr                  2096  0 
evdev                   8220  3 
ext3                  107448  7 
jbd                    41072  1 ext3
mbcache                 6984  1 ext3
sd_mod                 23924  9 
ide_pci_generic         3624  0 
amd74xx                 5420  0 
ide_core               87756  2 ide_pci_generic,amd74xx
usbhid                 31452  0 
hid                    36068  1 usbhid
sata_nv                19636  8 
ata_generic             4332  0 
tg3                    94696  0 
libphy                 19512  1 tg3
libata                151032  2 sata_nv,ata_generic
scsi_mod              135076  2 sd_mod,libata
ehci_hcd               30492  0 
ohci_hcd               19880  0 
usbcore               125860  4 usbhid,ehci_hcd,ohci_hcd
thermal                12664  0 
fan                     4032  0 
thermal_sys            13424  3 processor,thermal,fan


-- Package-specific info:

-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.30.1-i686-bigmem-cdf (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-686-bigmem depends on:
ii  debconf [debconf-2.0]         1.5.24     Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o      tools for generating an initramfs
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo

Versions of packages linux-image-2.6.26-2-686-bigmem recommends:
ii  libc6-i686                    2.7-18     GNU C Library: Shared libraries [i

Versions of packages linux-image-2.6.26-2-686-bigmem suggests:
ii  grub                       0.97-47lenny2 GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26           <none>        (no description available)

-- debconf information:
  linux-image-2.6.26-2-686-bigmem/preinst/overwriting-modules-2.6.26-2-686-bigmem: true
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.26-2-686-bigmem/preinst/lilo-has-ramdisk:
  linux-image-2.6.26-2-686-bigmem/postinst/bootloader-test-error-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/depmod-error-2.6.26-2-686-bigmem: false
  linux-image-2.6.26-2-686-bigmem/preinst/initrd-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/preinst/abort-overwrite-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/preinst/bootloader-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/depmod-error-initrd-2.6.26-2-686-bigmem: false
  linux-image-2.6.26-2-686-bigmem/postinst/create-kimage-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/lilo-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/prerm/would-invalidate-boot-loader-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/failed-to-move-modules-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/prerm/removing-running-kernel-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/old-dir-initrd-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/elilo-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/abort-install-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/old-initrd-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/old-system-map-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/bootloader-error-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/kimage-is-a-directory:


