
Bug#368353: "BUG: soft lockup detected on CPU#1!" in __d_lookup during cp -l

Package: linux-image-2.6.15-1-amd64-k8-smp
Version: 2.6.15-8
Severity: important

I am seeing occasional overnight system hangs.  At least one has been caused
by the following kernel problem (syslog entry follows)...

May 21 01:30:39 black kernel:  <3>BUG: soft lockup detected on CPU#1!
May 21 01:30:39 black kernel: CPU 1:
May 21 01:30:39 black kernel: Modules linked in: nfsd exportfs lp button ac battery ipv6 nfs lockd nfs_acl sunrpc sk98lin nls_iso8859_1 nls_cp437 vfat fat dm_mod sr_mod sbp2 ide_generic ide_disk analog eth1394 snd_mpu401 snd_mpu401_uart gameport snd_intel8x0 snd_ac97_codec psmouse snd_ac97_bus parport_pc parport pcspkr snd_rawmidi snd_seq_device serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc floppy i2c_nforce2 joydev evdev i2c_core ext3 jbd mbcache ide_cd cdrom sd_mod sata_nv libata scsi_mod skge ohci1394 ieee1394 forcedeth generic amd74xx ide_core ohci_hcd ehci_hcd thermal processor fan
May 21 01:30:39 black kernel: Pid: 10739, comm: cp Not tainted 2.6.15-1-amd64-k8-smp #2
May 21 01:30:39 black kernel: RIP: 0010:[__d_lookup+221/254] <ffffffff80187ed8>{__d_lookup+221}
May 21 01:30:39 black kernel: RSP: 0018:ffff810036e0bc78  EFLAGS: 00000286
May 21 01:30:39 black kernel: RAX: ffff81000b53cd08 RBX: ffff81004e05c270 RCX: 0000000000000012
May 21 01:30:39 black kernel: RDX: 0000000000028a0e RSI: 01870610903e8a0e RDI: ffff810024cd5cf8
May 21 01:30:39 black kernel: RBP: ffff810036e0bc38 R08: ffff810037112061 R09: 0000000000000001
May 21 01:30:39 black kernel: R10: 0000000000000004 R11: ffffffff801b91c0 R12: ffffffff88143c92
May 21 01:30:39 black kernel: R13: 0000000000000006 R14: ffff81001bb05720 R15: 0000000000000000
May 21 01:30:39 black kernel: FS:  00002aaaab36b6d0(0000) GS:ffffffff803e3880(0000) knlGS:00000000556d56b0
May 21 01:30:39 black kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 01:30:39 black kernel: CR2: 0000000001f76028 CR3: 0000000014434000 CR4: 00000000000006e0
May 21 01:30:39 black kernel:
May 21 01:30:39 black kernel: Call Trace:<ffffffff80187ea9>{__d_lookup+174} <ffffffff8017e5b3>{do_lookup+42}
May 21 01:30:39 black kernel:        <ffffffff8017f07f>{__link_path_walk+2413} <ffffffff8017f5b0>{link_path_walk+89}
May 21 01:30:39 black kernel:        <ffffffff80181bca>{sys_link+221} <ffffffff8017fb10>{path_lookup+430}
May 21 01:30:39 black kernel:        <ffffffff8017fc30>{__user_walk+45} <ffffffff80179be1>{vfs_lstat+21}
May 21 01:30:39 black kernel:        <ffffffff80181bca>{sys_link+221} <ffffffff8017a0de>{sys_newlstat+17}
May 21 01:30:39 black kernel:        <ffffffff8010d792>{system_call+126}
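
In case it helps anyone comparing this trace against other reports, the symbol names can be pulled out of the syslog text with a short script. This is only a sketch, assuming the `<address>{symbol+offset}` entry format shown above; the name `trace_symbols` is made up for illustration and is not part of any existing tool.

```python
import re

# Matches entries such as <ffffffff80187ea9>{__d_lookup+174}
# (assumed format: 16 hex digits, then {symbol+decimal offset}).
TRACE_ENTRY = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z_][A-Za-z0-9_]*)\+(\d+)\}")


def trace_symbols(log_text):
    """Return (symbol, offset) pairs in the order they appear in the log text."""
    return [(sym, int(off)) for _addr, sym, off in TRACE_ENTRY.findall(log_text)]


# Two lines copied from the call trace above.
sample = (
    "Call Trace:<ffffffff80187ea9>{__d_lookup+174} <ffffffff8017e5b3>{do_lookup+42}\n"
    "       <ffffffff8017f07f>{__link_path_walk+2413} <ffffffff8017f5b0>{link_path_walk+89}\n"
)

if __name__ == "__main__":
    for symbol, offset in trace_symbols(sample):
        print(symbol, offset)
```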

The problem seems to have occurred during the file copy phase of an "rsnapshot"
run (which happens every night).  During this phase, a recursive "cp -l"
command is used on a very large directory tree (to create a copy tree of
files hardlinked to the original tree).
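
To illustrate the kind of workload involved, here is a rough sketch of what a recursive hardlink copy does: directories are recreated and every regular file is hardlinked rather than copied, so each file costs path lookups and a link call instead of a data copy. This is not how cp or rsnapshot are actually implemented; `hardlink_tree` and `demo` are hypothetical names, and symlinks, special files, permissions and timestamps are not handled the way "cp -l" would handle them.

```python
import os
import tempfile


def hardlink_tree(src, dst):
    """Recreate the directory layout of src under dst and hardlink each file.

    Illustrative only: does not preserve directory metadata and does not
    treat symlinks or special files the way cp does.
    """
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # One hardlink per file; no file data is copied.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def demo():
    """Build a tiny tree, hardlink-copy it, and report (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dst = os.path.join(root, "dst")
        os.makedirs(os.path.join(src, "sub"))
        with open(os.path.join(src, "sub", "file.txt"), "w") as f:
            f.write("data\n")
        hardlink_tree(src, dst)
        a = os.stat(os.path.join(src, "sub", "file.txt"))
        b = os.stat(os.path.join(dst, "sub", "file.txt"))
        return a.st_ino == b.st_ino, b.st_nlink


if __name__ == "__main__":
    print(demo())
```

Running something like this over a very large tree while another process walks the same tree is one way to approximate the overnight load, though I have not confirmed that it triggers the lockup.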

From other logs, it appears that there may have been a simultaneous "ls -R"
command being run in another process, although this may not be relevant.

Note that this is a dual-core AMD64 system running an SMP kernel, so this
could be a locking problem.  I have not been able to find any similar report
in the Debian bug tracking system or on LKML.

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (990, 'testing')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.15-1-amd64-k8-smp
Locale: LANG=en_IE@euro, LC_CTYPE=en_IE@euro (charmap=ISO-8859-15) (ignored: LC_ALL set to en_IE@euro)

Versions of packages linux-image-2.6.15-1-amd64-k8-smp depends on:
ii  e2fsprogs     1.38+1.39-WIP-2006.04.09-1 ext2 file system utilities and lib
ii  initramfs-too 0.60                       tools for generating an initramfs
ii  module-init-t 3.2.2-2                    tools for managing Linux kernel mo

linux-image-2.6.15-1-amd64-k8-smp recommends no packages.

-- debconf information:
  linux-image-2.6.15-1-amd64-k8-smp/postinst/old-dir-initrd-link-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/postinst/depmod-error-initrd-2.6.15-1-amd64-k8-smp: false
  linux-image-2.6.15-1-amd64-k8-smp/prerm/removing-running-kernel-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/postinst/create-kimage-link-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/postinst/old-initrd-link-2.6.15-1-amd64-k8-smp: true
* linux-image-2.6.15-1-amd64-k8-smp/preinst/already-running-this-2.6.15-1-amd64-k8-smp:
  linux-image-2.6.15-1-amd64-k8-smp/preinst/elilo-initrd-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/prerm/would-invalidate-boot-loader-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/postinst/old-system-map-link-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/preinst/lilo-initrd-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/preinst/bootloader-initrd-2.6.15-1-amd64-k8-smp: true
  linux-image-2.6.15-1-amd64-k8-smp/postinst/depmod-error-2.6.15-1-amd64-k8-smp: false
  linux-image-2.6.15-1-amd64-k8-smp/preinst/overwriting-modules-2.6.15-1-amd64-k8-smp: true
