Hi
I seem to have a problem with one of the CPUs on an ES45. CPU 2 seems to be dying: I have had 3 lockups in 2 days.
Jul 26 12:26:23 keyzervega kernel: smp_call_function_on_cpu: initial timeout -- trying long wait
Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in nifd at fffffc00012c65f0(3) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in automount at fffffc00012c65f0(1) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
Jul 26 12:26:53 keyzervega kernel: Kernel bug at arch/alpha/kernel/smp.c:858
Jul 26 12:26:53 keyzervega kernel: CPU 0 hald-addon-stor(1801): Kernel Bug 1
Jul 26 12:26:53 keyzervega kernel: pc = [<fffffc000101c4ac>] ra = [<fffffc000101c404>] ps = 0000 Not tainted
Jul 26 12:26:53 keyzervega kernel: pc is at smp_call_function_on_cpu+0x220/0x264, ra is at smp_call_function_on_cpu+0x178/0x264
Jul 26 12:26:53 keyzervega kernel: v0 = 0000000000000041 t0 = 0000000000000001 t1 = 0000000000000001
Jul 26 12:26:53 keyzervega kernel: t2 = 0000000100728747 t3 = fffffc0008bbd108 t4 = 000000003b5f2d38
Jul 26 12:26:53 keyzervega kernel: t5 = 0000000000000089 t6 = fffffc03fe78d640 t7 = fffffc03f4118000
Jul 26 12:26:53 keyzervega kernel: a0 = 0000000000000000 a1 = 0000000000000000 a2 = 0000000000000001
Jul 26 12:26:53 keyzervega kernel: a3 = 0000000000000000 a4 = fffffc00012c6038 a5 = 0000000000000000
Jul 26 12:26:53 keyzervega kernel: t8 = 0000000000000200 t9 = 0000000000000020 t10= 0000000000000000
Jul 26 12:26:53 keyzervega kernel: t11= 0000000000000001 pv = fffffc000101ca78 at = 0000000000000000
Jul 26 12:26:53 keyzervega kernel: gp = fffffc00018b2d00 sp = fffffc03f411bde8
Jul 26 12:26:53 keyzervega kernel: Trace:
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ad04>] invalidate_bdev+0x3c/0x84
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ba9c>] invalidate_bh_lru+0x0/0x74
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ba9c>] invalidate_bh_lru+0x0/0x74
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001093098>] kill_bdev+0x24/0x58
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001094020>] blkdev_put+0xa8/0x26c
Jul 26 12:26:53 keyzervega kernel: [<fffffc00010898d8>] __fput+0x80/0x1bc
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001087f64>] filp_close+0xb0/0xd4
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108806c>] sys_close+0xe4/0x114
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001010ff4>] entSys+0xa4/0xc0
I have looked through the logs and haven't seen anything from CPU 2, so I am presuming that is the CPU that is dying.
I thought I would isolate CPU 2 from the scheduler, but when I put isolcpus=2 on the kernel command line it doesn't seem to make any difference to the scheduler: the affinity mask for all the processes is still f, and less /var/log/dmesg still shows that it is using 4 CPUs!
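For anyone wanting to double-check the same thing, here is a minimal sketch of the check described above. It assumes Python 3 is available on the box and that isolcpus= is given as a plain CPU list (anything that is not a number or a range is skipped). The function names are made up for illustration; only os.sched_getaffinity and /proc/cmdline are real interfaces.

```python
import os
import re


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '2' or '1-2,5' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        # Skip anything that is not a bare number or a lo-hi range.
        if not re.fullmatch(r"\d+(-\d+)?", part):
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def isolcpus_from_cmdline(cmdline):
    """Return the CPUs named by isolcpus= on a kernel command line, or [] if absent."""
    match = re.search(r"(?:^|\s)isolcpus=(\S+)", cmdline)
    return parse_cpu_list(match.group(1)) if match else []


def cpus_in_mask(hex_mask):
    """Decode a hex affinity mask (e.g. 'f') into the CPU numbers it allows."""
    value = int(hex_mask, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def report():
    """Print what the running kernel was booted with and where this process may run."""
    with open("/proc/cmdline") as f:
        print("isolcpus on boot line:", isolcpus_from_cmdline(f.read()))
    print("CPUs this process may run on:", sorted(os.sched_getaffinity(0)))
```

A mask of f decodes to CPUs 0 to 3, which matches what I am seeing; if the isolation had taken effect I would expect CPU 2 to be missing from the second line that report() prints. If the first line comes back empty, the parameter never reached the kernel at all.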
I would prefer to do it in Linux, so that I can still test the CPU, rather than mask it out in SRM, which it looks like I am going to have to do.
Is this a known issue? Is there a fix? If not, where can I log a bug, and where is the bug tracker for it?
Alex