Bug#261893: kernel-image-2.6.6-1-generic: Kernel bug at mm/slab.c:1530
* Jan-Jaap van der Heijden wrote:
With a stock 2.6.6-1-generic kernel it crashes as soon as it
initializes the SCSI controller.
Can you please try the 2.6.7 packages, and report if they fix your
problem (I suppose so)?
http://people.debian.org/~nobse/kernel-image-2.6.7-alpha/
2.6.7 doesn't lock up anymore when it hits the PCI bus. The CIA issue is
solved.
But the oops is still there. Here's the trace:
===========================================================
ksymoops 2.4.9 on alpha 2.6.7-1-generic. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.7-1-generic/ (default)
-m /boot/System.map-2.6.7-1-generic (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
4096K Bcache detected; load hit latency 28 cycles, load miss latency 106 cycles
Kernel bug at mm/slab.c:1552
modprobe(200): Kernel Bug 1
pc = [<fffffc00003568b4>] ra = [<fffffffc0023adf4>] ps = 0000 Not tainted
Using defaults from ksymoops -t elf64-alpha -a alpha
v0 = 0000000000000000 t0 = 0000000000000000 t1 = fffffc002fffc228
t2 = fffffc0000572fb0 t3 = fffffc002fc75330 t4 = fffffc0000600248
t5 = 0000000000000400 t6 = fffffc002f24a000 t7 = fffffc002f2dc000
a0 = 0000000000000000 a1 = fffffffc002458ed a2 = ffffffffbaadf00d
a3 = 0000000000000000 a4 = fffffc002f2dfe28 a5 = fffffc002fc75228
t8 = 0000000000000000 t9 = fffffc000035639c t10= 0000000000000030
t11= 0000000000000040 pv = fffffc0000356874 at = 0000000000000000
gp = fffffc00005f0300 sp = fffffc002f2dfe88
Trace:fffffc000034a0e0 fffffc0000314d14
Code: a0480068 243f1000 2021ff00 44410002 e4400004 00000081 <00000610> 004d0697
PC; fffffc00003568b4 <kmem_cache_destroy+40/1f0> <=====
Trace; fffffc000034a0e0 <sys_init_module+1f0/3a4>
Trace; fffffc0000314d14 <entSys+a4/c0>
Code; fffffc000035689c <kmem_cache_destroy+28/1f0>
0000000000000000 <_PC>:
Code; fffffc000035689c <kmem_cache_destroy+28/1f0>
0: 68 00 48 a0 ldl t1,104(t7)
Code; fffffc00003568a0 <kmem_cache_destroy+2c/1f0>
4: 00 10 3f 24 ldah t0,4096
Code; fffffc00003568a4 <kmem_cache_destroy+30/1f0>
8: 00 ff 21 20 lda t0,-256(t0)
Code; fffffc00003568a8 <kmem_cache_destroy+34/1f0>
c: 02 00 41 44 and t1,t0,t1
Code; fffffc00003568ac <kmem_cache_destroy+38/1f0>
10: 04 00 40 e4 beq t1,24 <_PC+0x24>
Code; fffffc00003568b0 <kmem_cache_destroy+3c/1f0>
14: 81 00 00 00 bugchk
Code; fffffc00003568b4 <kmem_cache_destroy+40/1f0> <=====
18: 10 06 00 00 call_pal 0x610 <=====
Code; fffffc00003568b8 <kmem_cache_destroy+44/1f0>
1c: 97 06 4d 00 call_pal 0x4d0697
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
1 warning and 1 error issued. Results may not be reliable.
===========================================================
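For anyone reading the disassembly above, here is a rough sketch of the arithmetic behind the two constant-loading instructions and the symbol offset. It only models what the Code; lines show. It assumes that "ldah t0,4096" with no base register means the base is the zero register, and the helper names (sign_extend16, lda, ldah, bugchk_reached) are made up for illustration. What the word loaded from 104(t7) holds is not stated in the log, and another path in the function may branch to the same bugchk, so this does not say which condition fired.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lda(disp, base=0):
    """Alpha LDA: base + sign-extended 16-bit displacement."""
    return (base + sign_extend16(disp)) & MASK64


def ldah(disp, base=0):
    """Alpha LDAH: base + (sign-extended displacement * 65536)."""
    return (base + sign_extend16(disp) * 65536) & MASK64


# "ldah t0,4096" followed by "lda t0,-256(t0)", as in the Code; lines.
# Assumption: the omitted base register of ldah is the zero register.
mask = lda(-256, ldah(4096))


def bugchk_reached(word):
    """'beq t1' skips the bugchk only when (word & mask) is zero."""
    return (word & mask) != 0


# ksymoops reports the PC as kmem_cache_destroy+40; offsets are hex
# (compare +28 at ...689c and +2c at ...68a0), so the entry point is:
pc = 0xFFFFFC00003568B4
entry = pc - 0x40
pv = 0xFFFFFC0000356874  # pv register value from the dump

print(hex(mask))
print(hex(entry), entry == pv)
```

So the test in front of the bugchk is "word at 104(t7) AND 0x0fffff00", and the computed entry address of kmem_cache_destroy agrees with the pv register in the dump, which suggests the symbol resolution is sane despite the ksymoops warning.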
The same oops is also in 2.6.8 from
http://people.debian.org/~nobse/kernel-image-2.6.8-alpha/
I wonder whether the "SGI XFS with ..." line means it's XFS that's the problem here.
In that case I might be hitting a variation on the bug that plagued Suse
9.1: http://portal.suse.com/sdb/en/2004/04/91_xfsfix.html
I browsed the code of their "hotfix" and what it does is "Disable BUGs in
the SLAB allocator init. This makes XFS in the 9.1 install kernel work.".
Ugh.
1) I'll have a look at the suse fix that went into the next suse kernel rpm,
to see what their real fix is and whether it works for this case.
2) Somewhere in the early 2.6.x releases, XFS switched from using its own
allocator to using the slab allocator. If one of those early releases works
(with the core_cia patch), it would confirm my XFS/slab allocator suspicion.
It would also explain why 2.4.x doesn't have this problem.
Jan-Jaap