
Re: Kernel problems



> If he hasn't rebooted his Win2k box in three years, he has some serious Security
> Issues!!  There have been a huge number of security patches requiring reboot in
> the last three years....

Well, yeah. I thought of that after a few months of our little
rivalry...and he replied that as it wasn't on the internet...it wasn't
an issue. It's only running a small community radio station network with
established staff anyway, so I think he'll be fine.

> I presume that you've tried using a different kernel to solve your problems?

I started out on the 2.4 i386 kernels, then moved on to the 2.4 i586s,
and am now on the 2.6 i586s. It panics with all of them, which suggested a
hardware problem to me, but the roughly two-week interval between
panics, coupled with the previous 9 months of uptime and the IRQ handler
showing up in the dumps, makes me think an obscure kernel issue is more
likely.

The most significant difference between the old setup and the current
one is how much more I use the box for now that my experience has grown.
Among other things, I now run OpenLDAP with Samba on it, and on a
number of occasions (but far from the majority), performing a huge file
operation (say, unzipping a 500 MB zip over the network) causes the
panic, but only when I'm due one (> 2 weeks of uptime). It works just
fine the rest of the time. Maybe it's just coincidence, but that's pretty
much the most I/O strain the box is ever put under. Stuff that just
pushes the CPU to ~100% doesn't seem to be a problem.
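
If anyone wants to try provoking this on purpose rather than waiting for
the next big unzip, here is a rough sketch of an I/O load generator in
Python. It only mimics the "huge file operation" part: it writes a large
file of random data, reads it back, and compares checksums. The function
names, chunk size and file names are all made up for illustration, it
does not touch the network or Samba, and there is no guarantee it hits
whatever code path is actually panicking. A checksum mismatch would at
least hint at data being corrupted somewhere (RAM, page cache or disk).

```python
import hashlib
import os


def write_and_verify(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of random data to path, read it back, and
    return True if the SHA-256 digests of both passes match."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            written.update(chunk)
            f.write(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to disk

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        # Much of this may be served from the page cache, so the
        # read pass exercises RAM as well as the disk.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()


def stress(directory, total_bytes, rounds):
    """Run the write/verify cycle `rounds` times in `directory` and
    return how many rounds produced mismatched digests."""
    mismatches = 0
    for i in range(rounds):
        path = os.path.join(directory, "stress_%d.bin" % i)
        try:
            if not write_and_verify(path, total_bytes):
                mismatches += 1
        finally:
            if os.path.exists(path):
                os.remove(path)  # clean up even if a round fails
    return mismatches
```

Something like stress("/tmp", 500 * 1024 * 1024, rounds=10) would roughly
match the 500 MB case; the directory and round count are arbitrary
choices, not recommendations.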

Also, seems my dates were off...our little compo started only 2 years
ago, not 3 ;) One of the previous attempts to deal with this was here:
http://forums.debian.net/viewtopic.php?t=614

I've also taken the liberty of attaching the last dump I bothered
transcribing. Given the prompt for me posting these messages was my most
recent panic, it'll be ~2 weeks before I manage to get a more recent
one. That said, the magic letters 'IRQ' are always present (IIRC)
somewhere in the dump (or at least in the bit of it visible on the
terminal when I go to see why things have stopped working...why doesn't
it let you scroll backwards to get the lot :( ).
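
To check the "IRQ is always there" impression properly instead of
relying on memory, the transcribed dumps could be compared mechanically.
Below is a small Python sketch that pulls the symbol names out of
call-trace lines of the form "[<address>] symbol+0xoff/0xsize" and lists
the symbols shared by every dump. The function names are hypothetical,
and the regular expression only assumes the line format seen in the dump
attached to this message.

```python
import re

# Matches lines such as: [<c0107ee5>] do_IRQ+0xe5/0xf9
TRACE_RE = re.compile(
    r"\[<([0-9a-fA-F]{8})>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_trace(text):
    """Return a list of (address, symbol, offset, size) tuples, one per
    call-trace entry found in text, in order of appearance."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in TRACE_RE.findall(text)
    ]


def common_symbols(dumps):
    """Return the symbols that appear in every dump, keeping the order
    in which they occur in the first dump."""
    if not dumps:
        return []
    symbol_sets = [{sym for _, sym, _, _ in parse_trace(d)} for d in dumps]
    shared = set.intersection(*symbol_sets)
    return [sym for _, sym, _, _ in parse_trace(dumps[0]) if sym in shared]
```

Feeding it the text of two or more transcribed panics would show whether
do_IRQ (or anything else) really turns up every time. It says nothing
about why, of course.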

...though it's just occurred to me: is it possible for the RAM to work
fine for ~2 weeks but then develop bit errors, thus passing memtest but
failing in extended normal use because of its age? Running memtest for
2 weeks to test that hypothesis seems a bit...excessive, as I really
can't imagine that being the situation.

ksymoops 2.4.9 on i486 2.6.8-2-386.  Options used
     -V (default)
     -k /proc/kallsyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.8-2-386/ (default)
     -m /boot/System.map-2.6.8-2-386 (default)

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU 0
EIP: 0060:[<C01340F9>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 000010002 (2.6.8-2-386)
eax: c18cf000 ebx: c3ffa000 ecx: c3ffaae0 edx: ccee5000
esi: c3fff160 edi: 00000028 ebp: 0000003c esp: c0d29f20
ds: 007b es: 007b ss: 0068
Stack:  
        0000003c c3fff160 c2914e20 c10a4000 c01341d4 c3fff160 c10a4010 0000003c
        c10a4000 c10a4010 c2914e20 00000282 c013444c c3fff160 c10a4000 c2c1b608
        c0347a48 0000000a c028b2f8 c0157788 c2914e20 c3700e70 c0125583 c2c1b65c
Call trace:
        [<c01341d4>] cache_flusharray+0x5e/0x98
        [<c013444c>] kfree+0x38/0x48
        [<c0157788>] d_callback+0x18/0x29
        [<c0125583>] rcu_do_batch+0xf/0x18
        [<c011c047>] tasklet_action+0x3a/0x59
        [<c011be60>] __do_softirq+0x34/0x73
        [<c011bec1>] do_softirq+0x22/0x26
        [<c0107ee5>] do_IRQ+0xe5/0xf9
        [<c010697c>] common_interrupt+0x18/0x20
Code:   89 02 2b 4b 0c c7 03 00 01 10 00 c7 43 04 00 02 20 00 89 c8


>>EIP; c01340f9 <free_block+3e/bb>   <=====

>>eax; c18cf000 <__crc_sysfs_create_file+2a20d2/300563>
>>ebx; c3ffa000 <__crc_elevator_init+200585/5cd96a>
>>ecx; c3ffaae0 <__crc_elevator_init+201065/5cd96a>
>>edx; ccee5000 <__crc_xfrm_ealg_get_byname+b10fa/52e343>
>>esi; c3fff160 <__crc_elevator_init+2056e5/5cd96a>
>>esp; c0d29f20 <__crc_sysfs_remove_file+2ca163/3f4f00>

Trace; c01341d4 <cache_flusharray+5e/98>
Trace; c013444c <kfree+38/48>
Trace; c0157788 <d_callback+18/29>
Trace; c0125583 <rcu_do_batch+f/18>
Trace; c011c047 <tasklet_action+3a/59>
Trace; c011be60 <__do_softirq+34/73>
Trace; c011bec1 <do_softirq+22/26>
Trace; c0107ee5 <do_IRQ+e5/f9>
Trace; c010697c <common_interrupt+18/20>

Code;  c01340f9 <free_block+3e/bb>
00000000 <_EIP>:
Code;  c01340f9 <free_block+3e/bb>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c01340fb <free_block+40/bb>
   2:   2b 4b 0c                  sub    0xc(%ebx),%ecx
Code;  c01340fe <free_block+43/bb>
   5:   c7 03 00 01 10 00         movl   $0x100100,(%ebx)
Code;  c0134104 <free_block+49/bb>
   b:   c7 43 04 00 02 20 00      movl   $0x200200,0x4(%ebx)
Code;  c013410b <free_block+50/bb>
  12:   89 c8                     mov    %ecx,%eax

        Kernel Panic Fatal exception in interrupt 

1 warning issued.  Results may not be reliable.
