Re: Problems with cvs kernel on Indigo2

On Sat, May 06, 2000 at 01:06:15PM +0200, Klaus Naumann wrote:

> 1. The network driver is completely broken. The output of ifconfig
> shows the eth0 device about 8 times and network isn't working at all.
> ifconfig up & down didn't change anything.

I am not seeing this on my indigo2.

> 2. There are messages in my kernel log like:
> Bug in get_wchan
> Due to my still very limited knowledge about the kernel I don't know
> what's the problem, I only know that this error is generated in the 
> get_wchan function in arch/mips/kernel/process.c

I have already had a look at the source.

The problem is that some "ps" output modes show the kernel
"function" the process was in when it was scheduled away. There
are a couple of possible functions which might do the scheduling,
namely all the ones in kernel/sched.c between

void scheduling_functions_start_here(void) { }


void scheduling_functions_end_here(void) { }

The calling convention for mips just says that the return address
(which is the interesting thing to return from get_wchan()) is passed
in the "ra" register. Somewhere in the functions between the two
markers above (e.g. asmlinkage void schedule(void)) "ra" gets pushed
on the stack. So get_wchan has to find out WHICH function did the
schedule and, depending on that function, retrieve the return address
for the specific process from the stack. On some architectures this
is simple, or you can even search the stack page, which mips/mipsel
can't do. Where the return address is stored on the stack for the
specific functions can be seen in gdb.

(gdb) disass schedule
Dump of assembler code for function schedule:
0x8802565c <schedule>:  addiu   $sp,$sp,-48
0x88025660 <schedule+4>:        sw      $ra,40($sp)
0x88025664 <schedule+8>:        sw      $s8,36($sp)
0x88025668 <schedule+12>:       sw      $s4,32($sp)
0x8802566c <schedule+16>:       sw      $s3,28($sp)
0x88025670 <schedule+20>:       sw      $s2,24($sp)
0x88025674 <schedule+24>:       sw      $s1,20($sp)
0x88025678 <schedule+28>:       sw      $s0,16($sp)
0x8802567c <schedule+32>:       lw      $s0,48($gp)
0x88025680 <schedule+36>:       bnez    $s0,0x880256a4 <schedule+72>
0x88025684 <schedule+40>:       move    $s8,$sp
0x88025688 <schedule+44>:       lui     $a0,0x8812
0x8802568c <schedule+48>:       addiu   $a0,$a0,18536
0x88025690 <schedule+52>:       lui     $a1,0x8812
0x88025694 <schedule+56>:       addiu   $a1,$a1,18780

As you can see "ra" gets pushed on the stack at sp + 40 (gdb prints
the offset in decimal, so that is 0x28).
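
To relate the byte offsets in the dump to the array indices used in
get_wchan, here is a minimal sketch in C. It assumes 4-byte words, as on
a 32-bit mips kernel. WORD_SIZE and frame_index are names made up for
this illustration, they are not from the kernel source.

```c
#include <assert.h>

/* Assumption: 32-bit mips, so every saved register takes 4 bytes. */
#define WORD_SIZE 4u

/* Hypothetical helper: turn a byte offset from $sp, as printed by gdb
 * in decimal, into an index into the frame viewed as an array of words. */
static unsigned frame_index(unsigned byte_offset)
{
    assert(byte_offset % WORD_SIZE == 0);  /* saved registers are word aligned */
    return byte_offset / WORD_SIZE;
}
```

For the dump above, frame_index(40) gives 10 for "sw $ra,40($sp)" and
frame_index(36) gives 9 for "sw $s8,36($sp)".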

In "get_wchan" you see the following

    pc = thread_saved_pc(&p->thread);

Get the program counter. This tells you which function did the
schedule.

            schedule_frame = ((unsigned long *)p->thread.reg30)[9];

Get the stack pointer: word 9 of the frame that reg30 points to
(compare "sw $s8,36($sp)" in the dump, 36 / 4 = 9).

            return ((unsigned long *)schedule_frame)[16];

And return the long at index 16 of that frame -> 16 * 4 -> byte offset 64 ...

Now we have found the "real" program counter where the thread scheduled.
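
The two lookups can be tried outside the kernel with fake frames in
ordinary memory. This is only a sketch: wchan_from_frames is a
hypothetical stand-in for the two quoted lines, its argument plays the
role of p->thread.reg30, and uintptr_t replaces unsigned long so the
code also works on a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Indices taken from the quoted get_wchan lines. */
#define SAVED_FP_INDEX 9    /* ((unsigned long *)reg30)[9]          */
#define SAVED_RA_INDEX 16   /* ((unsigned long *)schedule_frame)[16] */

/* Hypothetical model of the lookup: follow the pointer stored at word 9
 * of the first frame, then read word 16 of the frame it points to. */
static uintptr_t wchan_from_frames(const uintptr_t *reg30)
{
    const uintptr_t *schedule_frame;

    assert(reg30 != 0);
    schedule_frame = (const uintptr_t *)reg30[SAVED_FP_INDEX];
    return schedule_frame[SAVED_RA_INDEX];
}
```

If the value stored at either index is not what the code expects, the
result is some unrelated word from the stack rather than a return
address.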

Now we check whether this is in between the scheduling functions, which
would mean we haven't found the return address on the stack.

    if (pc >= first_sched && pc < last_sched) {
            printk(KERN_DEBUG "Bug in %s\n", __FUNCTION__);
    }

And there is your "Bug in get_wchan()"
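
The range test itself is easy to model in user space. A minimal sketch,
assuming first_sched and last_sched are simply the addresses of the two
marker functions; pc_in_sched_range is a made-up name, and the bounds in
the usage note are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if pc lies in [first_sched, last_sched), i.e. still inside
 * one of the scheduling functions. That is the condition under which the
 * quoted code prints "Bug in get_wchan". */
static int pc_in_sched_range(uintptr_t pc, uintptr_t first_sched,
                             uintptr_t last_sched)
{
    assert(first_sched <= last_sched);
    return pc >= first_sched && pc < last_sched;
}
```

With invented bounds 0x88025000 and 0x88026000, the address of schedule
from the dump (0x8802565c) is inside the range, so a pc left at that
value would trigger the message.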

> Is there anyone out there who has solutions for the named problems ?

The "Bug in get_wchan()" is quite obvious to fix - the day I
get my Indy I'll be able to do a bit more kernel work.

The "Ethernet" is a bit mysterious to me. 

