
Re: pa-risc/linux abi



Guys,

>> as if they were the same register to avoid unnecessary pipeline stalls.
>> A very easy fix would be to use only the left or right all the time (thereby
>> halving the number of available sp regs).  The way gcc presently does
>> things,
>
>This could be done.  The next possibility is to disparage using left and
>right in one instruction.  This is somewhat more difficult.  Finally,
>avoiding using the left and right halves of the same register in one
>insn is probably difficult as this is an overall constraint.  I'm not
>sure how the register allocator would handle this.

Not using them in one insn is insufficient anyway.  In theory, as I
understand it, if fr5L is the target of an fp op, you must wait at least as
many clock cycles as the FPU pipe is long before using fr5R or fr5L.  In practice,
even in assembler, I never got close to peak until I used only one of the
pair anytime it was the output of a muladd, so obviously my theoretical
understanding is incomplete.  However, since I don't know anyone who works
on gcc, and have only rarely gotten them to pay attention when there are problems
with the latest and greatest architecture, I didn't figure it was worth the
time to figure out more once I found the tricks to get my own stuff rolling.
I was able to successfully use both items of a pair as long as I loaded
them at the same time, and used them read-only.  This seems harder to do in
the framework of an existing compiler, which is why I suggested the easiest
thing would be to simply use only one side for all regs.  It shouldn't require
anything other than telling register allocation to ignore half the registers.
A flag could do it.  I am willing to bet it would speed up almost all gcc's
single precision computations on pa-risc.  To believe this, you have only
to notice that single precision code runs slower than double, even though
double does indeed have half the registers, and must load twice the data
for the same number of flops.
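
A minimal sketch of the kind of single-versus-double comparison described
above.  The function names sdot and ddot are hypothetical, and the four-way
unrolling is only an assumption about what gives the FPU independent
multiply-adds to overlap.  Nothing here controls which register halves the
compiler picks; it only provides two otherwise identical kernels to time
against each other.

```c
#include <stddef.h>

/* Hypothetical single-precision dot product.  Four independent
   accumulators so that successive multiply-adds do not depend on
   each other's results. */
float sdot(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i;

    for (i = 0; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}

/* The same loop in double precision, for comparison. */
double ddot(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;

    for (i = 0; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```

Timing both over vectors small enough to stay in cache, with the same
element count, is one way to check the single-slower-than-double claim on a
given compiler and machine.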

>The calling conventions are essentially the same.  However, there is
>no argument adjustment of argument/return locations in ELF32.  In some
>cases (varargs), float parameters are passed in both floating-point
>and general registers.  The passing of small structs is the same.

Yeah, I'm not concerned with varargs, but I do have to decipher the caller's
stack frame and work out which registers are used to pass which args.
If Linux's conventions are a little bit different, is this documented somewhere?
If not, I can do the usual trick of compiling a C-source kernel using -S
and reverse-engineering where things are, but it's a little easier if
there are docs, particularly as I don't have direct access to linux/parisc.
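
A minimal sketch of the -S trick mentioned above, assuming a small probe
file is enough to expose the convention.  The names probe, probe_struct and
call_probe are hypothetical; the idea is only to mix integer, float, double,
pointer and small-struct arguments so that the generated assembly shows
where each one is placed.

```c
/* Hypothetical probe file: compile with "gcc -S" and read the output.
   Callee side: shows where each incoming argument is read from. */
double probe(int i, float f, double d, const double *p)
{
    return (double)i + (double)f + d + p[0];
}

/* A small struct passed by value, to see how it is handed over. */
struct small { int a; int b; };

int probe_struct(struct small s)
{
    return s.a + s.b;
}

/* Caller side: shows how the arguments are set up before the call.
   With optimization enabled the compiler may inline these calls, so
   compiling without optimization is the safer assumption here. */
double call_probe(void)
{
    static const double v = 4.0;
    struct small s = { 5, 6 };

    return probe(1, 2.0f, 3.0, &v) + (double)probe_struct(s);
}
```

Using distinct constants for each argument makes it easier to match the
loads in the caller to the registers or stack slots they end up in.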

Thanks,
Clint


