
Re: pa-risc/linux abi



Carlos,

>jda knows the cause of this, we discussed this once, that single
>precision was slower on pa than double precision because single still has
>to pass the entire parameter in the two registers, the HP compiler
>doesn't have to do that and it can crank out two per set. The issue
>here is that we would need more patterns in gcc and a change in the
>framework. Maybe jda can make a comment here.

The performance slowdown is unrelated to parameter passing.  It's what I
mailed about a bit ago: single precision code can use both the left and right
halves of the FPU regs (and thus single precision has twice the number of
available regs as double prec), but both halves must be scheduled as if
they were the same register to avoid unnecessary pipeline stalls.
A very easy fix would be to use only the left or right all the time (thereby
halving the number of available sp regs).  The way gcc presently does things,
it has access to double the number of registers, but takes a gigantic
performance hit because the false dependencies between these 'L' and 'R'
portions of the same register cause pipeline stalls at random locations
throughout the code.  In the worst case, gcc's bad scheduling would
theoretically make the single precision performance (1/pipelength) slower
than double, but it only happens when it randomly schedules these halves
too close, so in practice it seems to halve the performance of well-pipelined
single precision fp code.  Code would be much faster with half the registers
without the pipeline stalls.
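
To make the stall pattern concrete, here is a toy model in C.  It is only a
sketch under assumptions of my own choosing, not real PA-RISC timing: one
instruction issues per cycle, every result takes LATENCY cycles, and an
instruction waits until its destination is free.  The names (LATENCY, sp_reg,
cycles, close_pairs, spread_pairs) are made up for the illustration.

```c
#define LATENCY 3   /* assumed result latency in cycles (illustrative only) */
#define NREGS   32  /* number of physical FPU registers in the toy model */

/* A single precision destination: physical register plus which half. */
struct sp_reg { int phys; int half; };  /* half: 0 = left, 1 = right */

/* L and R halves of the same register used back to back. */
static const struct sp_reg close_pairs[6] = {
    {4, 0}, {4, 1}, {5, 0}, {5, 1}, {6, 0}, {6, 1}
};

/* Same six destinations, but the two halves of a register are kept apart. */
static const struct sp_reg spread_pairs[6] = {
    {4, 0}, {5, 0}, {6, 0}, {4, 1}, {5, 1}, {6, 1}
};

/*
 * Cycles needed to issue n otherwise independent writes.
 * If halves_alias is nonzero, the L and R halves of one physical register
 * are treated as a single resource, i.e. the false dependency is modelled.
 */
static int cycles(const struct sp_reg *dst, int n, int halves_alias)
{
    int ready[NREGS][2] = {{0}};  /* cycle at which each half is free again */
    int clock = 0;

    for (int i = 0; i < n; i++) {
        int p = dst[i].phys, h = dst[i].half;
        int busy_until = ready[p][h];

        if (halves_alias && ready[p][!h] > busy_until)
            busy_until = ready[p][!h];
        if (clock < busy_until)
            clock = busy_until;          /* pipeline stall */
        ready[p][h] = clock + LATENCY;
        clock++;                         /* issue slot used */
    }
    return clock;
}
```

In this toy model, cycles(close_pairs, 6, 1) comes out at 12, while
cycles(spread_pairs, 6, 1) and cycles(close_pairs, 6, 0) both come out at 6:
the same registers, issued in a different order, avoid every stall.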

The HP compiler does use both left and right, but it schedules their
use to avoid the stalls.  It doesn't appear to really use all the available
registers, though, probably because their scheduling algorithm is not
perfect . . . .

>> Debian HPPA is 32-bit ELF.
>> HPUX is 32-bit SOM. Different binary format.
>> I'm pretty sure the calling convention is the same for both.
>
>No, it is not the same. We cannot generate "pass fpregs in general
>registers" as HPUX does, this would require tons of code in the linker
>and stubs into/outof to move parameters around to the right places.
>Aside from that major bit, we also probably differ in how we pass small
>structs to functions. SOM was "ported" (and I use the word loosely) to
>become ELF32 for hppa. We had to make things up as it went along.

HP does not pass fpargs in general regs, it passes them in the fpregs.
If the Linux stack frame and register passing protocols are different
than the ones HP uses, is there any document that describes them?  I looked
at the LSB (no mention of parisc), and the only thing at the parisc-linux
site was the HP doc I referenced in my original mail . . .

>I don't clearly understand what an "assembly kernel" does? Can you
>please enlighten me as to its proper functioning within the context of
>a toolchain?

It's not a toolchain.  It's a performance-critical kernel written in
assembly to avoid the gcc register scheduling bug mentioned above.  This
assembly kernel is called from C code, so the question becomes what
parameter passing conventions are used under Linux.  If they are the same
as under HP-UX (i.e., gcc passes args in the same way under both HP-UX and
Linux), my kernel will work under both OSes.  If Linux changed things
(as an example, OS X & Linux have differing ABIs on the same hardware),
you have to adapt the assembly to read in parameters differently depending on
the target OS.
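
If the conventions do turn out to differ, the C side could pick the matching
assembly variant at compile time.  A minimal sketch, assuming the usual
predefined macros (__hpux under HP-UX, __linux__ under Linux); KERNEL_ABI and
kernel_abi are hypothetical names, and a real build would select a different
assembly source rather than a string:

```c
/* Select which parameter-passing variant of the assembly kernel to use.
 * Assumption: __hpux is predefined on HP-UX and __linux__ on Linux. */
#if defined(__hpux)
#  define KERNEL_ABI "hpux"
#elif defined(__linux__)
#  define KERNEL_ABI "linux"
#else
#  define KERNEL_ABI "unknown"
#endif

/* Report the variant chosen at compile time (hypothetical helper). */
const char *kernel_abi(void)
{
    return KERNEL_ABI;
}
```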

I'm sorry for all the confusion.  This is not a critical problem, and that's
why I originally e-mailed it only to Camm.  I will finish the HP-UX support,
and then either get access to a Linux/parisc machine, or have someone test
the install there.  If the kernel works, I'll know the ABIs are the same, and
if it dies, I'll either get access to a linux/parisc ABI doc, or a machine
so I can figure it out empirically using gcc, or I'll only support HP-UX . . .
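
For the empirical route, a probe along these lines is roughly what I have in
mind.  It is a sketch, not a finished test: the file name abi_probe.c and the
function names are made up, and the argument mix is just one plausible choice.
Compiling it with "gcc -O1 -S abi_probe.c" on each OS and comparing the
generated assembly should show which registers or stack slots each argument
arrives in.

```c
/* abi_probe.c (hypothetical name): mixes integer, float, double and pointer
 * arguments so the generated assembly reveals where each one is passed. */
float probe(int n, float a, double b, const float *x, float c)
{
    return (float)n * a + (float)b * x[0] + c;
}

/* A caller, so the assembly also shows how the arguments are set up. */
float call_probe(void)
{
    static const float x[1] = { 4.0f };
    return probe(2, 1.5f, 2.0, x, 0.25f);
}
```

If the argument setup in call_probe looks the same in both assembly listings,
that is good evidence (though not proof) that the assembly kernel can read
its parameters the same way on both systems.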

Thanks,
Clint


