
Re: [Linux-ia64] gcc won't inline function returning struct?

The IA-64 ABI says that structures of floats are passed/returned decomposed
into floating point registers.  The ABI calls them homogeneous floating-point
aggregates, or HFA for short.  This also applies to complex types.  Thus your
	typedef struct {
		float re, im;
	} complex;
is handled by putting RE in one FP register, and IM in the next.  This is
unusual: the structure is only 8 bytes, but it ends up occupying 16 bytes
worth of registers (ignoring long double to simplify the discussion).
On IA-64, this requires special code to decompose HFA arguments and return
values when loading them into registers, and to compose them again when
storing them back to memory.  IA-32 does not use this convention, and thus
does not need special code for HFAs.
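
To make the layout concrete, here is a minimal sketch of such a structure.
The type is renamed to fcomplex only to avoid clashing with <complex.h>, and
make_fcomplex and check_hfa_layout are hypothetical helpers for illustration,
not anything from your code or from gcc.  The size check is an assumption
that holds on common ABIs (no padding between two floats).

```c
#include <assert.h>

/* Two floats: 8 bytes in memory on common ABIs.  Under the IA-64
   convention described above, each member would travel in its own
   FP register when passed or returned by value. */
typedef struct {
    float re, im;
} fcomplex;

/* Hypothetical helper that returns an HFA by value. */
static fcomplex make_fcomplex(float re, float im)
{
    fcomplex z;
    z.re = re;
    z.im = im;
    return z;
}

/* Assumption: the structure has no padding, so it is exactly as large
   as its two members. */
static void check_hfa_layout(void)
{
    assert(sizeof(fcomplex) == 2 * sizeof(float));
}
```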

Because of the old design of the C front end, this special code is problematic.
The C front end generates low level code first, including code to compose/
decompose HFAs, and then tries to do function inlining.  When we inline a
function, we have to optimize away the code that composes/decomposes HFAs,
and this is so difficult that in practice it isn't worthwhile to try.  Thus
we cannot inline a C function that uses an HFA argument or return value.

The C++ front end uses a more recent design that inlines first, and then generates
low level code including the HFA compose/decompose code.  If you compile your
example as C++ code, it will work.
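
For readers without the original example at hand, the kind of function in
question looks roughly like the sketch below.  This is not your code; cmul
and cmul_example are hypothetical names, and the typedef is renamed to
fcomplex to stay clear of <complex.h>.  Both the arguments and the return
value are HFAs, so this is a case where the old C front end would be
expected to refuse to inline.  The code is also valid C++, so the same file
can be compiled both ways to compare the generated assembly.

```c
#include <assert.h>

typedef struct {
    float re, im;
} fcomplex;

/* Complex multiply: takes two HFAs by value and returns one. */
static inline fcomplex cmul(fcomplex a, fcomplex b)
{
    fcomplex r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* (1 + 2i) * (3 + 4i) = -5 + 10i */
static void cmul_example(void)
{
    fcomplex a = { 1.0f, 2.0f };
    fcomplex b = { 3.0f, 4.0f };
    fcomplex p = cmul(a, b);
    assert(p.re == -5.0f);
    assert(p.im == 10.0f);
}
```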

Work is underway to rewrite the C front end to make it work more like the C++
front end, or perhaps even just use the C++ front end for C.  When this work
gets far enough, inlining of HFA functions will work in C.  I just tried your
example with the current FSF development sources, and it did work, so I think
this is fixed as of Alexandre Oliva's 2001-10-05 gcc changes to the C front
end.  I don't know how well it is working at the moment though.  However,
I would expect it to be working fine by the time gcc 3.1 comes out in spring of

Another consideration here is that the IL (Intermediate Language) used by
gcc has no support for representing decomposed structures.  If it did,
then we could get much better optimization of structures by separately
optimizing every structure field as if it were a scalar.  But it doesn't,
so the only way we can handle decomposed structures as arguments is to
decompose them before the call, and then recompose them in the function
prologue.  This is pretty inefficient, but it does work.  Fixing this will
be a lot of work, and it will likely be a while before anyone tries.
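
As a rough source-level analogy for what a decomposed representation would
buy, compare the two hypothetical functions below.  The second is what you
get by splitting the structure into scalars by hand; a compiler whose IL
could represent decomposed structures could in principle treat the first
like the second.  This is only an illustration of the idea, not a
description of anything gcc does internally.

```c
#include <assert.h>

typedef struct {
    float re, im;
} fcomplex;

/* Structure form: the argument arrives as one aggregate. */
static float norm2_struct(fcomplex z)
{
    return z.re * z.re + z.im * z.im;
}

/* Hand-decomposed form: each field is an independent scalar that the
   optimizer can track and allocate to a register on its own. */
static float norm2_scalars(float re, float im)
{
    return re * re + im * im;
}

/* Both forms compute the same value: 3*3 + 4*4 = 25. */
static void norm2_example(void)
{
    fcomplex z = { 3.0f, 4.0f };
    assert(norm2_struct(z) == 25.0f);
    assert(norm2_scalars(z.re, z.im) == 25.0f);
}
```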

