[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: mplayer

On Tue, Jul 15, 2003 at 06:33:48PM -0500, Billy Biggs wrote:
>   In my application, I have function pointers like this:
>   extern void (*interpolate_packed422_scanline)( uint8_t *output,
>   I then have multiple implementations (interpolate_blah_[c,mmx,
> mmxext,sse2]) and I set the function pointer at program startup to point
> at the best available one.  When I'm doing a complicated filter or
> something, I can call through the function pointers to the appropriate
> implementation:
>     /* Interpolate scanline. */
>     if( bottom_field ) {
>         quarter_blit_vertical_packed422_scanline( output, curframe - (instride*2), curframe, width );
>     } else {
>         if( i > 1 ) {
>             quarter_blit_vertical_packed422_scanline( output, curframe + (instride*2), curframe, width );
>         } else {
>             blit_packed422_scanline( output, curframe, width );
>         }
>     }

I'd write the examples in terms of this, but the function pointer
isn't used here, so it wouldn't make much sense :P

Anyway, I have a fair idea what you want based on this. This might
have some difficulties integrating into your specific case as written,
but they should be solvable.

>   I don't want to have multiple implementations of this high level code,

Here's the problem. You don't want to have multiple implementations at
this level - but you _do_ want the generated code to be different at
this level. And you need to bridge this gap.

Like most problems with programming, it can be solved by introducing
another layer of abstraction. The preprocessor is a good one.

> but I also don't like having to go through a function pointer for every
> scanline function.

OK, here's one way to get what you want without making the code too
messy. It's not always the best, but it should work out fairly well

Take out the function pointer, and turn it into an inline
function. Give each implementation the same name, and bracket it in a
#if subarch = mmx or whatever (this will need a little fiddling to
make it neat, since #if has to be a simple integer expression) - so
precisely one implementation is included for each platform. This
should be exactly the same as it would be with compile-time cpu

Next comes the important bit.

Construct your higher level function like this:

#define higher_function(A) higher_function_ ## A

higher_function(subarch) (...)

Then compile the file once for each subarch, with a different
-Dsubarch=foo argument each time, and link the whole lot into your
resulting binary. Do this at whatever level you need to make the
performance issues go away.

In an external file, do this:

  if (subarch == mmx)
  else if

And you're done. Run-time CPU detection with no significant cost.

If you have to push things out a long way, there are better approaches
than the preprocessor (which would put macros everywhere), but they
would be overkill here. In the ultimate case, you rebuild the whole
binary once for each subarch, and select between them in a shell
script. The trick is to find a suitable balancing point.

> I don't know how significant the penalty is, so
> maybe it isn't worth bothering,

Find out before touching the code. Always, always, measure before
attempting to optimise. Then measure it again. Then figure out _why_
the performance problem is where it is; by this time you probably know
how to fix it, and the rest is easy. gcc provides plenty of tools to
help with this.

Attempting to do anything else just introduces bugs, and will probably
make the code slower in some cases. I can't remember the last time I
found a performance issue which was where anybody expected it to be.

  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-             -><-          | London, UK

Attachment: pgpgmr6zSs9ct.pgp
Description: PGP signature

Reply to: