
Re: vlc 0.2.82 debs available



Josh Huber writes:
> Michel Lanners <mlan@cpu.lu> writes:

>> One area is IDCT, where vlc contains some altivec code for MacOS
>> X/Darwin. Unfortunately, it's a C extension, and that is not
>> supported by our tools.

Running MacOS X:  MacOS-gcc -O2 -S foo.c
Running Linux:    cp /mac/src/whatever/foo.s /usr/src/whatever/foo.S

> From the looks of that gprof output, this is the place to improve.
> What's good is that XVideo support got rid of that annoying
> (expensive) YUV transformation, though.

Does it do a good job?

1. apply a power-law (gamma) curve, with linear top and bottom segments
2. do a matrix multiply
3. apply a power-law (gamma) curve, with linear top and bottom segments
4. truncate (clamp) as needed

Ideally one would take advantage of sub-pixel addressing, since
the R, G, and B elements on an LCD screen are physically separate.
For those with a CRT, compensate for beam spreading and
the beam's inability to change intensity quickly.
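Sub-pixel addressing on an LCD can be sketched as driving each color element from its own sample. This minimal sketch assumes a grayscale row rasterized at three times the output width and an R, G, B left-to-right stripe order; the 3-tap box filter that softens color fringes is one simple choice among many, and `subpixel_row` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a grayscale row rasterized at 3x the output width onto LCD
   sub-pixels. Output sub-pixel i (i = 3*x + channel, channels assumed
   R, G, B from left to right) is driven by high-resolution sample i,
   smoothed with a 3-tap box filter over its neighbors to soften color
   fringes. At the row ends the missing neighbor is replaced by the
   sample itself. */
void subpixel_row(const uint8_t *src, size_t out_width, uint8_t *rgb_out)
{
    size_t n = 3 * out_width;            /* sub-pixels in the row */
    for (size_t i = 0; i < n; i++) {
        unsigned left  = src[i > 0 ? i - 1 : i];
        unsigned mid   = src[i];
        unsigned right = src[i + 1 < n ? i + 1 : i];
        rgb_out[i] = (uint8_t)((left + mid + right + 1) / 3);
    }
}
```

A single bright high-resolution sample ends up spread over three adjacent sub-pixels at one third intensity, instead of lighting one whole pixel.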

In case it isn't obvious, you can suck up every CPU cycle
on the fastest CPU ever made. Lots of stuff should be done
in a linear color space or perceptually uniform color space.

Scaling suggestion: only scale by 1:1, 1:2, 2:1, 3:2, 2:3,
3:4, and 4:3. Crop the edges or fill space with a neutral
color (average of a sample of the pixels?) as needed.
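The scaling suggestion can be sketched as a search over the allowed ratios. Assumptions not in the post: each ratio is read as output:input (so 1:2 halves the picture), the largest ratio that fits the window wins, the picture is centered, and the neutral color is the mean of every `step`-th sample of one 8-bit channel. `choose_scale`, `neutral_fill` and `placement` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Allowed output:input ratios, smallest to largest. */
static const int ratios[][2] = {
    {1, 2}, {2, 3}, {3, 4}, {1, 1}, {4, 3}, {3, 2}, {2, 1}
};

typedef struct {
    int out_w, out_h;   /* scaled picture size */
    int off_x, off_y;   /* position in the window; negative means crop */
} placement;

/* Pick the largest allowed ratio whose result fits in the window and
   center it (the border gets a neutral fill). If even the smallest
   ratio is too big, use it anyway and crop the edges. */
placement choose_scale(int src_w, int src_h, int win_w, int win_h)
{
    size_t n = sizeof ratios / sizeof ratios[0];
    size_t pick = 0;
    for (size_t i = 0; i < n; i++) {
        int w = src_w * ratios[i][0] / ratios[i][1];
        int h = src_h * ratios[i][0] / ratios[i][1];
        if (w <= win_w && h <= win_h)
            pick = i;   /* ratios ascend, so the last fit is the largest */
    }
    placement p;
    p.out_w = src_w * ratios[pick][0] / ratios[pick][1];
    p.out_h = src_h * ratios[pick][0] / ratios[pick][1];
    p.off_x = (win_w - p.out_w) / 2;
    p.off_y = (win_h - p.out_h) / 2;
    return p;
}

/* Neutral fill color: mean of every step-th sample of one channel. */
uint8_t neutral_fill(const uint8_t *plane, size_t count, size_t step)
{
    unsigned long sum = 0, samples = 0;
    if (step == 0)
        step = 1;       /* avoid an infinite loop */
    for (size_t i = 0; i < count; i += step) {
        sum += plane[i];
        samples++;
    }
    return samples ? (uint8_t)(sum / samples) : 128;
}
```

For a 720x480 picture in a 1024x768 window this picks 4:3, giving 960x640 with a 32-pixel border left and right and 64 pixels top and bottom. In a 320x240 window it falls back to 1:2 and crops 20 pixels from each side.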

> The C code looks pretty well
> optimized, and my ppc asm skills aren't to the level of fine tuning
> mpeg decompression code...I wonder if we could improve this
> performance?

Sure. PowerPC asm isn't hard. The basics, for any modern chip really,
are that you interleave your operations to avoid placing an instruction
right after one that it depends on or conflicts with. In pseudo-asm:

f1 = f2 + f3
// here, avoid using floating-point and strongly avoid using f1
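The same rule can be followed from C by giving the processor independent work to overlap. In this minimal sketch (a C-level analogue of hand scheduling, not PowerPC asm; the function names are hypothetical), the single-accumulator loop makes every add wait for the previous one, while the two-accumulator version alternates between independent chains so the result of one add is not needed by the very next add.

```c
#include <stddef.h>

/* One dependency chain: every add needs the result of the previous add. */
float sum_one_chain(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two interleaved chains: consecutive adds go to different accumulators,
   so a pipelined FPU can start one before the other has finished. */
float sum_two_chains(const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];       /* chain 0 */
        s1 += x[i + 1];   /* chain 1, independent of the add above */
    }
    if (i < n)            /* odd leftover element */
        s0 += x[i];
    return s0 + s1;
}
```

Floating-point addition is not associative, so for arbitrary data the two versions can differ in the last bits; that is also why compilers typically will not make this change on their own unless reassociation is explicitly allowed (for example with GCC's `-ffast-math`).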

Don't forget to take advantage of the CTR register and the three
complicated rotate-and-mask instructions (rlwinm, rlwnm, rlwimi).
There is a permute instruction for the AltiVec unit (vperm) that seems
pretty useful.
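To see what these instructions compute, here is a C model of them. It is a sketch for understanding, not a replacement for the hardware instructions: it follows the PowerPC convention that bit 0 is the most significant bit of a 32-bit word, assumes `mb` and `me` are in 0..31, and `mask32`, `rlwinm_c`, `rlwimi_c` and `vperm_c` are hypothetical names. rlwnm behaves like rlwinm with the rotate count taken from a register instead of an immediate.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by sh (0..31) bits. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Mask with bits mb..me set, PowerPC numbering (bit 0 = MSB).
   If mb > me the mask wraps around. */
uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);   /* bits 0..me */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm: rotate left, then AND with the mask (extract, shift, clear). */
uint32_t rlwinm_c(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* rlwimi: rotate left, then insert the masked bits into ra while
   keeping ra's other bits (bit-field insert). */
uint32_t rlwimi_c(uint32_t ra, uint32_t rs, unsigned sh,
                  unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* vperm: each result byte is picked from the 32-byte concatenation of
   va and vb by the low 5 bits of the matching control byte in vc. */
void vperm_c(const uint8_t va[16], const uint8_t vb[16],
             const uint8_t vc[16], uint8_t vd[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sel = vc[i] & 0x1F;
        vd[i] = sel < 16 ? va[sel] : vb[sel - 16];
    }
}
```

For example, `rlwinm_c(0x12345678, 8, 24, 31)` pulls out the top byte, 0x12, which takes a shift and an AND on most other architectures.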

>> Also, most recent grafix chipsets provide some form of acceleration
>> for DVD playing, mostly in three areas: YUV to RGB and scaling (used
>> by Xvideo; available with ATI chipsets); IDCT in hardware (ATI Rage
>> 128; not supported); and motion compensation (ATI; not supported).
>
> What exactly is IDCT anyway?  I wonder if there's anything else to use
> on the mach64 for speedups?

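Inverse Discrete Cosine Transform. MPEG stores each 8x8 block as DCT
coefficients, and the decoder runs the IDCT to get the pixels back.
Decoders usually implement it with integer arithmetic.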

>> So I guess it is worthwhile to hack some Altivec acceleration into
>> these tools; that should help a lot.
>
> Yes, this would be good, but it still wouldn't help the poor people
> with mach64. :)

Why not? AltiVec is in the CPU, not the graphics chip, so as long as
you have a G4, you should be all set, even with a mach64.


