
Re: Announcing cdrskin-0.7.2



Hi,

> In burn_rspc_mult:
> Here is a more bullet-proof version of the second test:
> (((int)a - 1) | ((int)b - 1)) < 0

Bullet-proof is more important than speed.
In any case, every variation I tried only
made things slower.
(I measure 10 * 800 MB in about 110 seconds.
The slowdowns are in the range of 10 to 20
seconds.)
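
For reference, here is a small sketch of what the quoted test checks.
The function names are made up for this mail and are not the ones
in the library source. It assumes the operands are unsigned char, as
in the quoted burn_rspc_div_3() signature, and the usual
two's-complement int.

```c
#include <assert.h>

/* Plain form: true if at least one operand is zero. */
static int either_zero_plain(unsigned char a, unsigned char b)
{
    return a == 0 || b == 0;
}

/* Branch-free form from the quoted mail. (int) x - 1 is negative only
   for x == 0, and OR-ing two ints keeps a sign bit that is set in
   either of them. */
static int either_zero_bitwise(unsigned char a, unsigned char b)
{
    return (((int) a - 1) | ((int) b - 1)) < 0;
}

/* Exhaustive comparison over all 256 * 256 operand pairs. */
static void either_zero_self_check(void)
{
    int a, b;

    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            assert(either_zero_plain((unsigned char) a,
                                     (unsigned char) b) ==
                   either_zero_bitwise((unsigned char) a,
                                       (unsigned char) b));
}
```

Calling either_zero_self_check() once from a test program is enough
to see that both forms agree on every input. Which one is faster
has to be measured.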


> > static unsigned char burn_rspc_div_3(unsigned char a)
> Given that gfpow is doubled, this code should be faster and simpler:

Stupid me. The -25 case is unneeded, indeed. :))

Nevertheless, the overall impact of the division
is quite low. For 2352 bytes it happens 69 times.
There are more than 4000 multiplications, each
of which takes about the same time as a division.


My compliments on your knowledge of code
optimization. Your ideas are more elaborate
than mine.
It seems that gcc -O2 works best if one does
not try to squeeze out small details. (I would
assume this is a property of the compiler and
not so much of the processor.)
Significant results came from:
- Unrolling gfpow[] to 509 elements.
- Uniting two pairs of loops which shared
  nearly the same index computation.
Anything else had no effect or even hampered -O2.
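
To illustrate why a gfpow[] of 509 elements helps, here is a minimal
sketch. It is not the library code. Only the name gfpow comes from
this thread. gflog, gf_init and the sketch_* functions are names
invented for this mail. I assume the field polynomial
x^8 + x^4 + x^3 + x^2 + 1 (0x11d) with generator 2, for which 3 is
the 25th power of the generator.

```c
#include <assert.h>

static unsigned char gfpow[509]; /* generator^i, 255-cycle stored twice */
static unsigned char gflog[256]; /* inverse of gfpow; gflog[0] unused */

static void gf_init(void)
{
    int i, x = 1;

    for (i = 0; i < 255; i++) {
        gfpow[i] = (unsigned char) x;
        gflog[x] = (unsigned char) i;
        x <<= 1;
        if (x & 0x100)
            x ^= 0x11d;          /* reduce by the assumed polynomial */
    }
    for (i = 255; i < 509; i++)  /* second copy: no "% 255" needed later */
        gfpow[i] = gfpow[i - 255];
}

/* Multiplication: the sum of two logarithms is at most 254 + 254 = 508. */
static unsigned char sketch_mult(unsigned char a, unsigned char b)
{
    if (a == 0 || b == 0)
        return 0;
    return gfpow[gflog[a] + gflog[b]];
}

/* Division by 3: subtracting 25 equals adding 255 - 25 = 230, so the
   index never gets negative and stays within the doubled table. */
static unsigned char sketch_div_3(unsigned char a)
{
    if (a == 0)
        return 0;
    return gfpow[gflog[a] + 230];
}

/* Table-free shift-and-xor multiplication, used only as a cross-check. */
static unsigned char sketch_mult_slow(unsigned char a, unsigned char b)
{
    unsigned int acc = 0, x = a;

    while (b) {
        if (b & 1)
            acc ^= x;
        x <<= 1;
        if (x & 0x100)
            x ^= 0x11d;
        b >>= 1;
    }
    return (unsigned char) acc;
}

/* Compare both multiplications and check that div_3 undoes "* 3". */
static void gf_self_check(void)
{
    int a, b;

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++)
            assert(sketch_mult((unsigned char) a, (unsigned char) b) ==
                   sketch_mult_slow((unsigned char) a, (unsigned char) b));
        assert(sketch_mult(sketch_div_3((unsigned char) a), 3) == a);
    }
}
```

gf_init() must run once before any of the other functions. The
doubled table costs 254 extra bytes and removes both the modulo in
the multiplication and the special case in the division.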

I made some of the neutral simplifications
nevertheless. It cannot hurt to have fewer
C statements in the code.

Besides the remaining opportunities for
parallelisation, a less highschoolish
method of solving HxV=0 might lead to better
results.


Have a nice day :)

Thomas

