
Re: Announcing cdrskin-0.7.2



Hi,

> As such, your original gfpow table should have had a size of
> 255 elements, not 256.

Yes. It became obvious when I unrolled it.
I had merely estimated a size large enough
to avoid buffer overflows.
The operation % 255 makes index 255 impossible.


> The smallest log is 0.  The largest is 254 (not 255).  So
> the range of sums of pairs of logs is [0-508].  So you actually need
> only 509 elements in the expanded table.

Right. When I looked up the highest
value of gflog, I strayed into gfpow.


> 453        if (a == 0 || b == 0)

I will test your proposals.

Some of my own experiments yielded surprising
setbacks. E.g. I replaced
   gfpow[44 - i]
by
   h45[i]
with a suitable constant array h45[].
This was 7 percent slower!
(I suspect a less fortunate cache situation.)


> In burn_rspc_div, you return -1 if the division is by 0.

This has been replaced by a specialized
burn_rspc_div_3(), which divides by (x+1), i.e. 0x03.
Fewer ifs, fewer array lookups, but no speed-up:

/* Divides by polynomial 0x03. Derived from burn_rspc_div() */
static unsigned char burn_rspc_div_3(unsigned char a)
{
        if (a == 0)
                return 0;
        if (gflog[a] >= 25)
                return gfpow[gflog[a] - 25];
        else
                return gfpow[230 + gflog[a]];
}  

I trust gcc -O2 to handle the double
lookup of gflog[a] properly.
The code has absorbed far more obvious workload
reductions without any measurable change in speed.


> Looking at next_bit(). 

This only had to work properly once, to produce
the initial content of the array ecma_130_annex_b[].
There is no need to make it fast, or to question
its C language correctness, now that it has
produced the same pseudo-random byte string as
the old code.


> Several functions unconditionally return 1.
> Why not define the function with a return type of "void"?

I can indeed change the type now that it is
clear that there is only one return value.

------------------------------------------------

I see some potential for parallelization.
We have at least 32 bits for XOR operations,
and two neighboring bytes get multiplied by
the same byte simultaneously.

But even now a 1000 MHz CPU can easily keep up
with a 48x CD stream. I am not aware of faster CD
media, and this code is for CD only.
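As a rough check of that claim, assuming the standard CD rates of 75 sectors per second at 1x and 2352 bytes per raw sector, the budget at 48x works out as:

```latex
\begin{aligned}
48 \times 75\ \text{sectors/s} &= 3600\ \text{sectors/s} \\
3600 \times 2352\ \text{bytes} &= 8\,467\,200\ \text{bytes/s} \approx 8.5\ \text{MB/s} \\
\frac{10^{9}\ \text{cycles/s}}{3600\ \text{sectors/s}} &\approx 277\,800\ \text{cycles per sector}
\end{aligned}
```

That is roughly 118 cycles per byte. The estimate ignores memory bandwidth and whatever else the CPU is doing during the burn.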


Have a nice day :)

Thomas

