[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Reproducible, precompiled .o files: what say policy+gpl?



On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote:
> find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par 

After taking a look in the source code for par, I found this in rs.c:
|*| Calculations over a Galois Field, GF(8)

What does that mean? It means there are only 2^8 possible values to evaluate
the polynomial at. So, in fact, the above command will not even work, should
it terminate. However, this is not a problem with RS codes, just par.

I was also wrong about them using Berlekamp-Massey.
They use Gaussian elimination to compute the inverse.
This is complexity O(R^2N) which is even worse (see rs.c:214).
R=number of input files/packets
N=total number of files that were in the output (so N>=R)[B
ie:N>=R=n -> O(n^3) vs. O(n^2) for Berlekamp-Massey or O(nlogn) for mine.

OTOH, since they have at most 2^8 possible 'packets' (including the source), 
that keeps the complexity from killing them. --> 'only' 2^24 operations =)
Also on the plus side, this is the complexity per number of packets, 
not the complexity of the data to be processed.

Although I haven't checked, I would speculate from the simplicity of the
code that they use the normal matrix product algorithm, which means (at
best) O(LR) where L is the total length of the data, R is the number of
files.

As I mentioned, my code has an alphabet around 2^62.
Actually, it's (2^31-1)^2-1 .. but that's almost the same.
Handling potentially so many more packets means you need a new algorithm.

Still, par is very cool and I will liberally lift usability from it.
... and, of course, use it for unfair comparisons in my paper. =)

-- 
Wesley W. Terpstra



Reply to: