
Re: Reproducible, precompiled .o files: what say policy+gpl?



On Mon, Oct 18, 2004 at 07:45:39PM -0400, Glenn Maynard wrote:
> On Tue, Oct 19, 2004 at 12:59:42AM +0200, Wesley W. Terpstra wrote:
> > To which, I say, wtf!
> You're using it wrong.

Well thank goodness, because otherwise that would be really awful. :)
This gives me a great baseline to compare my algorithm's speed against.
Thank you again for showing it to me, and for straightening out my misuse.

> [instructions which I followed and worked]
> This is exactly what you describe.

Yep! That's exactly what Reed-Solomon codes can do.
I never claimed to have invented them; as I said earlier, my work was on
creating an algorithm which can decode >5GB of data broken into packets
(which for me means small, network-sized packets).
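
To make the idea concrete, here is a toy sketch in Python of Reed-Solomon
used as an erasure code; it is emphatically not parchive's code and not my
algorithm, just the textbook construction: k data packets become n coded
packets, and any k of the n are enough to rebuild the originals. For
readability it works in the prime field GF(257) instead of the GF(2^8) a
real codec would use, and it decodes by naive Lagrange interpolation, so it
does not scale; fine for an illustration, useless at the sizes below.

    # Toy Reed-Solomon erasure code over GF(257) -- an illustration only,
    # not parchive's implementation and not the algorithm described above.
    P = 257  # small prime field for readability; real codecs use GF(2^8)

    def poly_mul(a, b):
        # multiply two polynomials (coefficient lists, constant term first) mod P
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % P
        return out

    def poly_eval(coeffs, x):
        # Horner evaluation of the polynomial at x, mod P
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc

    def encode(data_packets, n):
        # data_packets: k equal-length symbol lists; returns n coded packets.
        # Column j of the data packets is read as the coefficients of one
        # degree-(k-1) polynomial; coded packet x holds its value at x.
        k, length = len(data_packets), len(data_packets[0])
        return [[poly_eval([data_packets[r][j] for r in range(k)], x)
                 for j in range(length)] for x in range(n)]

    def decode(received, k):
        # received: dict {x: packet} holding at least k surviving packets.
        # Lagrange-interpolate each column back to its k coefficients.
        xs = sorted(received)[:k]
        length = len(received[xs[0]])
        data = [[0] * length for _ in range(k)]
        for j in range(length):
            coeffs = [0] * k
            for i, xi in enumerate(xs):
                num, denom = [1], 1          # Lagrange basis polynomial for xi
                for m, xm in enumerate(xs):
                    if m != i:
                        num = poly_mul(num, [(-xm) % P, 1])
                        denom = (denom * (xi - xm)) % P
                scale = (received[xi][j] * pow(denom, P - 2, P)) % P
                for d, c in enumerate(num):
                    coeffs[d] = (coeffs[d] + c * scale) % P
            for r in range(k):
                data[r][j] = coeffs[r]
        return data

    # 4 data packets expanded to 6; lose any 2 and the data still comes back.
    data = [[10, 20, 30], [1, 2, 3], [7, 7, 7], [100, 200, 255]]
    coded = encode(data, 6)
    survivors = {0: coded[0], 2: coded[2], 3: coded[3], 5: coded[5]}
    assert decode(survivors, 4) == data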

I suggest you try:
dd if=/dev/urandom of=testing bs=16 count=1048576
split -a 3 -b 1024 testing testing.part.
find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par 

... now, twiddle your thumbs.
After about a minute, you will see it process the first file.
Another 30 seconds will get you the second.
You have 16382 to go.
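
Extrapolating naively from those two data points (I have not timed the whole
run, so take the arithmetic with a grain of salt):

    # Very rough extrapolation from the two timings above; assumes the
    # observed ~30 seconds per part holds for all 16384 parts.
    parts, seconds_per_part = 16384, 30
    print(parts * seconds_per_part / 3600.0, "hours")   # ~136 hours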

If you strace it, you will see that no system calls are being made during
this time, so the slowness is not due to the large directory. Also, the
processor is fully loaded.

This is only for _encoding_, which is much easier than decoding.
You are also only doing it over 16MB.
My algorithm can handle three orders of magnitude more data, much faster.

I would wager that par is using the Berlekamp-Massey algorithm for decoding;
it is the most popular algorithm for RS codes at the moment.
That algorithm takes O(n^2) time for n 'parts'.
My algorithm takes O(n log n) time, with a not-too-terrible constant.
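
Ignoring constants (which of course matter), the raw operation counts at the
size of the test above give a feel for the gap:

    # Rough operation-count comparison for n = 16384 parts; my own
    # back-of-the-envelope numbers, not a measurement of either program.
    import math
    n = 16384
    print(n * n)                                  # n^2:       268,435,456
    print(int(n * math.log2(n)))                  # n log n:       229,376
    print(round((n * n) / (n * math.log2(n))))    # ratio of about 1170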

Perhaps I should make my program command-line compatible with 'par'!
OTOH, it is not convenient when you have so many small files.

Thank you very very much for bringing this implementation to my attention.

-- 
Wesley W. Terpstra


