
Re: Reproducible, precompiled .o files: what say policy+gpl?



On Mon, Oct 18, 2004 at 05:15:20PM -0400, Glenn Maynard wrote:
> Isn't this just what PAR and PAR2 do (in conjunction with a file splitter)?

Thanks for the pointer to this project; I didn't know about it.
However, to answer your question: no.

PAR2 uses Reed-Solomon codes, and my project also uses a Reed-Solomon
variant. However, the results are completely different.

PAR2 aims to correct corruption of data as well as loss.
The biggest difference, though, is that it trades disk overhead for speed.

My program does not attempt to correct corruption (though there are
algorithms which can do so, just not efficiently). If corruption 
resilience is needed, you would have to add a CRC to each packet.

[ For those who know how Reed-Solomon works: I am applying Reed-Solomon 
over the entire file as one block -- using an alphabet of size ~2^62. 
Normally, Reed-Solomon breaks the file into small blocks of a fixed size 
and adds the correction to each of these blocks. ]
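To make that concrete, here is a toy sketch of mine (not the project's
actual code -- the field, the Horner loop, and the name `encode` are all
illustrative) of "the whole file as one block": treat the file's symbols
as the coefficients of a polynomial over a large prime field, and emit
one field element per evaluation point.

```python
# Toy illustration, not the real implementation: the file's symbols are
# the coefficients of a polynomial over a large prime field, and each
# output packet is simply that polynomial evaluated at a fresh point.
# Any k distinct evaluations determine a degree-(k-1) polynomial, which
# is why any k packets suffice to rebuild the k symbols.

P = 2**61 - 1  # a Mersenne prime, standing in for the ~2^62 alphabet

def encode(data, points):
    """Evaluate the data polynomial at each point, one symbol per point."""
    out = []
    for x in points:
        acc = 0
        for c in reversed(data):  # Horner's rule: ((c2*x + c1)*x + c0)
            acc = (acc * x + c) % P
        out.append(acc)
    return out

# data = 3 + 1*x + 4*x^2, so the evaluations at x=1 and x=2 are 8 and 21
print(encode([3, 1, 4], [1, 2]))  # [8, 21]
```

Decoding is interpolation from any k of the evaluations; doing that
interpolation quickly at this field size is the hard part.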

For example, with par2, I have 
-rw-r--r--  1 terpstra terpstra 627103 2004-10-19 00:18 foo.pdf

I ran: par2 create foo.pdf
It generated:
-rw-r--r--  1 terpstra terpstra  40600 2004-10-19 00:19 foo.pdf.par2
-rw-r--r--  1 terpstra terpstra  40980 2004-10-19 00:19 foo.pdf.vol000+01.par2
-rw-r--r--  1 terpstra terpstra  81860 2004-10-19 00:19 foo.pdf.vol001+02.par2
-rw-r--r--  1 terpstra terpstra 123120 2004-10-19 00:19 foo.pdf.vol003+04.par2
-rw-r--r--  1 terpstra terpstra 165140 2004-10-19 00:19 foo.pdf.vol007+08.par2
-rw-r--r--  1 terpstra terpstra 208680 2004-10-19 00:19 foo.pdf.vol015+16.par2
-rw-r--r--  1 terpstra terpstra 255260 2004-10-19 00:19 foo.pdf.vol031+32.par2
-rw-r--r--  1 terpstra terpstra 257540 2004-10-19 00:19 foo.pdf.vol063+38.par2

I then removed foo.pdf and tried: par2 recover foo.pdf.par2
...
You have 0 out of 2010 data blocks available.
You have 101 recovery blocks available.
Repair is not possible.
You need 1909 more recovery blocks to be able to repair.

To which, I say, wtf!
The data I have totals
du -sb .
1173540 .

That's even more than foo.pdf!
terpstra@muffin:~/x/y$ du -bs ../foo.pdf
627103  foo.pdf

... and yet par2 tells me I need 1909 more recovery blocks (I have 101).
That means I would need 18x more information in order to recover foo.pdf!
I have to admit, I am surprised by this; I would have expected a better
ratio, even with very old techniques.
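For what it's worth, par2's arithmetic is consistent with the filenames
alone -- the +NN suffix of each volume is (as I understand the naming
convention) the number of recovery blocks it holds, and repair needs
surviving data blocks plus recovery blocks to reach the original block
count. A quick check:

```python
# Sanity-checking par2's complaint using the numbers shown above.
# The +NN part of each .volSSS+NN.par2 name is (as I understand the
# naming convention) the number of recovery blocks in that volume.
data_blocks = 2010                       # blocks foo.pdf was split into
recovery = 1 + 2 + 4 + 8 + 16 + 32 + 38  # from the seven volume files
missing = data_blocks - 0 - recovery     # zero data blocks survived
print(recovery, missing)                 # 101 1909, matching par2's output
```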

My program is entirely different.
Let's say it's called rsgt (I haven't decided on a name yet).

You run: rsgt 1024 <x> foo.pdf
You then get <x> files all of size 1024 (regardless of the size of foo.pdf).

As long as the files you have total at least the size of foo.pdf, you
can recover foo.pdf.

You can create nearly 2^62 different such files, and start generating
them from any point. This means you could send an essentially
never-ending stream of (network-sized) packets which people could 'tap
into'. Once you have received any subsequence of packets whose total
size reaches the file size, you are done.
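As a toy model of that tap-in property (hypothetical code of mine: the
field size, the packet format, and the function names are illustrative,
and rsgt's fast decoder is certainly not this naive textbook
interpolation), a receiver that joins the stream late still recovers the
data from any k packets:

```python
# Toy model, not rsgt's actual format: the sender evaluates the data
# polynomial at consecutive points starting anywhere; a receiver that
# joins late recovers the file from ANY k distinct packets.

P = 2**61 - 1  # prime field standing in for the ~2^62 alphabet

def eval_at(data, x):
    acc = 0
    for c in reversed(data):          # Horner's rule
        acc = (acc * x + c) % P
    return acc

def packet_stream(data, start):
    """Endless stream of (point, value) packets from any starting point."""
    x = start
    while True:
        yield (x, eval_at(data, x))
        x += 1

def recover(packets):
    """Lagrange-interpolate the coefficients from k (point, value) pairs."""
    xs = [x for x, _ in packets]
    k = len(xs)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(packets):
        basis = [1]                   # polynomial prod_{j != i} (x - xj)
        denom = 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            denom = denom * (xi - xj) % P
            new = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):
                new[d] = (new[d] - xj * b) % P      # multiply by -xj
                new[d + 1] = (new[d + 1] + b) % P   # multiply by x
            basis = new
        scale = yi * pow(denom, -1, P) % P
        for d in range(k):
            coeffs[d] = (coeffs[d] + basis[d] * scale) % P
    return coeffs

data = [72, 101, 108, 108, 111]         # five one-byte 'file' symbols
late = packet_stream(data, start=1000)  # receiver taps in mid-stream
got = [next(late) for _ in range(len(data))]
assert recover(got) == data             # five packets, file recovered
```

Textbook interpolation like this is quadratic in the number of symbols,
which is exactly why a fast decoding algorithm matters.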

This is what is different about my program: it has zero space overhead
(well, an 8-byte header per packet, but that does not depend on the
packet length).
What is new in terms of research is that I have an algorithm which can do
the decoding quickly.

rsgt does not aim to add 'parity', as I think the name 'par'2 is meant
to suggest. Rather, it transforms a file into a stream of
user-defined-size packets which goes on practically forever. ANY of the
packets will do to get back the original file, as long as their total
size matches the file's.

-- 
Wesley W. Terpstra


