
Re: Reproducible, precompiled .o files: what say policy+gpl?



On Mon, Oct 18, 2004 at 11:05:16PM -0400, Glenn Maynard wrote:
> > find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par 
> 
> You're splitting into parts which are far too small.

Yes, it's too small for PAR.
It's also clearly too small for Usenet use.

However, my program is not intended to solve Usenet problems.
My applications all operate at the level of packet-switched networks.

Besides, who ever complains when something is faster than they need?

Or as my first computer science prof once said:
  Insertion sort is fine for most tasks.
  Sometimes you need quicksort.

> It's not designed for thousands of tiny parts

No, but mine is.

> Most PAR operations are IO-bound (judging by drive thrashing, not from
> actual benchmarks).

Not to be rude, but you're mistaken here.
strace it when there are many small files; it spends its time
computing, not in syscalls. Disk I/O and/or thrashing is not the
issue for small files.

Maybe disk thrashing is a problem during normal PAR operation, but it is a
minor problem compared to the computation (for my goals).
[ As an aside, my algorithm is also streaming; it reads the 'file' in
sequence three times, so disk thrashing should not be a problem. ]

> I don't really understand the use of allowing thousands of tiny parts.
> What's the intended end use?

Note that PAR cannot help you if the unit of failure is very small:
even one missing piece of a 'part' makes that whole 'part' useless.
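
To make that concrete, here is a toy Python calculation (the part
sizes and loss rates below are made up purely for illustration): if a
'part' spans k packets, each lost independently with probability p,
the whole part survives only with probability (1 - p)^k.

  # Toy numbers: chance a PAR 'part' spanning k packets arrives intact
  # when each packet is lost independently with probability p.
  for k in (1, 64, 1024):        # part size, in packets
      for p in (0.01, 0.20):     # packet loss rate
          print(f"k={k:5d} p={p:.2f} survives: {(1 - p) ** k:.6f}")

At 1% loss, a part spanning 1024 packets almost never arrives intact.
That is why I want parts down near the size of the failure unit.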

Florian already mentioned multicast, and that is my first application. 

Another situation is any one-way network link
(some crazy firewalls [my work; arg!!]).

Future (?) wireless networks might have base stations with a larger range
than the clients. Clients could still download (without ACKs) in this case.

Perhaps your ISP has packet loss that sometimes sits at 20% (my home; arg!).

If you know how TCP works, you also know that it will nearly stop
sending, because it thinks the network is congested, even though the
real problem is a faulty switch that drops 20% of packets seemingly
at random. Using my code over UDP removes this problem completely.
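
For a rough sense of how badly TCP degrades, the well-known
approximation from Mathis et al. puts TCP's steady-state throughput
ceiling at about MSS/RTT * 1.22/sqrt(p). A quick Python illustration
(the MSS and RTT values are assumptions; the formula also presumes
low loss, so at 20% real TCP does even worse than this suggests):

  # Mathis et al. approximation of TCP Reno's throughput ceiling.
  from math import sqrt
  MSS, RTT = 1460, 0.05                  # bytes, seconds (assumed)
  for p in (0.0001, 0.01, 0.20):         # packet loss rates
      rate = MSS / RTT * 1.22 / sqrt(p)  # bytes per second
      print(f"loss {p:7.2%}: ceiling ~{rate / 1e3:9.1f} kB/s")

Going from 0.01% loss to 20% loss costs a factor of about 45 in the
ceiling, before timeouts make it worse still.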

(However, this is dangerous because my code is also 'unfair' in the sense
that it will starve all TCP traffic: it does not care about packet loss
due to congestion, while the TCP traffic will back off.)
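
To show concretely what 'not caring about packet loss' looks like,
here is a minimal Python sketch (encode() is only a stand-in that
chunks the data; my real code would put an erasure code there, and
the host, port, and rate are made-up illustration values):

  # Toy sender: pushes blocks over UDP at a fixed pace and never waits
  # for ACKs, so random loss cannot throttle it as it throttles TCP.
  import socket
  import struct
  import time

  def encode(data, block=1024):
      # Stand-in for a real erasure code: yields (index, payload).
      for i in range(0, len(data), block):
          yield i // block, data[i:i + block]

  def send(data, host="127.0.0.1", port=9999, rate_pps=1000):
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      for idx, payload in encode(data):
          # A 4-byte index lets the receiver place blocks in any order.
          sock.sendto(struct.pack("!I", idx) + payload, (host, port))
          time.sleep(1.0 / rate_pps)  # fixed pace: no congestion backoff

  send(b"x" * 65536)

Nothing in that loop reacts to loss, which is exactly why it never
yields bandwidth to TCP's congestion control.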

You might also use it to make a version of BitTorrent where each packet is
independent of the others. This would help prevent 'dead' torrents, where
there is no seed and all downloads stall because the known information
overlaps.

Another case might be mobile agents, where PDAs exchange parts of files
they are looking for whenever they run into other PDAs they can bargain
with (like BitTorrent).

However, PDAs move when their owners move, so network sessions are
interrupted at random times, and one PDA may never see the other again.

This scheme would let a PDA broadcast a file to all nearby PDAs, which
could make use of the information regardless of when they leave (mid
'part'?) or whether they already have pieces of the file.

Another situation I would like to apply my code to is sensor networks,
where there is a stream of measurements of some variable. My code cannot
presently handle this correctly, but that is future work for me.

I am not a very imaginative person; I am sure there are many other
situations where this could be applied.

From another point of view, research doesn't *need* to be practical. ;)

If other people have ideas, I'd like to hear them.

-- 
Wesley W. Terpstra


