
Re: cp output format



Quoting Nicolas George (george@nsup.org):
> On décadi 30 Messidor, year CCXXIII, David Wright wrote:
> > > And of course (unless the files are large (unlikely for .forward) and on the
> > > same mechanical drive), cmp file1 file2 is much simpler.
> > I may've missed something here. I can't think why computing the
> > md5/sha-2 digest would ever be better or simpler than cmp, even
> > if the files are large and/or on the same spindle.
> 
> You missed the end of the parenthesized text. Try this:
> 
> cmp /cdrom/300_megs_file_1 /cdrom/300_megs_file_2
> 
> ... and when you are done buying a replacement for your optical drive, you
> can tell me if cmp was really better than a hash.
> 
> The explanation is: If the files are large, then neither the application nor
> the kernel will read them in one go. Therefore, with cmp, reads will
> alternate between the two files until the end.

I see your point now. Fortunately I always put a .md5 file on CDs
which contains the digests of all the files. So I'll pass on trying it.
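
For anyone wanting to do the same, here is a minimal sketch of such a
digest file, assuming GNU coreutils md5sum. The file name files.md5 and
the two function names are only illustrative, not any standard tool:

```shell
# make_digests and check_digests are illustrative names, not standard tools.
# files.md5 is likewise an arbitrary name chosen for this sketch.

make_digests() {
    # Write one "digest  ./path" line per regular file under directory $1,
    # skipping the digest file itself.
    ( cd "$1" && find . -type f ! -name files.md5 -exec md5sum {} + > files.md5 )
}

check_digests() {
    # Recompute each listed file's digest and compare it with the stored one.
    # --quiet suppresses the "OK" lines; the exit status is non-zero on mismatch.
    ( cd "$1" && md5sum -c --quiet files.md5 )
}
```

Run make_digests on the directory before burning it, and check_digests on
the mounted disc afterwards.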

> If the files are not already present in the cache and are on the same
> mechanical drive, that means moving the read head hundreds of times. Even if
> it does not kill your drive, it will be awfully slow.
> 
> With hashes, unless you make the mistake of running the hashes in parallel
> thinking you will save time, the first file is read in full and then the
> second, and everything goes as fast as sequential reads.

I've used digests for pruning identical files from backups and they're
computed serially, fortunately. So by accident I hadn't run into the
problem you outline. But many thanks for elaborating.
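
A rough sketch of the serial approach, assuming sha256sum from GNU
coreutils is available (md5sum would work the same way). The function
name same_by_hash is made up for this example:

```shell
# same_by_hash is a hypothetical helper name, not part of any standard tool.
# Returns 0 if the two files have the same digest, 1 if they differ,
# 2 if either file could not be read.
same_by_hash() {
    # The first file is read from start to finish before the second is opened,
    # so the two digests are computed one after the other, never in parallel.
    h1=$(sha256sum < "$1") || return 2
    h2=$(sha256sum < "$2") || return 2
    [ "$h1" = "$h2" ]
}
```

Reading from stdin keeps the file names out of the output, so the two
strings can be compared directly, e.g.
same_by_hash file1 file2 && echo identical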

Cheers,
David.

