
Re: compare two directory trees



On Tue, 14 Dec 2010 03:02:11 -0600, Boyd Stephen Smith Jr. wrote:

>>I burn DVD/CDs from ISO files.  In order to verify the burning is
>>correct, 

There are two ways to verify that a CD/DVD was burned correctly:

- verify it as a whole
- verify each individual file

If the first level is good enough for you, check out isomd5sum, which is
what Red Hat used to verify its released CDs (implantiso).

 isomd5sum is a set of utilities for implanting an MD5 checksum in an
 ISO (or any block device), then verifying the checksum later.  isomd5sum
 is not simply an MD5 of the entire ISO; it checksums the data inside a
 standard ISO9660 image and writes block checksum information to an ISO9660
 header, which carries over when the CD is burned.
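If you want a whole-image check without isomd5sum, a minimal sketch in
Python (an illustration only, not how isomd5sum works internally) is to
hash the ISO and exactly the same number of bytes read back from the
disc. Reading only the ISO's length matters because a burned disc can
return extra padding after the image data. The function names are my own,
and the device path in the usage line is an assumption about a typical
Linux setup.

```python
import hashlib
import os

CHUNK = 1 << 20  # read 1 MiB at a time


def md5_of_prefix(path: str, length: int) -> str:
    """Return the MD5 hex digest of the first `length` bytes of `path`."""
    h = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(CHUNK, remaining))
            if not block:
                raise ValueError(f"{path} is shorter than {length} bytes")
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def disc_matches_iso(iso_path: str, device_path: str) -> bool:
    """Compare the ISO with the same number of bytes read back from the disc.

    Only len(ISO) bytes are read from the device, because a burned disc
    may return padding after the image data.
    """
    size = os.path.getsize(iso_path)
    return md5_of_prefix(iso_path, size) == md5_of_prefix(device_path, size)
```

Usage (hypothetical paths): `disc_matches_iso("image.iso", "/dev/sr0")`.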

If you need the second level instead:

>> I wrote a script working like this:
>>1. mount the DVD and ISO files onto two mount points
>>2. calculate every file's md5sum in each directory, and save and sort
>>   them in two separate files
>>3. compare the above two files.
>>
>>The above method does work, but too time consuming because of the md5sum
>>calculating.  Do you have any suggestion to improve the efficiency?

This is the only option you have if you want to verify each individual
file, i.e., check file by file.
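A minimal sketch of such a file-by-file check, assuming both trees are
already mounted (which mount points you pass in is up to you). It keeps
the file-by-file idea but compares contents directly in chunks instead of
computing, saving and sorting MD5 sums, so each file is read once and no
hash is computed. It also reports files missing from, or extra on, the
disc. The function names are illustrative only.

```python
import os

CHUNK = 1 << 20  # compare 1 MiB at a time


def list_files(root: str) -> set[str]:
    """Relative paths of all regular files under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                found.add(os.path.relpath(full, root))
    return found


def same_contents(a: str, b: str) -> bool:
    """Cheap size check first, then a chunked byte comparison."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca = fa.read(CHUNK)
            cb = fb.read(CHUNK)
            if ca != cb:
                return False
            if not ca:  # both files exhausted with no difference
                return True


def compare_trees(src: str, dst: str) -> list[str]:
    """Return human-readable problems; an empty list means the trees match."""
    src_files = list_files(src)
    dst_files = list_files(dst)
    problems = [f"missing: {p}" for p in sorted(src_files - dst_files)]
    problems += [f"extra: {p}" for p in sorted(dst_files - src_files)]
    for rel in sorted(src_files & dst_files):
        if not same_contents(os.path.join(src, rel), os.path.join(dst, rel)):
            problems.append(f"differs: {rel}")
    return problems
```

For example, `compare_trees("/mnt/iso", "/mnt/dvd")` (hypothetical mount
points) returns an empty list when every file matches.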

> Don't waste CPU time on MD5.  Don't waste CPU time performing filesystem
> operations.  Compare the two images byte-by-byte using something like
> diff.
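Boyd's suggestion can be sketched like this (roughly what GNU `cmp` does
when limited to the ISO's size with its `-n` option): compare the image
with the disc in chunks, only up to the ISO's length, and report the
offset of the first differing byte. The function name and chunk size are
my own illustrative choices.

```python
import os
from typing import Optional

CHUNK = 1 << 20  # compare 1 MiB at a time


def first_difference(iso_path: str, device_path: str) -> Optional[int]:
    """Return the offset of the first mismatching byte, or None if the
    device starts with an exact copy of the ISO. Bytes past the ISO's
    size (e.g. padding on the disc) are ignored."""
    size = os.path.getsize(iso_path)
    offset = 0
    with open(iso_path, "rb") as iso, open(device_path, "rb") as dev:
        while offset < size:
            want = min(CHUNK, size - offset)
            a = iso.read(want)
            b = dev.read(want)
            if a != b:
                # Locate the exact byte within this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: the device ended early.
                return offset + min(len(a), len(b))
            offset += want
    return None
```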

Well, IMHO, you shouldn't waste CPU time on MD5 calculation, but a
byte-by-byte comparison is not a good choice either, because CDs/DVDs tend
to deteriorate over time. Even if a disc is fine when freshly burned, it
will not necessarily stay that way, because of wear and tear. Moreover, by
the time you want to compare again, you may find that your source is gone!

My solution: I use CRC32 and put the checksum file on the disc itself. I
came up with this solution when I was facing another weird situation: a CD
burned on one particular burner could not be reliably read by other burners.
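A minimal sketch of that idea in Python, using the standard `zlib.crc32`
rather than my program: write a manifest of per-file CRC32 values into the
tree before burning it, then later verify the mounted disc against that
manifest alone, with no need for the original source. The manifest name
`CHECKSUMS.crc32` and its line format are made up for this example; they
are not taken from my `checksum` tool.

```python
import os
import zlib

CHUNK = 1 << 20               # read files 1 MiB at a time
MANIFEST = "CHECKSUMS.crc32"  # hypothetical manifest name for this sketch


def crc32_of(path: str) -> int:
    """CRC32 of a file, computed incrementally so it need not fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            crc = zlib.crc32(block, crc)
    return crc


def write_manifest(root: str) -> None:
    """Before burning: record '<crc32 hex>  <relative path>' for every file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel != MANIFEST:  # don't checksum the manifest itself
                entries.append((rel, crc32_of(full)))
    with open(os.path.join(root, MANIFEST), "w", encoding="utf-8") as out:
        for rel, crc in sorted(entries):
            out.write(f"{crc:08x}  {rel}\n")


def verify_manifest(root: str) -> list[str]:
    """Later, on the mounted disc: re-read every listed file, return failures."""
    failures = []
    with open(os.path.join(root, MANIFEST), encoding="utf-8") as f:
        for line in f:
            expected, rel = line.rstrip("\n").split("  ", 1)
            try:
                actual = crc32_of(os.path.join(root, rel))
            except OSError as exc:  # missing file, unreadable sector, ...
                failures.append(f"{rel}: {exc}")
                continue
            if actual != int(expected, 16):
                failures.append(f"{rel}: CRC mismatch")
    return failures
```

Run `write_manifest` on the staging directory before creating the ISO,
and `verify_manifest` on the disc's mount point whenever you want to
re-check it.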

The checksum program that I wrote can do both CRC32 and MD5, but
statistics tell me that CRC32 is more than enough for such a case:

For random corruption, the probability of a corrupted string going
undetected by an n-bit CRC is 1/(2^n). I.e., a 32-bit CRC has a
probability of 1/(2^32), which is about 2.3E-10 (less than one in a
billion).
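Spelled out for n = 32 (the 2^-n figure assumes random corruption; a CRC
of degree n is, in addition, guaranteed to catch every burst error no
longer than n bits):

```latex
P_{\text{undetected}} \approx 2^{-n}
\quad\Longrightarrow\quad
P_{\text{undetected}}(32) = \frac{1}{2^{32}} = \frac{1}{4\,294\,967\,296}
\approx 2.33 \times 10^{-10} < 10^{-9}
```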

The new fast 32-bit CRC algorithm is an order of magnitude faster than
current hard-disk I/O. I.e., with a 32-bit CRC, the only bottleneck you
have is your disk speed.
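To get a feel for this on your own machine, here is a rough sketch: time
Python's `zlib.crc32` (zlib's implementation, not necessarily the fast
algorithm mentioned above) on in-memory data, so no disk I/O is involved,
and compare the result with your drive's read rate. For reference, a 16x
DVD drive tops out at roughly 22 MB/s. The function name and buffer sizes
are arbitrary choices for this example.

```python
import os
import time
import zlib


def crc32_throughput(total_mib: int = 256, chunk_mib: int = 4) -> float:
    """Approximate zlib.crc32 throughput in MiB/s on random in-memory data."""
    block = os.urandom(chunk_mib << 20)       # one buffer, reused every round
    rounds = max(1, total_mib // chunk_mib)
    crc = 0
    start = time.perf_counter()
    for _ in range(rounds):
        crc = zlib.crc32(block, crc)          # chain the running CRC
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return rounds * chunk_mib / elapsed


if __name__ == "__main__":
    print(f"zlib.crc32: about {crc32_throughput():.0f} MiB/s")
```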

FYI, I've been using my program for nearly 10 years, but only recently
found the time to put it up on the Internet (and so far only enough time
to get the project started):

http://savannah.nongnu.org/p/checksum

HTH

-- 
Tong (remove underscore(s) to reply)
  http://xpt.sourceforge.net/techdocs/
  http://xpt.sourceforge.net/tools/

