
Re: md5sum lots of files



Grok Mogger wrote:
I have about 36 GB of files on a hard disk that I've transferred to another disk. I'd like to cksum or md5sum the files just to make sure that they were all copied well. I can't seem to find a way to recurse through the directories and do this to a lot of files. I've looked around a lot, and finding nothing I'm about to start writing my own script, but I thought I'd ask here first. It just seems like something that there would be a way to do already, and I'm just missing it.

Thanks,
- GM

My approach may be overkill for your purposes, but it handles
pathological filename characters including single quotes, leading/trailing
spaces, and backslashes, which I occasionally find in system backups.
Since using these scripts, I've gone several years now without any filename
problems in my backups.

In addition, unlike some of the more common approaches mentioned here,
I wanted my scripts to be compatible with the md5sum index files used in
Debian archives, account for all file types instead of just regular files,
and facilitate error logging.
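
For comparison, here is a minimal sketch of one widely used approach that
covers regular files only. Whether it matches what others in this thread
suggested is an assumption. The demo directories and file names are made up,
and the sketch relies on GNU find, xargs, cp and md5sum.

```shell
# Hypothetical demo tree; the names below are made up for illustration.
src=$(mktemp -d)
printf 'a\n' > "$src/ leading space"
printf 'b\n' > "$src/it's quoted"
mkdir "$src/sub"
printf 'c\n' > "$src/sub/plain"

list=$(mktemp)
# NUL-separated names survive spaces and quotes in file names.
( cd "$src" && find . -type f -print0 | xargs -0 md5sum ) > "$list"

# Verify a copy against the list; --quiet prints only failures.
dst=$(mktemp -d)
cp -a "$src/." "$dst/"
( cd "$dst" && md5sum --quiet -c "$list" )
status=$?
echo "verify exit status: $status"
```

Symlinks, directories and other non-regular files never appear in such a
list, which is one of the gaps my scripts try to cover.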

They are crudely written and slow, and not the best example of bash
scripting, particularly the way I use scripts as subroutines.  I appreciate
any suggestions for improvement.

(Somewhere in the debian-user archives you might find early versions of my
entire set of archiving scripts, if you are interested.)

This script creates the gzipped md5sum list:
rm -f "$1/md5sums.gz"
DATE=`date +%j%H%M%S`
pushd "$1" || exit 1; find . -exec /usr/local/bin/make-md5sum {} \; >/tmp/md5sums.$DATE; popd
gzip -c /tmp/md5sums.$DATE >"$1/$2/md5sums.gz"

This script uses the md5sum list to check the backup:
zcat "$1/md5sums.gz" | sed 's/  /&"/' | sed 's/$/"&/' | sed 's/\\/\\\\/g' | /usr/local/bin/md5chk "$1"
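
To illustrate what the sed stages do to a list line, here is a small sketch
using a made-up entry whose file name starts and ends with a space. The first
two stages wrap the name in double quotes so that a plain `read` keeps the
surrounding spaces; the third doubles any backslashes so that `read` leaves
them single. The variable names are only for this demo.

```shell
# Made-up list entry: the file name " name with spaces " starts and ends
# with a space (so three spaces follow the hash).
line='d41d8cd98f00b204e9800998ecf8427e   name with spaces '

# The same three sed stages as in the check script.
quoted=$(printf '%s\n' "$line" | sed 's/  /&"/' | sed 's/$/"&/' | sed 's/\\/\\\\/g')
printf '%s\n' "$quoted"

# What md5chk then does with the line: plain read, then strip the quotes.
result=$(printf '%s\n' "$quoted" | {
    read md5 filep
    filepath=$(echo "$filep" | sed 's/"//g')
    printf '[%s] [%s]' "$md5" "$filepath"
})
printf '%s\n' "$result"
```

Note that stripping every double quote this way also removes double quotes
that are part of a file name.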

Here are the "subroutine" scripts, make-md5sum and md5chk.

/usr/local/bin/make-md5sum:
if [ -h "$1" ]
then
	# symlink: placeholder sum of all zeros
	echo "00000000000000000000000000000000  $1"
elif [ -f "$1" ]
then
	# regular file: real md5sum
	md5sum "$1"
elif [ -e "$1" ]
then
	# anything else that exists (directory, device, ...): placeholder ending in 1
	echo "00000000000000000000000000000001  $1"
else
	echo ----------  path not found ----------- >&2
	echo "---->>>>>  $1  <<<<<-----" >&2
	echo --------- possible name problem ------------ >&2
fi
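
Because the two placeholder sums mark symlinks and other non-regular entries,
a list can be summarized by file kind. The following is a rough sketch on a
made-up sample list; the entries and variable names are hypothetical, and
`gzip -dc` is used in place of zcat.

```shell
# Made-up sample list in the format make-md5sum emits.
list=$(mktemp)
printf '%s\n' \
  '00000000000000000000000000000001  .' \
  'd41d8cd98f00b204e9800998ecf8427e  ./empty.txt' \
  '00000000000000000000000000000000  ./link' \
  '00000000000000000000000000000001  ./subdir' | gzip -c > "$list.gz"

# Count entries by kind using the placeholder sums.
symlinks=$(gzip -dc "$list.gz" | grep -c '^00000000000000000000000000000000  ')
others=$(gzip -dc "$list.gz" | grep -c '^00000000000000000000000000000001  ')
total=$(gzip -dc "$list.gz" | wc -l)
regular=$((total - symlinks - others))
echo "symlinks=$symlinks other=$others regular=$regular"
```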

/usr/local/bin/md5chk:
cd "$1" || exit 1
while read md5 filep
do
	filepath=`echo "$filep" | sed 's/"//g'`

	if [ -h "$filepath" ]
	then
		testmd5="00000000000000000000000000000000"
	else
	if [ -f "$filepath" ]
	then
		testmd5=`md5sum "$filepath" | (read md5 filepath; echo $md5;)`
	else
	if [ -e "$filepath" ]
	then
		testmd5="00000000000000000000000000000001"
	else
		echo "$filepath" not found | tee >&2
		echo
		continue
	fi
	fi
	fi

	if [ "$testmd5" != "$md5" ]
	then
		echo "$filepath md5sums don't match" | tee >&2
		echo "orig md5sum= $md5" | tee >&2
		echo "test md5sum= $testmd5" | tee >&2
		echo
	fi
done


