
Re: dvd md5sum conundrum



Gene Heskett wrote:

On Sunday 20 November 2005 12:35, scdbackup@gmx.net wrote:
Hi,

Using that, I read it with dd if=/dev/cdrom bs=2048
count=595536|md5sum which gave the wrong answer, so I tried it with
595535 and also got the wrong answer.

Then I fired up kcalc ... READ DVD STRUCTURE ... Legacy lead-out ...
... 26624 bytes, or 13 2048 byte blocks, is there a std for this
'padding' that we could subtract, or are we doomed to use the iso's
actual size to determine the amount of data to feed to md5sum to make
it work in this scenario?  How about subtracting a
(MOD(64k)/size)/2048 on the mediainfo returned size?

Your goal is quite demanding.

You compute the checksum of a certain number of bytes, which you then
convert into a storage representation that is known to be fuzzy about
the exact number of stored bytes. Then you have to puzzle out how much
of the fuzzy end belongs to your checksummed data and where the
trailing garbage might begin.

I'm open to ideas and insights.  This is a problem that the broken
dvd filesystem gives us, and it needs to be fixed in a manner that
Joe Six-Pack can use.

One has to distinguish between filesystem and DVD.

Currently you are exploring the DVD aspect which is responsible
for storing a byte array on media. That byte array does not
necessarily have to be a filesystem.
One has to be aware that the writing process is free to append
readable data to the end of the byte array. It is also possible
that old data or even virgin blocks are readable after the end
of the array.

Your original checksum was made from a filesystem image.
Probably you should rather query the filesystem image on
DVD in order to learn about the original image file's size.

But that querying will make your method prone to small
changes in the behavior of the image formatter or the writer.
(Do mkisofs -pad bytes count as part of the filesystem ?
They are part of the resulting file, at least.)


Since you have to memorize the original MD5 for comparison
with the DVD's MD5 anyway, why not just memorize the size
of the original file too ?
The pair (size,MD5) is an unambiguous fingerprint for
media which deliver the original byte array plus some
trailing garbage.
The comparison relies entirely on information which is
easy to determine at the time of writing the media. It does
not rely on inner details of the particular byte array's
semantics. This method is also independent of the writer
software (growisofs, cdrecord-ProDVD, dvdrecord, whatever).
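
The (size,MD5) comparison described above can be sketched in a few
lines of Python. This is only an illustration under assumptions: the
function names are made up for this example, and the path may be either
an image file or a device node such as /dev/cdrom. Only the first
`size` bytes are hashed, so any trailing garbage is ignored.

```python
import hashlib


def md5_of_prefix(path, size, block_size=2048):
    """Return the MD5 hex digest of the first `size` bytes of `path`.

    Raises ValueError if fewer than `size` bytes can be read.
    """
    digest = hashlib.md5()
    remaining = size
    with open(path, "rb") as f:
        while remaining > 0:
            # Read in multiples of the DVD block size, never past `size`.
            chunk = f.read(min(remaining, 32 * block_size))
            if not chunk:
                raise ValueError("medium is shorter than the recorded size")
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, size, md5_hex):
    """True if the first `size` bytes of `path` hash to `md5_hex`."""
    return md5_of_prefix(path, size) == md5_hex.lower()
```

The pair would be recorded when the image is made and later checked
with something like matches_fingerprint("/dev/cdrom", size, md5_hex).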

Another approach would be to add some recognizable bytes as
an end mark to the payload data when writing to media.

And that would no doubt require growisofs to be modified in some way to
achieve this.

I myself consider a combination of both approaches (size +
end mark) to be very useful.
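
A rough sketch of how size and end mark could work together, operating
on in-memory bytes for brevity. The END_MARK content and both function
names are hypothetical; no existing writer program is known to emit
such a mark. The idea is that the recorded size tells the reader where
the payload ends, and the mark found at exactly that offset confirms it.

```python
import hashlib

# Hypothetical end mark: a recognizable text padded to one 2048-byte block.
END_MARK = b"=== hypothetical end mark ===".ljust(2048, b"\0")


def with_end_mark(payload):
    """Return the payload followed by the end-mark block, as it would be written."""
    return payload + END_MARK


def verify(readback, size, md5_hex):
    """Check size, MD5 and end mark against the bytes read back from media."""
    payload = readback[:size]
    if len(payload) < size:
        return False                      # medium delivered too few bytes
    if hashlib.md5(payload).hexdigest() != md5_hex:
        return False                      # payload is damaged
    # The mark must follow the payload directly; garbage may follow the mark.
    return readback[size:size + len(END_MARK)] == END_MARK
```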

Well, considering that I did get the correct md5sum return when md5sum's
input was restricted to the exact size of the .iso image on the hard
drive, what we need is the ability to get, from the dvd, the size of
the image.  That doesn't appear to be available, or is this something
that is there, but just not read by the *info utils?  I have problems
with depending on the availability of the original .iso image for this.

Why don't you want to use the same data out of the ISO image on the
DVD? If the header is so munged you can't read that, chances are the
whole thing's a bust. There's isoinfo and some KDE thing I don't use
and can't remember.
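
For illustration, the size recorded in the ISO image can also be read
directly. The sketch below assumes the Primary Volume Descriptor is the
first descriptor, at sector 16, which is the usual layout; the function
name is made up for this example. The field offsets (volume space size
at byte 80, logical block size at byte 128) are those of the ISO 9660
Primary Volume Descriptor. Whether mkisofs -pad bytes are included in
that number is not settled here.

```python
import struct

SECTOR = 2048


def iso9660_image_size(path):
    """Return the volume size in bytes as recorded in the ISO 9660
    Primary Volume Descriptor of an image file or block device."""
    with open(path, "rb") as f:
        f.seek(16 * SECTOR)               # volume descriptors start at sector 16
        pvd = f.read(SECTOR)
    # Byte 0 is the descriptor type (1 = primary), bytes 1..5 are "CD001".
    if len(pvd) < SECTOR or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no ISO 9660 primary volume descriptor found")
    (blocks,) = struct.unpack_from("<I", pvd, 80)       # little-endian copy
    (block_size,) = struct.unpack_from("<H", pvd, 128)  # little-endian copy
    return blocks * block_size
```

The result could then be used as the byte count fed to md5sum, for
example iso9660_image_size("/dev/cdrom").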

Bear in mind also that a 'cmp' between the .iso and the burnt disk
returns no differences up until it hits EOF on the src .iso. This could be
used as an alternative check method as it seems to assure that they are
indeed identical up to that point.  My main concern there is that cmp
dies on the first error rather than listing them all until it hits the
EOF on one or the other input streams.  The fact that there is an error
at all is the important part, but it would be nice to know if it's a
repeatable pattern such as a scratched disk might output.
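
A comparison that keeps going after the first mismatch could look like
the sketch below. It compares block by block up to EOF of the source
image and collects the index of every differing 2048-byte block; the
function name is made up for this example. Note the assumption that a
bad block returns wrong data: a real read error on a scratched disk
would raise OSError, which is not handled here. (POSIX cmp also has a
-l option that reports every differing byte rather than only the
first.)

```python
def differing_blocks(image_path, device_path, block_size=2048):
    """Compare two files block by block up to EOF of `image_path` and
    return the indexes of all blocks that differ."""
    bad = []
    with open(image_path, "rb") as img, open(device_path, "rb") as dev:
        index = 0
        while True:
            a = img.read(block_size)
            if not a:
                break                     # EOF on the source image
            b = dev.read(len(a))          # data past the image end is ignored
            if a != b:
                bad.append(index)
            index += 1
    return bad
```

A regular spacing between the returned indexes would hint at a
repeatable pattern such as a scratch.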


--
bill davidsen <davidsen@tmr.com>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


