Re: checking integrity of already written CD/DVD more info
On 2009-03-31_08:18:16, Paul E Condon wrote:
> On 2009-03-31_09:53:51, Matus UHLAR - fantomas wrote:
> > > >On Sun,29.Mar.09, 20:28:44, Angelin Lalev wrote:
> > > >> Is there a way to check a written DVD against the checksum of the iso
> > > >> image written on it?
> > > In <20090329202842.GA3540@think.homelan>, Andrei Popescu wrote:
> > > >$ md5sum /dev/dvd
> > > >
> > > >This should result in *exactly* the same checksum as the iso
> > On 29.03.09 16:27, Boyd Stephen Smith Jr. wrote:
> > > Not in my experience. Both DVDs and CDs have a physical sector size. If the
> > > image is not a multiple of that sector size, the md5sum of the block device
> > > and the image will differ, because of the extra bits in the last physical
> > > sector.
> > afaik, if the same image is written to multiple CDs/DVDs, they all should
> > have the same md5sum, independently of its size. That is the one md5sum
> > should report. The same for sha1sum.
> Are you saying that two files that have different lengths (size)
> should have the same md5 or sha1? If you mean something else by
> size, ignore this post, but ...
> The design goal of both md5 and sha1 is to provide, for any file, a
> message digest that, for all practical purposes, differs from that of
> any other file that is different from the first file in any way. If
> the file that is read from the CD/DVD device is longer, or shorter,
> than the iso that was used to burn the CD/DVD, then the two files are
> different in a way that is significant to the message digest idea. For
> two files to be expected to have the same message digest, they must be
> bit for bit identical. That means, at a very minimum, they must have
> the same length. As I understand it, sha1 gained favor over md5
> because there was growing evidence that md5 was failing to meet the
> design goal of "identical message digest only if the files are
> identical".
> IOW, a message digest algorithm must NOT ignore trailing zero bytes,
> or trailing "garbage" bytes that have no effect on the meaning of the
> file in its intended use.
> Trailing zero bytes are easy to truncate. If the truncated file has a
> matching message digest, one can be reasonably confident that the file
> with the trailing zeros will function properly. For trailing "garbage"
> bytes it is difficult to assert that those lost bytes at the end
> really are garbage that may safely be ignored.
> I have two CD/DVD devices; one consistently reports longer files than
> the other on reading the same CD/DVD. Luckily, for me, for the one
> reporting the longer file, its reading is always longer than the
> length of the iso that I used to burn. I truncate the longer file to
> match the iso and always get a matching message digest. If the
> message digest of the leading part of the file matches the whole of
> the iso, and if the iso is a well constructed image of what should be
> on a CD/DVD, then a CD/DVD reader should never look at those extra
> bytes that dd reports at the end.
> A second comment: In my experience, the iso files that I download from
> Debian always have lengths that are integral multiples of 1024 bytes.
> I think there is already some padding going on in the creation of these
> files, so partial sectors in the iso are probably not an explanation
> for whatever difficulties one may be having in verifying a CD/DVD.
> (On doing a little quick research, I think the sector size on CD/DVD
> may be 2048 bytes. I don't make a claim for integral multiple of 2048
> because that is not what I actually tested. I don't remember whether
> the integer was odd or even, just that there was no remainder.)
> My problems with the beautiful one- or two-line checking script are
> indicative of a little extra complexity here. Manufacturers don't,
> apparently, guarantee reliable, accurate end-of-file checking. If
> you are unlucky and have hardware with unreliable EOF sensing, you
> need to take extra measures in verifying the accuracy of a burn.
I did an experiment with the drive that always truncated the dd read
of the CD. The iso is the lenny business card image. This iso is 18133
blocks of 2048 bytes each. The read-back using dd on the short-read
drive is 18104 2 KiB blocks long, which is 29 blocks short of the full
image.
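
For anyone who wants to redo the arithmetic in bytes, here is a small
Python sketch. The block counts are the ones reported above; the
variable names are just illustrative.

```python
BLOCK = 2048             # block size used with dd, in bytes

iso_blocks = 18133       # length of the original iso, in blocks
readback_blocks = 18104  # length returned by the short-read drive

iso_bytes = iso_blocks * BLOCK
missing_blocks = iso_blocks - readback_blocks
missing_bytes = missing_blocks * BLOCK

print(iso_bytes, missing_blocks, missing_bytes)
```
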
Using /dev/zero, dd, and cat, I created another copy of the lenny
business card iso that has 40 blocks of zeros appended at the end. Then
I burnt a CD from this iso file and read back from the newly burnt CD.
The read-back is 18168 blocks, which is quite a bit longer than the
read-back of the CD burnt from the original iso. It is 64 blocks
longer, which doesn't seem to add up, since I appended only 40 blocks.
But notice that it is also longer than the original iso, so I truncated
this longer file to the same length as the original and computed the
md5 of this augmented-then-truncated file. This file has the same md5
as the original iso from Debian, so I think it is rational to believe
that the CD does have all the bits from the original accurately
reproduced, and an unknown amount of junk at the end where the isofs
software will never look. All this work was done using a 2048-byte
block size in dd. The original iso from Debian has 18133 blocks of this
size. This being an odd number, it is unlikely that the block size on
the CD is actually 4096 or larger.
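
The truncate-then-compare step can also be done without creating a
truncated file, by hashing only the first iso-length bytes of the
read-back. What follows is a minimal sketch in Python, not the exact
commands used in the experiment. It assumes the read-back has already
been saved to an ordinary file (for example with dd); the function
names md5_of_prefix and readback_matches_iso are made up for this
illustration.

```python
import hashlib
import os


def md5_of_prefix(path, length, chunk_size=2048 * 512):
    """Return the MD5 hex digest of the first `length` bytes of `path`.

    Raises ValueError if the file holds fewer than `length` bytes.
    """
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{path}: short by {remaining} bytes")
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def readback_matches_iso(iso_path, readback_path):
    """True if the read-back starts with a bit-for-bit copy of the iso.

    Anything after the iso's length (padding or junk) is ignored.
    A read-back shorter than the iso can never match.
    """
    iso_size = os.path.getsize(iso_path)
    if os.path.getsize(readback_path) < iso_size:
        return False
    return md5_of_prefix(readback_path, iso_size) == md5_of_prefix(iso_path, iso_size)
```

Called as readback_matches_iso("original.iso", "readback.img") with
whatever file names apply, this returns True only when the leading part
of the read-back is identical to the iso. It says nothing about the
trailing bytes, which is the point.
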
I think this is good news for people who are unlucky enough to have
only disk drives that give too short a read-back from a CD.
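
For such drives, the padding step from the experiment (done there with
/dev/zero, dd, and cat) could be sketched in Python as below. This is
only an illustration: the function name write_padded_copy is
hypothetical, and the default of 40 blocks is simply the amount used in
the experiment above, not a value known to be sufficient for every
drive.

```python
import shutil


def write_padded_copy(iso_path, padded_path, pad_blocks=40, block_size=2048):
    """Copy the iso to padded_path and append pad_blocks blocks of zeros."""
    with open(iso_path, "rb") as src, open(padded_path, "wb") as dst:
        shutil.copyfileobj(src, dst)                  # original image, unchanged
        dst.write(b"\0" * (pad_blocks * block_size))  # zero padding at the end
```

The padded file is what gets burnt. After reading the CD back, only the
first original-iso-length bytes of the read-back should be compared
with the original iso.
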
Paul E Condon