
Re: On the verge of suicide:tar & gunzip problems



v.demartino2@virgilio.it writes:
>
> vic:/# tar -xvzf /mnt/backup-compaq/home.tar.gz
> home/
> home/victor/
> home/victor/.R/
> home/victor/.R/help.db
> tar: Skipping to next header
> tar: Archive contains obsolescent base-64 headers
>
> vic:/# gunzip /mnt/backup-compaq/home.tar.gz
> gunzip: /mnt/backup-compaq/home.tar.gz: invalid compressed data--crc error
> gunzip: /mnt/backup-compaq/home.tar.gz: invalid compressed data--length error
> vic:/#

Vittorio:

Don't kill yourself quite yet.  These are the usual symptoms of
corruption in the middle of a gzip-compressed archive.  If it was only
a couple of corrupt blocks, you have a pretty good chance of
recovering most of your data.

"zcat" will happily plow through the corrupt compressed file and
generate uncompressed output as best it can.  Because of the nature of
the compression algorithm, a single bad compressed input block will
result in a long string of corrupt uncompressed output blocks.

Eventually, the decompression state will probably resynchronize.  This
is not guaranteed, but it usually happens within a few hundred
kilobytes.  Assuming it does, "zcat" will start generating good output
again.

The trouble is, the output won't generally be properly aligned, so
"tar" (which started skipping 512-byte blocks at the first bad header)
won't find another header aligned at the start of a 512-byte block,
and will gobble up the whole file without finding anything else to
untar.
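To make the alignment problem concrete, here is a minimal Python sketch of the test a reader can apply to any 512-byte block: a tar header stores, at bytes 148-155, the octal sum of all its bytes with that field counted as eight spaces.  The function name "header_checksum_ok" and the demo data are mine, purely for illustration; "tar" itself does more checking than this.

```python
import tarfile


def header_checksum_ok(block):
    """Return True if a 512-byte block carries a valid tar header checksum."""
    if len(block) != 512:
        return False
    # The checksum field is octal ASCII, padded with NULs and/or spaces.
    stored = block[148:156].strip(b"\0 ")
    if not stored:
        return False
    try:
        expected = int(stored, 8)
    except ValueError:
        return False
    # Sum every header byte, counting the checksum field as eight spaces.
    actual = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return actual == expected


# Hypothetical demo: one valid header pushed 165 bytes out of alignment.
header = tarfile.TarInfo("home/victor/notes.txt").tobuf(format=tarfile.USTAR_FORMAT)
stream = b"\xff" * 165 + header

print(header_checksum_ok(stream[0:512]))      # block where tar would look
print(header_checksum_ok(stream[165:677]))    # block where the header really is
```

A reader that only inspects blocks starting at multiples of 512 never sees the second case, which is why the realignment step is needed.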

So, how do you realign?

Well, assuming most of the files in your tarfile are from the
"home/victor/" directory, every header block will start with the
string "home/victor/".  You can run the following one-liner:

zcat /mnt/backup-compaq/home.tar.gz |
  perl -ne 'm,home/victor/, && do { ++$count[($l + length($`)) % 512] };
  $l += length($_); END { for (0..511) { printf "%3d %5d\n", $_,
  $count[$_] if $count[$_] } }' | sort -k2,2nr | head -20

(I've formatted it onto multiple lines, but it should be typed in as a
single line.)  This simply counts the number of times the string
"home/victor/" appears in the file at each possible offset within a
512-byte block.  It outputs, at most, the 20 most frequent offsets.
For a small, corrupt tarfile, the output might look like:

        165   291
          0    27
         18     2
        398     1

The left column lists the offsets; the right column gives the count
for each offset.

This indicates that the string appeared 291 times at offset 165, 27
times at offset 0, and a handful of times at offsets 18 and 398.
These last two are false positives (occurrences of "home/victor/" that
weren't from a tar header).  The 27 occurrences at offset 0 are the
headers before the corruption.  After the corruption, when the
decompressed stream recovered, it was offset by 165 bytes, and those
291 other headers are recovered files at the wrong offset.

When you do this (since your tarfile is so gigantic), you'll probably
have a good number of false positives.  However, if there was
corruption in only one place, there should be one non-zero offset that
is overwhelmingly more frequent than the rest.
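For anyone who would rather not do the counting in Perl, the same offset histogram can be sketched in Python.  This is only an illustration under my own assumptions: the function name "offset_histogram" and the chunk size are made up, and unlike a line-by-line scan that stops at the first match on each line, it counts every occurrence of the marker, so the totals may come out somewhat higher.

```python
import io
from collections import Counter


def offset_histogram(stream, marker, chunk_size=1 << 20):
    """Count occurrences of marker at each offset modulo 512 in a byte stream."""
    counts = Counter()
    tail = b""       # carry-over so matches spanning two chunks are found
    consumed = 0     # absolute stream offset of the first byte of `tail`
    keep = len(marker) - 1
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(marker)
        while pos != -1:
            counts[(consumed + pos) % 512] += 1
            pos = buf.find(marker, pos + 1)
        # Keep just enough bytes that a split match is seen next time,
        # but never a complete match (which would be counted twice).
        tail = buf[-keep:] if keep else b""
        consumed += len(buf) - len(tail)
    return counts


# Hypothetical demo data: 2 aligned "headers", 165 junk bytes, 5 shifted ones.
marker = b"home/victor/"
block = marker + b"\0" * (512 - len(marker))
data = block * 2 + b"\xff" * 165 + block * 5

for offset, count in offset_histogram(io.BytesIO(data), marker).most_common(20):
    print("%3d %5d" % (offset, count))
```

To use it on the real archive, the function could be handed sys.stdin.buffer in a small script fed by "zcat", rather than the in-memory demo stream used here.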

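Anyway, to see which files can be recovered at a particular offset,
for example offset 165, use the following:

        zcat /mnt/backup-compaq/home.tar.gz | tail -c +166 | tar tvf -

Note that the number in the "tail" command should be one more than the
offset output by the one-liner, because "tail -c +N" starts output at
byte N, counting from 1.  The "t" only lists the archive contents; once
the listing looks sane, replace "tvf" with "xvf" to actually extract
the files.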
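The skip-then-untar step can also be sketched in Python, in case that is handier than "tail".  This is a rough illustration, not a drop-in replacement: the function name "list_members_from" and the demo archive are hypothetical, and it simply throws away the first "offset" bytes (165 here, matching "tail -c +166") before handing the rest to the standard tarfile module in streaming mode.

```python
import io
import tarfile


def list_members_from(stream, offset):
    """Discard `offset` bytes, then list the tar members found after them."""
    remaining = offset
    while remaining > 0:
        skipped = stream.read(min(remaining, 65536))
        if not skipped:
            raise EOFError("stream ended before the requested offset")
        remaining -= len(skipped)
    # Mode "r|" reads the archive sequentially, so it works on pipes too.
    with tarfile.open(fileobj=stream, mode="r|") as archive:
        return [member.name for member in archive]


# Hypothetical demo: a tiny archive pushed 165 bytes out of alignment.
payload = io.BytesIO()
with tarfile.open(fileobj=payload, mode="w") as archive:
    contents = b"hello\n"
    info = tarfile.TarInfo("home/victor/notes.txt")
    info.size = len(contents)
    archive.addfile(info, io.BytesIO(contents))
shifted = b"\xff" * 165 + payload.getvalue()

print(list_members_from(io.BytesIO(shifted), 165))
```

On a really damaged archive, tarfile may still give up at a later bad spot, so treat the listing as a lower bound on what is recoverable.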

Good luck!

-- 
Kevin <buhr@telus.net>


