diffing two large compressed (.bz2 or .lzma) files?

Is there a utility that can efficiently output the differences between
two large compressed files? Note: one can assume that the files differ
in only a few places, so the utility MUST NOT need more than a few
megabytes (whether in RAM, swap or disk).

bzdiff (from the bzip2 package) first decompresses one of the files to
a temporary file, so it is not a solution (it filled up my partition!).

I've also tried process substitution (with zsh, but this is also
supported by bash):

  diff <(bunzip2 -c file1.bz2) <(bunzip2 -c file2.bz2)

and

  diff --speed-large-files <(bunzip2 -c file1.bz2) <(bunzip2 -c file2.bz2)

but in both cases, diff uses too much swap. I think the problem with
process substitution is that diff cannot control how the files are
decompressed, but perhaps diff doesn't cope well with such large
inputs either.
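
For concreteness, here is a minimal sketch of that process-substitution
setup on two tiny files. The temporary directory and the sample contents
are made up for illustration and stand in for the real large files. It
only shows how diff receives its input; it does nothing about diff's own
memory use.

```shell
#!/usr/bin/env bash
# Hypothetical small inputs standing in for the large compressed files.
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' | bzip2 -c > "$tmp/file1.bz2"
printf 'alpha\nBETA\ngamma\n' | bzip2 -c > "$tmp/file2.bz2"

# Each <(...) runs bunzip2 in the background and gives diff a pipe to
# read from, so no decompressed copy is written to disk.
# diff exits with status 1 when the inputs differ, hence "|| true".
out=$(diff <(bunzip2 -c "$tmp/file1.bz2") <(bunzip2 -c "$tmp/file2.bz2") || true)
printf '%s\n' "$out"

rm -rf "$tmp"
```

Since diff reads from pipes here, it cannot seek in its inputs and has
to keep whatever it needs in memory itself.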

I've used .bz2 as the example, but I may switch to lzma, so I'm
interested in solutions for both.

Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
