
RE: Determining the usefulness of compression




> -----Original Message-----
> From: Vineet Kumar [mailto:debian-user@virtual.doorstop.net]
> Sent: Wednesday, December 04, 2002 8:44 PM
> To: debian-user@lists.debian.org
> Subject: Re: Determining the usefulness of compression
>
>
> * Charlie Reiman (creiman@kefta.com) [021204 18:26]:
> > > -----Original Message-----
> > > From: Rob VanFleet [mailto:rvf@linux.wku.edu]
> > > Sent: Wednesday, December 04, 2002 5:40 PM
> > >
> > > I am writing a script that will compress certain files passed to it
> > > (well, that's a part of the script) and I was wondering if there was a
> > > simple way to determine if a file is worth compressing or not.  I know
> > > that with some very small files, compression actually increases the file
> > > size.  Should I just look at the file size and only compress if over a
> > > certain size or is there a more efficient method?
>
> > Not really. The best (indeed, only 100% accurate) way to determine if a file
> > is compressible is to compress it. That doesn't mean you can't use some good
> > heuristics. Good ones are:
> >
> > filename suffixes (never bother with gz, tgz, bz2, zip, jpg, jpeg, gif, z,
> > Z, mpg, mpeg, avi, wav, mov....)
>
> Huh? some AVIs and all WAVs are uncompressed, and will benefit
> enormously from compression.  The theory here is correct, though:
> don't try to compress already-compressed data; it won't work.

I said "heuristic". Heuristic is computer science jargon for "Doesn't
actually work." Besides, even uncompressed WAVs and AVIs are often not worth
compressing with standard Lempel-Ziv derived compressors (gzip, compress).
Go try it. You'll probably find about a 20% reduction for most 16-bit WAV
files. Compare this with a 50% reduction for most executables and an 80%
reduction for ASCII or HTML files. It's better to attack multimedia files
with specialized compressors.
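
For what it's worth, the two ideas (skip by suffix, otherwise just try
compressing) could be combined roughly like this. This is only a sketch in
Python: the function name worth_compressing, the 10% minimum saving and the
64 KB sample size are all made-up assumptions, and the suffix set is simply
the list quoted above.

```python
import os
import zlib

# Suffixes from the list quoted above: "probably not worth the bother".
SKIP_SUFFIXES = {".gz", ".tgz", ".bz2", ".zip", ".jpg", ".jpeg", ".gif",
                 ".z", ".mpg", ".mpeg", ".avi", ".wav", ".mov"}

def worth_compressing(name, data, min_saving=0.10, sample_size=65536):
    """Guess whether compressing `data` (bytes) is worthwhile.

    Hypothetical helper: the thresholds are arbitrary assumptions.
    """
    # Heuristic 1: filename suffix (lowercased, so .Z matches .z too).
    ext = os.path.splitext(name)[1].lower()
    if ext in SKIP_SUFFIXES:
        return False
    # Heuristic 2: actually compress a leading sample and measure it.
    sample = data[:sample_size]
    if not sample:
        return False
    packed = zlib.compress(sample, 6)
    return len(packed) <= len(sample) * (1 - min_saving)
```

Very small files fall out of the second check on their own, since the
compressed sample ends up larger than the input. A sample from the start of
a file is not always representative, so this is still only a heuristic.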

I do agree with tarring first. You can often get much better compression by
tarring everything up and then compressing the tar file. This lets the
compressor exploit patterns between files, not just within individual files,
so the compression ratio can increase quite a bit, especially with many
small, similar files.
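
If you want to see the effect for yourself, something like the following
Python sketch compares the two approaches in memory. The helper names
individual_total and tar_then_gzip are made up for illustration, and the
sample "config files" are synthetic data, not a real measurement.

```python
import gzip
import io
import tarfile

def individual_total(files):
    """Total size in bytes when each file is gzipped on its own."""
    return sum(len(gzip.compress(data)) for data in files.values())

def tar_then_gzip(files):
    """Size in bytes of one gzipped tar archive holding all the files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

# Synthetic example: 20 small files that share most of their content.
boilerplate = b"".join(
    b"option_%03d = default value number %d\n" % (i, i * 7919)
    for i in range(40)
)
files = {"conf%02d.txt" % n: b"# host %d\n" % n + boilerplate
         for n in range(20)}

print("gzipped one by one:", individual_total(files), "bytes")
print("tarred, then gzipped:", tar_then_gzip(files), "bytes")
```

The gain depends on the data. gzip only looks back 32 KB, so it mainly
helps when the similar files are small and sit close together in the
archive.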





