
Re: Let's shrink Packages.xz



Jeff Epler wrote:
> First, I tried encoding the various digests as base64 or base93, rather
> than hex.  In each case, the file grew in size; base93 was the worst.

Are you sure you performed this calculation correctly?

"ASCII hex" encodes 4 bits as 8 (or 7, but in practice 8), since each ASCII character carries one nibble of the digest; that's a 100% increase (a factor of 2) over the bare digest (or over a "raw mapping" of 8 bits of digest to an 8-bit character set).

base64 encodes 6 bits as 8; that should only be a 33.3% increase (factor of 1.333).
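
To make the arithmetic concrete, here is a rough Python sketch that counts how many characters a 32-byte SHA-256 digest needs in a given radix. The helper name chars_needed and the sample input are my own, purely for illustration, and the radix-93 figure is the theoretical minimum for any 93-symbol alphabet, not the output of a particular base93 implementation. It counts only uncompressed characters.

```python
import base64
import binascii
import hashlib


def chars_needed(n_bytes: int, radix: int) -> int:
    """Smallest number of radix-`radix` digits that can hold any n_bytes-long value.

    Hypothetical helper for illustration; uses integer arithmetic only,
    so there is no floating-point rounding to worry about.
    """
    target = 1 << (8 * n_bytes)  # number of distinct values to represent
    count, capacity = 0, 1
    while capacity < target:
        capacity *= radix
        count += 1
    return count


digest = hashlib.sha256(b"example").digest()  # 32 raw bytes
hex_len = len(binascii.hexlify(digest))       # 64 characters
b64_len = len(base64.b64encode(digest))       # 44 characters, including '=' padding

for name, radix in [("hex", 16), ("base64", 64), ("base93", 93)]:
    n = chars_needed(len(digest), radix)
    print(f"{name}: {n} chars, {n / len(digest):.3f}x the raw digest")
```

Note that standard base64 pads its output to a multiple of 4 characters, so the real encoder emits 44 characters here rather than the 43 the bare radix calculation gives.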

I've never heard of base93, but I found a reference that I think describes what you mean [0]. This should be even more efficient than base64, as should any binary-to-ASCII mapping of higher radix. Perfect segue...

What are we looking for in an encoding? I'm guessing this needs to be printable, suitable for human consumption (or at least "copy/paste" / "consumption via text editor"), and "7-bit compat"?

Is this even up for debate? The community at large ("computer users"), Debian included, seems to have standardized on "message digests as ASCII hex"...

[0] http://kiwigis.blogspot.com/2013/09/base-93-integer-shortening-in-c.html

--
Nate

