
Bug#568140: RFA: ent -- pseudorandom number sequence test program



Hi,

I use ent about once a month to get a rough estimate of entropy. This is
useful for detecting compressed content, and also encryption keys :-) I
do not need speed. For me, the value of the ent package is that I can
quickly get a rough entropy estimate without having to write my own
program around some statistics library.
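The "rough entropy estimate" here is the Shannon entropy of the byte-value histogram. It ranges from 0 to 8 bits per byte, and compressed or encrypted data lands close to 8. Here is a minimal Python 3 sketch using only the standard library. The function name is hypothetical and not part of ent:

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0..8)."""
    n = len(data)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())
```

Usage: `entropy_bits_per_byte(open(path, "rb").read())`. Reading in binary mode matters, because the calculation must see raw bytes.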

If you are planning to drop the current package, how about replacing it
with a wrapper that just uses python-stats for the statistics? This way
there would be less code to maintain, but users would still get a simple
way to do entropy calculations from the command line.
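Such a wrapper would delegate mainly the chi-square test. That statistic compares the 256 byte counts against a uniform expectation of N/256 per value. Both tools print 2885.47 on the testcase, which suggests they compute it the same way. The Python 3 sketch below uses only the standard library, and the function name is hypothetical. For the p-value I substituted the Wilson-Hilferty normal approximation, which is not necessarily what ent or python-stats use:

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def chi_square_uniform(data: bytes) -> Tuple[float, float]:
    """Chi-square of byte counts vs. a uniform distribution over 256 values.

    Returns (statistic, approximate upper-tail probability).
    """
    n = len(data)
    if n == 0:
        raise ValueError("empty input")
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    expected = n / 256
    chisq = sum((c - expected) ** 2 / expected for c in counts)
    # Assumption: Wilson-Hilferty approximation with 255 degrees of freedom
    # (256 bins - 1) instead of an exact incomplete-gamma evaluation.
    k = 255
    z = ((chisq / k) ** (1 / 3) - (1 - 2 / (9 * k))) / sqrt(2 / (9 * k))
    p = 1 - NormalDist().cdf(z)
    return chisq, p
```

Multiplying `p` by 100 gives the "would exceed this value ... percent of the times" figure, up to the accuracy of the approximation.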

Just to check how difficult this would be, I hacked together a partial
replacement:

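#!/usr/bin/python
# Partial ent replacement (Python 2); statistics come from python-stats.
import stats, sys, math

# Count occurrences of each byte value 0..255.
freq = [0] * 256
total = 0
data = open(sys.argv[1], 'rb').read()   # binary mode: count raw bytes
for i in data:
    freq[ord(i)] += 1
    total += 1

# Relative frequency of each byte value that occurs in the file.
prob = [None] * 256
for i in xrange(256):
    if freq[i] > 0:
        prob[i] = float(freq[i]) / total

# Shannon entropy in bits per byte: sum of p * log2(1/p).
ent = 0
for i in xrange(256):
    if prob[i]:
        ent += prob[i] * math.log(1 / prob[i]) / math.log(2)

print("Entropy = %f bits per byte." % ent)
print("")
print("Optimum compression would reduce the size")
print("of this %s byte file by %s percent." % (len(data), 100 * (8 - ent) / 8))
print("")
# Not yet done: the p-value line, arithmetic mean, Monte Carlo Pi and
# serial correlation that ent also prints.
print("Chi square distribution for %ld samples is %1.2f, and randomly" %
      (len(data), stats.lchisquare(freq)[0]))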
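The script stops after the chi-square line. The remaining statistics could be computed as follows, in Python 3 with a hypothetical function name. The Monte Carlo and serial-correlation details are my reading of how ent works: each complete 6-byte group becomes a 24-bit X and a 24-bit Y, with the first byte most significant, and the lag-1 correlation wraps the last byte around to the first. By my hand calculation, this reproduces ent's 35.0980, 4.000000000 and -0.668931 on the testcase:

```python
import math


def mean_pi_serial(data: bytes):
    """Return (arithmetic mean, Monte Carlo Pi estimate, serial correlation)."""
    n = len(data)
    if n == 0:
        raise ValueError("empty input")
    mean = sum(data) / n  # 127.5 for uniformly random bytes

    # Monte Carlo Pi: points inside a quarter circle of radius 2**24 - 1.
    radius_sq = (256 ** 3 - 1) ** 2
    inside = groups = 0
    for off in range(0, n - n % 6, 6):  # only complete 6-byte groups
        x = int.from_bytes(data[off:off + 3], "big")
        y = int.from_bytes(data[off + 3:off + 6], "big")
        groups += 1
        if x * x + y * y <= radius_sq:
            inside += 1
    pi_est = 4 * inside / groups if groups else math.nan

    # Serial correlation: each byte paired with its successor, last with first.
    s1 = sum(data[i] * data[(i + 1) % n] for i in range(n))
    s2 = sum(data) ** 2
    s3 = sum(b * b for b in data)
    denom = n * s3 - s2
    # NaN for constant input is my choice; ent handles this case its own way.
    scc = (n * s1 - s2) / denom if denom else math.nan
    return mean, pi_est, scc
```

The Pi error that ent prints would then be `abs(pi_est - math.pi) / math.pi * 100`.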


Testcase:

$ seq 1 20 > testcase
$ ent testcase
Entropy = 2.727072 bits per byte.

Optimum compression would reduce the size
of this 51 byte file by 65 percent.

Chi square distribution for 51 samples is 2885.47, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 35.0980 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.668931 (totally uncorrelated = 0.0).

vs.

$ ./ent.py testcase
Entropy = 2.727072 bits per byte.

Optimum compression would reduce the size
of this 51 byte file by 65.9115948451 percent.

Chi square distribution for 51 samples is 2885.47, and randomly
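Apart from the missing lines, the outputs differ in how the compression percentage is printed. ent shows 65 where the script prints 65.9115948451. Rounding would have given 66, so ent appears to truncate to an integer. The following Python 3 sketch formats the line that way, using a hypothetical helper name:

```python
def compression_report(n_bytes: int, entropy_bits: float) -> str:
    """Format the compression line like ent's output in the testcase."""
    # int() truncates toward zero, giving 65 for 65.91... as ent printed.
    saving = int(100 * (8 - entropy_bits) / 8)
    return ("Optimum compression would reduce the size\n"
            "of this %d byte file by %d percent." % (n_bytes, saving))
```

For example, `print(compression_report(51, 2.727072))` prints the two lines shown in ent's output.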




