Bug#828787: ITP: libdisorder -- library for entropy measurement of byte streams and other data
Hi Guus,
On Mon, Jun 27, 2016 at 10:56:14PM +0200, Guus Sliepen wrote:
>
> I hope you will fix this description. I'd only keep the last paragraph,
Done.
> and then also explain what algorithm it actually uses to measure the
> entropy (Shannon's source coding theorem). This theorem is actually only
> usable in the context of an input of "independent and identically
> distributed random variables", it does not apply to every kind of input.
> In particular, it only looks at the histogram of byte values; if you
> feed it a file with totally predictable increasing byte values 0, 1, 2,
> etc., it will report an entropy of 8. Many compression algorithms,
> especially those for sound and images, look at differences between
> consecutive values or have other means to detect such predictable
> sequences. So make it clear that it just implements Shannon's H function
> and that it also only works on bytes.
I'd be happy if you would commit a fix to Git (it's writable by any DD),
since you obviously know more about this than I do.
> I also want to point out that this library is not thread-safe, something
> which could easily be fixed.
A patch would be really welcome.
> It also gives the wrong answer when you
> have an input with more than 2^31-1 of the same bytes in the input, even
> though it pretends to handle inputs up to 2^63 in length.
I think this information should be in README.Debian. What do you think?
> > Remark: The code of libdisorder appeared in two other targets of Debian
> > Med and to avoid code duplication this library is packaged separately.
>
> Although normally I would applaud deduplication, I personally think this
> shouldn't get its own package. It looks like one of those things you'd
> find on npm.
I think I'll stick to this separate library approach.
Thanks a lot for your comments.
Andreas.
--
http://fam-tille.de