
Bug#1030207: ITP: libstatistics-topk-perl -- Implementation of the top-k streaming algorithm



Package: wnpp
Owner: Mason James <mtj@kohaaloha.com>
Severity: wishlist
X-Debbugs-CC: debian-devel@lists.debian.org, debian-perl@lists.debian.org

* Package name    : libstatistics-topk-perl
  Version         : 0.02
  Upstream Author : gray <gray@cpan.org>
* URL             : https://metacpan.org/release/Statistics-TopK
* License         : Artistic or GPL-1+
  Programming Lang: Perl
  Description     : Implementation of the top-k streaming algorithm

The Statistics::TopK module implements the top-k streaming algorithm, also
known as the "heavy hitters" algorithm. It is designed to process data streams
and probabilistically calculate the k most frequent items while using limited
memory.

A typical example would be to determine the top 10 IP addresses listed in an
access log. A simple solution would be to hash each IP address to a counter
and then sort the resulting hash by counter value. But the hash could
theoretically require over 4 billion keys, one per possible IPv4 address.
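
For illustration, the simple solution might look like the following Python
sketch. It is not part of the package; the function name top_k_exact and the
sample addresses are made up for this example. Memory use grows with the
number of distinct addresses seen.

```python
from collections import Counter

def top_k_exact(items, k):
    """Exact top-k: keep one counter per distinct item, then sort by count."""
    counts = Counter(items)          # one key per distinct item seen
    return counts.most_common(k)     # (item, count) pairs, highest count first

# Hypothetical sample of client addresses taken from an access log.
log_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1",
           "10.0.0.3", "10.0.0.1", "10.0.0.2"]

print(top_k_exact(log_ips, 2))
```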

The top-k algorithm only requires storage space proportional to the number of
items of interest. It accomplishes this by sacrificing precision, as it is
only a probabilistic counter.
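
To make the trade-off concrete, here is a minimal Python sketch of one
well-known bounded-memory heavy-hitters scheme, often called Space-Saving.
This is an assumption made for illustration: it is not taken from the
Statistics::TopK source, and the names space_saving, top_k and capacity are
hypothetical. At most "capacity" counters are kept; when a new item arrives
and the table is full, the item with the smallest count is evicted and the
newcomer inherits that count plus one, so counts may overestimate.

```python
def space_saving(stream, capacity):
    """Track approximate counts using at most `capacity` counters."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            # Table is full: evict the least-counted item. The newcomer
            # inherits its count plus one, which can overestimate.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts

def top_k(stream, k, capacity):
    """Return the k items with the highest approximate counts."""
    counts = space_saving(stream, capacity)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Hypothetical stream: two frequent items followed by three rare ones.
stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]
print(top_k(stream, k=2, capacity=3))
```

With capacity=3 the table never holds more than three keys regardless of the
stream length. The rare item "e" ends with a count of 3 although it appeared
only once, which is the loss of precision described above.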

The package will be maintained under the umbrella of the Debian Perl Group.

--
Generated with the help of dpt-gen-itp(1) from pkg-perl-tools.

