Re: results: debian-user's favourite FLOSS (2009)

On Tue, 17 Nov 2009 14:01:53 -0500
Tim Tebbit <ttebbit@gmail.com> wrote:

> Dotan Cohen wrote:
> > 
> > My guess would be the popularity of Ubuntu. Much of the pie went
> > there. As Ubuntu is Debian-derived, how about doing the poll there as
> > well?
> > 
> Out of curiosity I extracted the unique email addresses from
> ubuntu-users.mbox, available from their archive site. The archive starts
> 15 Sep 2004, and I found 10,809 unique addresses, give or take a few my
> patterns missed. I'm sure their current subscription is not that high,
> and I was not able to find an mbox for debian-user ( a good thing ) to
> see what debian had over the same time period for comparison.

As you say, a good thing ;)  Here's a harvester to go through an mbox
file and print out all the 'froms' found (requires Mail::MboxParser -
install libmail-mboxparser-perl):

#! /usr/bin/perl -w

# usage: 'harvester.pl mbox'
# prints all values of 'from' lines in 'mbox'
# to do useful things with this, you'll probably want to pipe the output
# through sort, and sometimes uniq, e.g.:
# 'harvester.pl mbox | sort | uniq | wc -l' - count the number of unique
# addresses found (not entirely accurate, since there can be multiple
# equivalent variations of the same address)

use Mail::MboxParser;

# options passed through to the underlying mbox parser
my $parseropts = {
	enable_cache    => 1,
	enable_grep     => 1,
	cache_file_name => '/tmp/harvester-cache-file',
};

my $mb = Mail::MboxParser->new(shift, decode => 'ALL', parseropts => $parseropts);
while (my $msg = $mb->next_message) {print $msg->header->{from}, "\n"}
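
The "multiple equivalent variations" caveat in the script's comments can be
reduced with a small filter placed between the harvester and sort. What
follows is only a rough sketch: normalize_from is a hypothetical helper (it
is not part of Mail::MboxParser), the regexes are an approximation rather
than a full RFC 5322 parser, and lowercasing the whole address assumes that
nobody relies on case-sensitive local parts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# normalize_from: hypothetical helper, not part of Mail::MboxParser.
# Pulls the bare address out of a From value and lowercases it, so that
# 'Jane Doe <Jane@Example.org>' and 'jane@example.org' count as one sender.
sub normalize_from {
    my ($from) = @_;
    # Prefer an address inside angle brackets ...
    my ($addr) = $from =~ /<([^<>\s]+@[^<>\s]+)>/;
    # ... otherwise take the first token that looks like user@host.
    ($addr) = $from =~ /([^\s<>()",;]+@[^\s<>()",;]+)/ unless defined $addr;
    return defined $addr ? lc $addr : undef;
}

# Read From values on stdin, print each distinct normalized address once.
my %seen;
while (my $line = <STDIN>) {
    chomp $line;
    my $addr = normalize_from($line);
    $seen{$addr}++ if defined $addr;
}
print "$_\n" for sort keys %seen;
```

Saved as, say, normalize.pl (the name is arbitrary), it would be used as
'harvester.pl mbox | normalize.pl | wc -l'. The output is already sorted
and unique, so the sort and uniq stages are no longer needed.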

foffl.sourceforge.net - Feeds OFFLine, an offline RSS/Atom aggregator
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator
