Re: lists.debian.org vs Google Groups
Pascal Hakim wrote:
On Thu, Apr 06, 2006 at 09:51:30AM +0100, Doofus wrote:
Can you quote:
I can't do the last twelve months, as we don't keep our data that far
back, and some of these numbers have to be counted individually, but
here are the numbers for March.
Due to our multi-step filtering process, I can't even get numbers for
the whole of March, but we can make some assumptions.
1. the total number of posts from all sources received by the d-u list
servers in the last twelve months,
The first step is dropping things at the MTA stage. Those logs don't go
back that far, as they get pretty big. I've picked a full 7 days at
random to use as a sample, and we get: 5891.
Since we're playing with round figures anyway, let's say a month works
out at: 4.3 * 5891 = ~25300
All the rest of the numbers are for March:
Other filters: 333
-> subtotal: 12380
Total blocked spam: ~37700
Actual messages pushed through the list: 3404
Total trapped spam > 91.5%
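For anyone wanting to check the arithmetic, here is a small sketch that re-derives the March figures above. The inputs are copied from this post; the 4.3 weeks-per-month factor is the rounded one used above, and treating the percentage as blocked mail over (blocked + delivered) mail is my assumption about how it was calculated.

```python
# Re-derive the March spam figures quoted in the post (sketch, inputs copied from the post).
SAMPLE_WEEK_MTA_DROPS = 5891   # MTA-stage rejections in the 7-day sample
WEEKS_PER_MONTH = 4.3          # rounded scaling factor used in the post
LATER_FILTER_SUBTOTAL = 12380  # spam caught by the later filtering steps in March
DELIVERED = 3404               # messages actually pushed through the list

# Extrapolate the one-week MTA sample to a whole month.
mta_estimate = WEEKS_PER_MONTH * SAMPLE_WEEK_MTA_DROPS

# Everything the filters stopped, at all stages.
blocked = mta_estimate + LATER_FILTER_SUBTOTAL

# Assumed definition: share of all incoming mail that was blocked.
trapped_fraction = blocked / (blocked + DELIVERED)

print(f"MTA drops (est.): {mta_estimate:.0f}")
print(f"Total blocked:    {blocked:.0f}")
print(f"Trapped:          {trapped_fraction:.1%}")
```

The unrounded total comes out a little above 37700, and the trapped share lands just above 91.7%, which is consistent with the "> 91.5%" lower bound.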
2. the number of posts received by non list members in the same period,
This can mean two things. If you want the numbers above but for
non-subscribers only, we can't do that for a large chunk of them, and
it would take too long for the rest.
If you simply want to know how many posts made by non-subscribers got
through to the list and were published, it's 862.
As Hendrik pointed out, I did of course mean *from* non list members.
I see no ambiguity in "how many posts were received from non list
members [in sample period]?", and can't see how you could reach your
second interpretation above. It's an interesting point, though, and 25%
of legitimate posts originating from non-subscribers certainly
strengthens the case for an open list.
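The 25% figure is just the 862 non-subscriber posts divided by the 3404 messages delivered in March. A quick sketch of that, plus a variant that strips out the 25 spam messages later found in the archive, assuming (not measured) that every one of them came from a non-subscriber:

```python
# Share of delivered March posts that came from non-subscribers (sketch, inputs from the thread).
DELIVERED = 3404             # messages pushed through the list in March
FROM_NON_SUBSCRIBERS = 862   # delivered posts sent by non-subscribers
SPAM_IN_ARCHIVE = 25         # spam messages found in the March archive

# Straight ratio over everything delivered.
share_of_delivered = FROM_NON_SUBSCRIBERS / DELIVERED

# Assumption: all archived spam came from non-subscribers, so remove it from both counts.
share_if_spam_external = (FROM_NON_SUBSCRIBERS - SPAM_IN_ARCHIVE) / (DELIVERED - SPAM_IN_ARCHIVE)

print(f"{share_of_delivered:.1%}")
print(f"{share_if_spam_external:.1%}")
```

Either way the answer rounds to roughly a quarter of the list's legitimate traffic.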
3. the number of posts actually published on d-u after all filtering in the same period,
4. the number of spam (or non-spam) posts actually published on d-u in the same period.
I went through the archive for March and pulled out the numbers. I
found 25 spam messages, which leaves us with 3379 valid messages.
Not a lot really, I concede.
The answers to these should go some way towards highlighting the scale of
the problem, and also how much benefit is gained by allowing everyone in
the world to aim their crap at all of our mailboxes. I'll be surprised if a
statistic is available for (4), but would appreciate the answers if
Even if we assume that I fell asleep on the Page Down key while counting
(4), and guess that I missed half, closing the list would still mean
blocking over 800 valid messages.
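The worst case above can be spelled out as follows. This sketch assumes that closing the list would reject every non-subscriber post and that, in the worst case, every spam message that got through came from a non-subscriber; both are assumptions for illustration, not measured figures.

```python
# Worst-case count of valid posts a closed list would have rejected in March (sketch).
FROM_NON_SUBSCRIBERS = 862  # delivered posts sent by non-subscribers
SPAM_COUNTED = 25           # spam found in the March archive

# "Missed half" means the true spam count could be double what was counted.
spam_if_half_missed = 2 * SPAM_COUNTED

# Assume all of that spam came from non-subscribers; the rest of their posts were valid.
valid_non_subscriber_posts = FROM_NON_SUBSCRIBERS - spam_if_half_missed

print(valid_non_subscriber_posts)  # 812
```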
25/37700 works out to 0.066% of spam not being blocked. It's still
annoying, of course, as the metric to use is the number of spam messages
that make it through rather than the percentage. SNR and all that.
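The 0.066% is a one-line calculation. Strictly, the share of spam that slipped past the filters is leaked / (blocked + leaked) rather than leaked / blocked, but at these volumes the two round to the same figure, as this small sketch shows:

```python
# Spam leak rate for March (sketch, inputs from the post).
LEAKED = 25      # spam messages found in the published archive
BLOCKED = 37700  # spam stopped by all filtering stages (rounded)

leak_rate = LEAKED / BLOCKED                   # ratio as computed in the post
strict_leak_rate = LEAKED / (BLOCKED + LEAKED)  # share of all spam that got through

print(f"{leak_rate:.3%}")         # 0.066%
print(f"{strict_leak_rate:.3%}")  # 0.066%
```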
No, we need all the numbers. Only percentages describe the efficiency of
the filtering. Your figures indicate some pretty impressive filtering.
I could have asked another question: How much of the spam that gets
through originates from non list members? I'll have a guess - all of it.
What exactly *is* the argument for allowing non-subscribers to post? All
answers other than "debian=blind freedom" appreciated.
And thanks for your efforts and answer, Pasc.