
Re: lists.debian.org vs google groups



Hi,

On Thu, Apr 06, 2006 at 09:51:30AM +0100, Doofus wrote:
> 
> Can you quote:

I can't do the last twelve months, as we don't keep our data that far
back, and some of these numbers have to be counted individually, but
here are the numbers for March.

Due to our multi-step filtering process, I can't even get numbers for
the whole of March, but we can make some assumptions.

> 1. the total number of posts from all sources received by the d-u list 
> servers in the last twelve months,

The first step is dropping things at the MTA stage. Those logs don't go
back that far, as they get pretty big. I've picked a full 7 days at
random to use as a sample, and we get: 5891.

Since we're playing with round figures anyway, let's say that works out
at: 4.3 * 5891 = ~25300

All the rest of the numbers are for March:
CrossAssassin: 7375
SpamAssassin: 4672
Other filters: 333
		-> subtotal: 12380
Total blocked spam: ~37700


Actual messages pushed through the list: 3404
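
The arithmetic above can be reproduced with a short Python sketch. The
variable names are made up for illustration; the figures are the ones
quoted in this message, and the 4.3 weeks-per-month factor is the rough
one used above, not an exact count of days in March.

```python
# Figures quoted in this message; variable names are illustrative only.
mta_sample_7_days = 5891   # messages dropped at the MTA in the 7-day sample
weeks_per_month = 4.3      # rough scaling factor used in the message

# Extrapolate the 7-day MTA sample to a month (roughly 25300).
mta_march_estimate = weeks_per_month * mta_sample_7_days

# Later filtering stages, counted for March.
filters_march = {
    "CrossAssassin": 7375,
    "SpamAssassin": 4672,
    "Other filters": 333,
}
filter_subtotal = sum(filters_march.values())

# Estimated total blocked spam (roughly 37700 in round figures).
total_blocked = mta_march_estimate + filter_subtotal

print(round(mta_march_estimate, -2), filter_subtotal, round(total_blocked, -2))
```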

> 
> 2. the number of posts received by non list members in the same period,

This can mean two things. If you want the numbers above but for
non-subscribers only, we can't do that for a large chunk of them, and
it would take too long for the rest.

If you want to know simply how many posts were made by non-subscribers
that then made it to the list and were posted, it's 862.

> 
> 3. the number posts actually published on d-u after all filtering in the 
> same period

3404

> 
> and
> 
> 4. the number spam (or non-spam) posts actually published on d-u in the 
> same period?

I went through the archive for March, and pulled out the numbers. I
found 25 spam messages[1], which leaves us with 3379 valid messages.

> The answers to these should go some way to highlight the scale of the 
> problem, and also how much benefit is gained by allowing everyone in the 
> world aim their crap at all of our mailboxes. I'll be surprised if a 
> statistic is available for (4), but would appreciate the answers if 
> they're available.

Even if we assume that I fell asleep on the page down key while counting
4., and guess that I missed half, we're still talking about blocking
over 800 valid messages.

25/37700 works out to about 0.066% of spam not being blocked. It's still
annoying of course, since the metric that matters is the number of spam
messages that make it through rather than the percentage that make it
through. SNR and all that.
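
As a rough check of those last two figures, here is a minimal Python
sketch. The variable names are made up, and the pessimistic case assumes
that the spam count was off by half and that every spam message came
from a non-subscriber.

```python
# Figures quoted in this message; variable names are illustrative only.
spam_published = 25          # spam found in the March archive
total_blocked = 37700        # rounded estimate of blocked spam
non_subscriber_posts = 862   # non-subscriber posts that reached the list

# Share of spam that got through, relative to the blocked total, in percent.
leak_pct = 100 * spam_published / total_blocked

# Pessimistic case (assumption): half the spam was missed while counting,
# so twice as many got through, all of them from non-subscribers.
worst_case_spam = 2 * spam_published
valid_non_subscriber_posts = non_subscriber_posts - worst_case_spam

print(f"{leak_pct:.3f}% leaked, {valid_non_subscriber_posts} valid non-subscriber posts")
```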

Cheers,

Pasc

-- 
Pascal Hakim                                          0403 411 672
Do Not Bend


