
Filtering (WAS: Re: Live CD as a focal point for reviving Jr development)



On Monday 03 October 2005 22:01, Ben Armstrong wrote:
> On Mon, 2005-10-03 at 15:53 -0300, Marcela Tiznado wrote:
> > - Internet filter
> > Would be nice to have some panel where the parents can configure the
> > livecd to connect to internet or not. Have a dansguardian or something
> > to filter contents.
>
> Filtering has come up before.  Search the list archives.  I'm in favour
> of supporting parents who want to do it, but not actually providing the
> filtering in Debian Jr. or recommending one package over another.  
Agreed completely.
Also, automatic filtering basically doesn't work: anybody who thinks that
automatic filtering will keep their kids from getting to pr0n sites (or
whatever other kind of potentially objectionable material you might define)
is seriously deluding themselves
( => this should definitely be stated clearly along with the pointer to
packages/services providing this kind of stuff):

There are basically 3 classes of filter programs:
- black/whitelisting:
  -> works somewhat _if_ the lists are built by humans, but suffers from
     massive underblocking (no way to keep up with all new and changing
     sites)
- auto-classification based on some 'smart' technique, usually keywords
  -> shown to screw up big time, time and again; see e.g. [3] for
     documentation on commercial blocking programs blocking Amnesty,
     competitors, the ACLU, gay rights sites, ...
  -> really needs strong AI to work correctly, in other words don't expect
     this to work anytime soon
- a combination of the above: most commercial programs
  -> most of the time this means doing the auto-classification and then
     attempting to weed out overblocked sites (except that commercial
     companies seem to only do the weeding out in theory)
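
To make the three classes a bit more concrete, here is a toy sketch in
Python. Everything in it (host names, page texts, keywords, the lists
themselves) is made up for illustration; it is not how dansguardian or any
real filter is implemented, only a rough picture of where underblocking and
overblocking come from.

```python
# Toy illustration only: all hosts, page texts, and keywords are hypothetical.
BLACKLIST = {"known-bad.example"}   # class 1: human-maintained list
KEYWORDS = {"sex", "breast"}        # class 2: naive keyword classifier
WHITELIST = {"health.example"}      # class 3: manual weeding of overblocked sites


def list_blocks(host):
    """Blacklisting: block only hosts somebody has already listed."""
    return host in BLACKLIST


def keyword_blocks(text):
    """Keyword classification: block if any flagged word appears in the text."""
    words = set(text.lower().split())
    return bool(words & KEYWORDS)


def combined_blocks(host, text):
    """Combination: classify automatically, then exempt whitelisted hosts."""
    if host in WHITELIST:
        return False
    return list_blocks(host) or keyword_blocks(text)


pages = {
    "known-bad.example": "explicit sex pictures",
    "new-bad.example": "explicit pictures",            # new site, not listed yet
    "health.example": "safe sex advice for teens",
    "cancer.example": "breast cancer screening information",
}

for host, text in pages.items():
    print(host, list_blocks(host), keyword_blocks(text), combined_blocks(host, text))
```

In this sketch the blacklist misses new-bad.example (underblocking), the
keyword check blocks both health.example and cancer.example (overblocking),
and the combination only rescues health.example because somebody bothered to
whitelist it; cancer.example stays blocked.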

For some numbers on overblocking (false positives) and underblocking (not
blocking what should be blocked) see e.g. [1] [2]. For anecdotes about how
blocking software repeatedly screws up in major ways, see the sites of
watchdog groups like [3] and [4].
   
   NOTE: watch the definition of overblocking. The pro-filter side tends to
         use "percentage of sites visited during testing that are wrongly
         blocked", whereas the contra-filter side tends to use "percentage
         of total sites blocked that are wrongly classified". Accordingly,
         the numbers differ from around 5% (pro-side) to 99% (contra-side,
         particularly the EFF study linked from [1]).
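
A small worked example of how far the two definitions can drift apart on the
very same test run. The counts below are invented for illustration and are
not taken from any of the studies in [1] or [2]; the function and parameter
names are likewise just assumptions for this sketch.

```python
def overblocking_rates(visited, blocked, wrongly_blocked):
    """Return (pro_side, contra_side) overblocking rates as fractions.

    visited:         sites visited during testing
    blocked:         sites the filter blocked, rightly or wrongly
    wrongly_blocked: blocked sites that should not have been blocked
    """
    # Pro-filter definition: wrongly blocked sites as a share of all sites visited.
    pro_side = wrongly_blocked / visited
    # Contra-filter definition: wrongly blocked sites as a share of all blocked sites.
    contra_side = wrongly_blocked / blocked
    return pro_side, contra_side


# Hypothetical test run: 1000 sites visited, 60 blocked, 50 of those wrongly.
pro, contra = overblocking_rates(visited=1000, blocked=60, wrongly_blocked=50)
print(f"pro-side definition:    {pro:.1%}")
print(f"contra-side definition: {contra:.1%}")
```

With these made-up counts the same filter scores 5% under the first
definition and roughly 83% under the second, so always check which
denominator a study is using before comparing numbers.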

[1] http://www.filtereality.net/archive/badblocks.html
    gives pointers to several studies; note that the one from the
    pro-filter side still states that around 1 in every 10 blocked sites
    shouldn't be blocked
[2] http://www.webjunction.org/do/DisplayContent?id=992 again summarizes
    several studies (from, amongst others, the DOJ);
    scroll to the bottom for actual numbers: underblocking reported is
    about 10% (i.e. 1 in 10 sites that should be blocked isn't),
    overblocking numbers were found to depend heavily on configuration and
    subject (e.g. 9% of safe sex sites were blocked on the least restrictive
    setting, up to 50% on the most restrictive setting tested)
[3] http://www.peacefire.org
[4] http://censorware.net/
-- 
Cheers, cobaco (aka Bart Cornelis)
  
1. Encrypted mail preferred (GPG KeyID: 0x86624ABB)
2. Plain-text mail recommended since I move html and double
    format mails to a low priority folder (they're mainly spam)
