
Re: Unicode encoding as spam protection



On Sun, May 16, 2004 at 02:43:34PM +0200, Dale C. Scheetz wrote:
> These days my job is very mundane, as a Staff Assistant for the Division 
> of Cultural Affairs, Florida Department of State. We have a lot of 
> computer ignorant folks in the office, so there is lots of trouble with 
> spam.
> 
> As one of the solutions, it has recently been discovered that encoding 
> email addresses on the web site in unicode currently defeats the bots 
> scrounging web pages looking for likely addresses to spam.
> 

What do you mean by "Unicode encoding"?
The most widely used and best supported Unicode encoding on the WWW is
UTF-8, which is identical to ASCII in the ASCII range, so it won't
buy you anything. Putting the pages in UTF-16 is better, but
not all browsers support it well.
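A quick way to see the UTF-8 vs UTF-16 point is to compare the raw bytes a bot would download. This is a minimal Python sketch under stated assumptions: the address is a made-up example, and NAIVE_PATTERN is a hypothetical, deliberately simple harvester regex, not the code of any real bot.

```python
import re

# Made-up example address; not a real mailbox.
ADDRESS = "someone@example.org"

# Hypothetical, deliberately naive harvester pattern that scans raw
# page bytes for something shaped like an e-mail address.
NAIVE_PATTERN = re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def harvest(raw: bytes) -> list[bytes]:
    """Return every address-like byte string found in the raw page bytes."""
    return NAIVE_PATTERN.findall(raw)

utf8_bytes = ADDRESS.encode("utf-8")
utf16_bytes = ADDRESS.encode("utf-16-le")

# In the ASCII range, UTF-8 is byte-for-byte identical to ASCII,
# so the naive byte-level pattern still finds the address.
print(utf8_bytes == ADDRESS.encode("ascii"))
print(harvest(utf8_bytes))

# UTF-16 stores each ASCII character as two bytes (one of them NUL),
# so the same byte-level pattern no longer matches.
print(utf16_bytes[:6])
print(harvest(utf16_bytes))
```

This only defeats a bot that greps raw bytes; one that decodes the page first (e.g. with `utf16_bytes.decode("utf-16-le")`) sees the plain address again.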

> As there are several virus programs that harvest addresses from the 
> address book, my question is: How can I unicode my address book, and 

Don't put your address book on the internet!

> will Mozilla Mail still know how to read it? (Apparently unicode encoded 
> addresses on web sites show up just fine in any browser, including 
> Mozilla.) I just wanted to know if the translation was at a fundamental 
> level so that it would work in the address book as well.
> 
> Anyone have any ideas? Is this idea worth implementing transparently in 
> Mozilla? (Is it already there and I just don't know it? ;-)
> 

My ideas:
1) put the page in UTF-16. This might do what you want effectively. The
downside is that not all browsers support it well.
2) put U+FEFF ZERO WIDTH NO-BREAK SPACE at strategic places throughout
your page. This *should* display OK, but it has little support from
fonts. Alternatively, you can use U+200B ZERO WIDTH SPACE, which is
better supported, but still not universally.
3) mangle random characters, replacing them with similar-looking but
different Unicode code points. For example, U+0430 CYRILLIC SMALL LETTER A
looks the same as U+0061 LATIN SMALL LETTER A
(this is a horrible idea from a purity point of view, though :-)).
It is rather well supported; most default fonts handle basic Cyrillic
and Greek well.
4) all of the above is based on changing the internal representation of the
text while keeping the same visual appearance. To do it more effectively,
render all the addresses as images in <IMG> tags (or just the "@" in them).
5) simple: put spaces around each "@" in e-mail addresses. They remain
human-readable, but bots won't harvest them. If only there weren't users
pasting such addresses into their e-mail clients :-)
(see my .sig)
6) use IP literals (see my address in the headers).
The advantage is that mail links remain clickable, but most bots will
choke on them. The disadvantage is that the addresses differ from what
people consider "their" e-mail address.
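Ideas 2 and 3 can be tried out with a small Python sketch. It assumes a hypothetical harvester that only understands ASCII (hence the re.ASCII flag); insert_zero_width and swap_homoglyphs are illustrative helper names, and the address is a made-up example.

```python
import re

# Made-up example address; not a real mailbox.
ADDRESS = "someone@example.org"

ZWSP = "\u200b"        # U+200B ZERO WIDTH SPACE
ZWNBSP = "\ufeff"      # U+FEFF ZERO WIDTH NO-BREAK SPACE
CYRILLIC_A = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A

# Hypothetical, deliberately naive harvester pattern. With re.ASCII,
# \w matches only [A-Za-z0-9_], as an ASCII-minded bot might assume.
NAIVE_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.ASCII)

def insert_zero_width(text: str, char: str = ZWSP) -> str:
    """Idea 2: surround every '@' with an invisible zero-width character."""
    return text.replace("@", f"{char}@{char}")

def swap_homoglyphs(text: str) -> str:
    """Idea 3: replace Latin 'a' (U+0061) with Cyrillic 'а' (U+0430)."""
    return text.replace("a", CYRILLIC_A)

for variant in (ADDRESS, insert_zero_width(ADDRESS),
                insert_zero_width(ADDRESS, ZWNBSP), swap_homoglyphs(ADDRESS)):
    # repr() shows the invisible characters as escape sequences.
    print(repr(variant), NAIVE_PATTERN.findall(variant))
```

Only the unmodified address matches the pattern, although all four variants should look alike on screen. Keep in mind the homoglyph version is a genuinely different address: mail sent to a copied one will bounce, and a bot that simply deletes U+200B and U+FEFF undoes idea 2.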
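Ideas 5 and 6 are plain string transformations; here is a minimal Python sketch. The helper names are hypothetical, 192.0.2.1 is a documentation-only address (RFC 5737) standing in for your mail host's real IP, and the bracketed form is the address-literal syntax from RFC 5321. Whether a given mail server actually accepts mail addressed that way depends on its configuration.

```python
import ipaddress
import re

# Hypothetical, deliberately naive harvester pattern (ASCII only).
NAIVE_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.ASCII)

def space_at(address: str) -> str:
    """Idea 5: 'user@host' -> 'user @ host' (readable, but no longer a valid address)."""
    local, _, domain = address.partition("@")
    return f"{local} @ {domain}"

def ip_literal(local_part: str, ip: str) -> str:
    """Idea 6: build an RFC 5321 address literal such as 'user@[192.0.2.1]'."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for a malformed IP
    if addr.version == 6:
        # RFC 5321 tags IPv6 address literals with an "IPv6:" prefix.
        return f"{local_part}@[IPv6:{addr}]"
    return f"{local_part}@[{addr}]"

if __name__ == "__main__":
    spaced = space_at("someone@example.org")
    literal = ip_literal("someone", "192.0.2.1")
    for variant in (spaced, literal):
        print(variant, NAIVE_PATTERN.findall(variant))
```

In both cases the naive pattern finds nothing: the spaces break the run of characters around "@", and the "[" right after "@" is not a character the pattern expects in a domain name.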



-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


