
Re: lynx and google.com



I think my previous reply barfed and didn't go through; apologies if this
arrives twice.

On Thu, May 06, 2004 at 02:14:56PM -0400, David P James wrote:
> Maybe they caught on to your little trick and put an end to it? :) Or at 
> least tried to, since you seem to be spoofing the UA.

I'm sure they have anti-screen-scraping technology, but Occam's razor
suggests they aren't responding to me in particular. Maybe they're
upgrading Apache? Does lynx use HTTP/1.0, while links, wget and Moz
use HTTP/1.1?

> >   -e 's/href=\//href=http:\/\/google.com\//g' \
>                        ^
> Does this line actually work? To me it looks like you're missing an 
> escape before the second '/' before the second 'href'.

Yes, it actually works. The unescaped slashes are the delimiters of the
s/// command. It's trashy write-only code, but it works.
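To make the delimiter point concrete, here is a minimal sketch that runs the quoted expression on a made-up link, next to the same substitution written with `|` as the delimiter. sed accepts almost any character after `s` as the delimiter, so with `|` the slashes in the URL need no escaping. The function names `absolutize_slash` and `absolutize_pipe` and the sample HTML are hypothetical; only the first `sed` expression comes from the script under discussion.

```shell
#!/bin/sh
# Hypothetical helper: the original expression. "/" is the s/// delimiter,
# so every literal slash in the pattern and replacement is escaped as "\/".
absolutize_slash() {
  sed -e 's/href=\//href=http:\/\/google.com\//g'
}

# Hypothetical helper: the same substitution with "|" as the delimiter,
# so the slashes can be written as-is.
absolutize_pipe() {
  sed -e 's|href=/|href=http://google.com/|g'
}

# Hypothetical sample input: one root-relative link.
html='<a href=/search?q=lynx>next</a>'
printf '%s\n' "$html" | absolutize_slash
printf '%s\n' "$html" | absolutize_pipe
```

Both commands print `<a href=http://google.com/search?q=lynx>next</a>`; the second form is simply easier to read.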

I wish the Moz AdBlock extension would get aggressive about adding
HTML-morphing technology to properly extract the real content from
pages filled with text ads. Mine is a grotesque hack; shh, don't tell
Google, they don't want to be Evil!


