Re: bug in derivatives blog-polling scripts?



Paul Wise left as an exercise for the reader:
> Thanks for the report Nick!
> Neither of these IP addresses is one of the Debian machines that run any part
> of the derivatives census code:
> So it looks like those id=..._show values are somehow getting leaked to a
> href. At least in Iceweasel this doesn't happen when clicking
> on the +- buttons so I'm not sure what is going on. Do you know which
> User-Agent these weird clients are using?

Thanks for looking into this, Paul!

Indeed, I think the problem lies in a crawler that's hitting the Derivatives
page, so we can rule this out as a DebDeriv website issue. The client that
sends planet.debian.org/deriv/ as its referer identifies itself as "Heritrix".
Here are the logs if you're interested, though as I said, I think this is
beyond our collective purview.

[li170-29](0) $ sudo grep -r _show /var/log/apache2/*log
/var/log/apache2/sprezzatech.error.log:[Tue Jun 05 13:55:26 2012] [error] [client 66.249.71.33] File does not exist: /opt/sprezzatura/blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html_show
/var/log/apache2/sprezzatech.error.log:[Fri Jun 08 11:26:10 2012] [error] [client 66.249.71.33] File does not exist: /opt/sprezzatura/blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show
/var/log/apache2/sprezzatech.error.log:[Sat Jun 09 08:23:44 2012] [error] [client 157.181.181.71] File does not exist: /opt/sprezzatura/blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show, referer: http://planet.debian.org/deriv/
/var/log/apache2/sprezzatech.log:66.249.71.33 - - [05/Jun/2012:13:55:26 -0500] "GET /blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html_show HTTP/1.1" 404 1832 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
/var/log/apache2/sprezzatech.log:66.249.71.33 - - [08/Jun/2012:11:26:10 -0500] "GET /blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show HTTP/1.1" 404 1838 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:44 -0500] "GET /blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show HTTP/1.0" 404 4033 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
[li170-29](0) $ grep -r 157.181.181.71 /var/log/apache2/*log
/var/log/apache2/sprezzatech.error.log:[Sat Jun 09 08:23:40 2012] [error] [client 157.181.181.71] File does not exist: /opt/sprezzatura/blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_hide, referer: http://planet.debian.org/deriv/
/var/log/apache2/sprezzatech.error.log:[Sat Jun 09 08:23:44 2012] [error] [client 157.181.181.71] File does not exist: /opt/sprezzatura/blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show, referer: http://planet.debian.org/deriv/
/var/log/apache2/sprezzatech.error.log:[Sat Jun 09 08:23:49 2012] [error] [client 157.181.181.71] File does not exist: /opt/sprezzatura/blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html_hide, referer: http://planet.debian.org/deriv/
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:37 -0500] "GET /robots.txt HTTP/1.0" 200 341 "-" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:40 -0500] "GET /blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_hide HTTP/1.0" 404 4033 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:42 -0500] "GET /blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html HTTP/1.0" 200 22831 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:44 -0500] "GET /blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html_show HTTP/1.0" 404 4033 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:49 -0500] "GET /blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html_hide HTTP/1.0" 404 4033 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
/var/log/apache2/sprezzatech.log:157.181.181.71 - - [09/Jun/2012:08:23:51 -0500] "GET /blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html HTTP/1.0" 200 14917 "http://planet.debian.org/deriv/" "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
[li170-29](0) $ 
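
For anyone who wants to answer the User-Agent question on their own server, here is a rough sketch of how the access log could be tallied by client and User-Agent. It assumes Apache's "combined" log format; the names `LINE_RE`, `count_toggle_hits` and `sample` are made up for illustration, and the sample entries are taken from the access-log output above, trimmed to the raw log line.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
#   host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_toggle_hits(lines):
    """Count requests whose path ends in _show or _hide, per (host, user-agent)."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("path").endswith(("_show", "_hide")):
            hits[(m.group("host"), m.group("agent"))] += 1
    return hits


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
HERITRIX = "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://susanszky.tatk.elte.hu)"
POST_0008 = "/blog/0008-after-very-careful-consideration-ive-decided-your-support-sucks.html"
POST_0009 = "/blog/0009-a-disquisition-into-the-sadly-slovenly-takeup-of-10gbase-t.html"
REFERER = "http://planet.debian.org/deriv/"

# Four entries rebuilt from the access log above (grep filename prefix removed).
sample = [
    f'66.249.71.33 - - [05/Jun/2012:13:55:26 -0500] "GET {POST_0008}_show HTTP/1.1" 404 1832 "-" "{GOOGLEBOT}"',
    f'157.181.181.71 - - [09/Jun/2012:08:23:40 -0500] "GET {POST_0009}_hide HTTP/1.0" 404 4033 "{REFERER}" "{HERITRIX}"',
    f'157.181.181.71 - - [09/Jun/2012:08:23:42 -0500] "GET {POST_0009} HTTP/1.0" 200 22831 "{REFERER}" "{HERITRIX}"',
    f'157.181.181.71 - - [09/Jun/2012:08:23:44 -0500] "GET {POST_0009}_show HTTP/1.0" 404 4033 "{REFERER}" "{HERITRIX}"',
]

if __name__ == "__main__":
    # One line per client: request count, address, User-Agent string.
    for (host, agent), n in count_toggle_hits(sample).most_common():
        print(n, host, agent)
```

To run it against a real log, pass an open file object instead of `sample`. Lines that do not fit the assumed format are skipped silently, so a different LogFormat directive would need a different pattern.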


-- 
                                    nick black <nickblack@linux.com>
                 http://www.sprezzatech.com -- unix and hpc consulting
  to make an apple pie from scratch, you need first invent a universe.
