Bug#821313: apache2-data: Remove links in default site page to manpages.debian.org
Package: apache2-data
Version: 2.4.20-1
Severity: normal
Tags: patch
Dear maintainer,
The Apache2 default site page includes links to manpages.debian.org. This is
not a very good idea, since many sites are left unconfigured by default and
many (badly programmed) robots roam the Internet indexing sites.
Last Monday, the 11th, DSA had to disable the 'manpages.debian.org' vhost on
glinka.debian.org because it was continuously consuming a large amount of CPU
and affecting other services.
Upon investigation, we found that the service is being queried constantly
for the following pages: a2ensite, a2dissite, a2enmod and a2dismod. The number
of daily queries has ranged from 6,000 to 11,000 and, starting April 8th, it
has spiked to between 93,000 and 141,000 daily queries!
(You can see the details in the attached text file.)
These queries are distributed: in a single day we identified at least 590
distinct hosts making them, following links from at least 309 misconfigured
web servers.
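
A minimal sketch of how such counts could be obtained from an access log,
assuming the Apache "combined" log format and assuming that the misconfigured
web servers are identified by the Referer field. The function name, the sample
lines, the addresses and the referring sites are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_sources(lines, needle="query=a2"):
    """Return (client hosts, referring sites) for requests containing `needle`."""
    clients, referring_sites = set(), set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or needle not in match.group("request"):
            continue
        clients.add(match.group("host"))
        # A referer of "-" has no hostname and is skipped.
        site = urlsplit(match.group("referer")).hostname
        if site:
            referring_sites.add(site)
    return clients, referring_sites


# Hypothetical sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [11/Apr/2016:10:00:00 +0000] "GET /cgi-bin/man.cgi?query=a2enmod HTTP/1.1" 200 5120 "http://example.org/" "Go-http-client/1.1"',
    '192.0.2.1 - - [11/Apr/2016:10:00:01 +0000] "GET /cgi-bin/man.cgi?query=a2dismod HTTP/1.1" 200 5120 "http://example.org/" "Go-http-client/1.1"',
    '198.51.100.7 - - [11/Apr/2016:10:00:02 +0000] "GET /cgi-bin/man.cgi?query=a2ensite HTTP/1.1" 200 5120 "http://example.net/" "Go-http-client/1.1"',
    '203.0.113.9 - - [11/Apr/2016:10:00:03 +0000] "GET /cgi-bin/man.cgi?query=ls HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
]

clients, sites = summarize_sources(sample)
print(len(clients), "distinct hosts,", len(sites), "referring sites")
```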
The culprit seems to be some strange script (programmed in Go, since the user
agent is 'Go-http-client/1.1') which looks for websites and traverses them.
When it hits sites like http://teplosnab24.ru/ it starts traversing all
URLs, including external links.
We have improved the service configuration so that it can withstand the
excess of (useless) queries for these manpages (as described in [1]).
The issue does not exactly lie in the current apache2-data page, as it is the
scripts that are going awry, but this page is the "detonator" that has turned
this problem into a service problem.
Both DSA and I believe that the Apache2 default configuration should avoid
this misbehaviour by not including links to external sites. Please find
attached a patch that removes those links from the index.html page which is
added by default to all Apache sites installed in Debian.
Alternatively, if you consider the manual pages to be useful, I would suggest
including them (in HTML format) in the apache2-data package itself instead of
linking to the external manpages.debian.org service.
This change will at least prevent our service from getting hammered by these
misconfigured robots.
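
For reference, the effect of the attached patch on the page can be sketched as
follows. This is only an illustration, assuming the anchors look like the ones
in the current index.html; the function name is made up, and the patch itself
was written by hand rather than generated this way.

```python
import re

# Matches an anchor whose href points at manpages.debian.org and captures
# the link text so that it can be kept.
MANPAGE_LINK = re.compile(
    r'<a\s+href="https?://manpages\.debian\.org/[^"]*"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_manpage_links(html):
    """Replace each manpages.debian.org anchor with its plain link text."""
    return MANPAGE_LINK.sub(r"\1", html)


snippet = (
    "<tt>\n"
    '<a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enmod">a2enmod</a>,\n'
    '<a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dismod">a2dismod</a>\n'
    "</tt>"
)
print(strip_manpage_links(snippet))
```

Links to any other host are left untouched, so only the references to the
manpages service disappear.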
Thanks for your help,
Javier Fernandez-Sanguino
[1] https://lists.debian.org/debian-doc/2016/04/msg00055.html
-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 4.4.0-1-686-pae (SMP w/4 CPU cores)
Locale: LANG=es_ES.utf8, LC_CTYPE=es_ES.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
--- index.html.orig 2016-04-17 16:41:46.000000000 +0200
+++ index.html 2016-04-17 16:42:41.000000000 +0200
@@ -293,17 +293,17 @@
*-available/ counterparts. These should be managed
by using our helpers
<tt>
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enmod">a2enmod</a>,
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dismod">a2dismod</a>,
+ a2enmod,
+ a2dismod,
</tt>
<tt>
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2ensite">a2ensite</a>,
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dissite">a2dissite</a>,
+ a2ensite,
+ a2dissite,
</tt>
and
<tt>
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enconf">a2enconf</a>,
- <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2disconf">a2disconf</a>
+ a2enconf,
+ a2disconf
</tt>. See their respective man pages for detailed information.
</li>
Logs of queries to the manpages.debian.org service associated with Apache2
manual pages.
The list below shows, for each daily log file, the total number of accesses
and the most frequent combinations of user agent and HTTP status code returned.
This information has been extracted by running, in glinka's /var/log/apache2,
the following command:
~ (for i in manpages.debian.org-access.log-*gz ; do
    echo -n "$i: "; zgrep "query=a2" "$i" | wc -l
    zcat "$i" | grep "query=a2" | awk '{print $12" "$9}' | sort | uniq -c | sort -nr | head -5
  done ) 2>&1
---------------------------------------------------------------------------------------------------------------------
manpages.debian.org-access.log-20160331.gz: 9467
9464 "-" 200
3 "-" 304
manpages.debian.org-access.log-20160401.gz: 9582
9578 "-" 200
3 "-" 304
1 "-" 206
manpages.debian.org-access.log-20160402.gz: 11784
11783 "-" 200
1 "-" 304
manpages.debian.org-access.log-20160403.gz: 15585
15582 "-" 200
2 "-" 304
1 "-" 206
manpages.debian.org-access.log-20160404.gz: 6705
6704 "-" 200
1 "-" 304
manpages.debian.org-access.log-20160405.gz: 8657
8652 "-" 200
5 "-" 304
manpages.debian.org-access.log-20160406.gz: 9979
9971 "-" 200
8 "-" 304
manpages.debian.org-access.log-20160407.gz: 8334
8330 "-" 200
3 "-" 304
1 mini.com/" 200
manpages.debian.org-access.log-20160408.gz: 93729
93617 "-" 200
90 "-" 500
16 "-" 504
2 "-" 304
2 "-" 206
manpages.debian.org-access.log-20160409.gz: 141661
141660 "-" 200
1 "-" 304
manpages.debian.org-access.log-20160410.gz: 140425
140423 "-" 200
2 "-" 304
manpages.debian.org-access.log-20160411.gz: 140878
138953 "-" 200
1840 "-" 504
82 "-" 500
3 "-" 304
manpages.debian.org-access.log-20160412.gz: 73254
73157 "-" 200
68 "-" 504
27 "-" 500
2 "-" 304
manpages.debian.org-access.log-20160416.gz: 100905
53898 "-" 301
42356 "Go-http-client/1.1" 200
2811 "-" 200
1154 "Mozilla/5.0 200
204 "Mozilla/4.0 200
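
The shell pipeline used for the listing above could also be sketched in
Python. This is a rough equivalent, not the command that was actually run: it
assumes the Apache "combined" log format, it only looks for "query=a2" in the
request itself, and it parses the quoted fields instead of splitting on
whitespace, so user agents are not cut at the first space (as in the
"Mozilla/5.0 entries above). All names and sample lines are made up.

```python
import gzip
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines, needle="query=a2"):
    """Count (user agent, status) pairs for requests containing `needle`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("request"):
            counts[(match.group("agent"), match.group("status"))] += 1
    return counts


def tally_file(path, top=5):
    """Tally one gzip-compressed access log and return its `top` pairs."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return tally(handle).most_common(top)


# Hypothetical sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [16/Apr/2016:10:00:00 +0000] "GET /cgi-bin/man.cgi?query=a2enmod HTTP/1.1" 200 5120 "http://example.org/" "Go-http-client/1.1"',
    '192.0.2.1 - - [16/Apr/2016:10:00:01 +0000] "GET /cgi-bin/man.cgi?query=a2dismod HTTP/1.1" 200 5120 "http://example.org/" "Go-http-client/1.1"',
    '198.51.100.7 - - [16/Apr/2016:10:00:02 +0000] "GET /cgi-bin/man.cgi?query=a2ensite HTTP/1.1" 301 0 "-" "-"',
    '203.0.113.9 - - [16/Apr/2016:10:00:03 +0000] "GET /cgi-bin/man.cgi?query=ls HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for (agent, status), count in tally(sample).most_common(5):
    print(count, repr(agent), status)
```

For the real files, tally_file() would be called once per
manpages.debian.org-access.log-*gz file.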