
Bug#821313: apache2-data: Remove links in default site page to manpages.debian.org



Package: apache2-data
Version: 2.4.20-1
Severity: normal
Tags: patch

Dear maintainer,

Apache2 default site page includes links to manpages.debian.org. This is not a
very good idea since many sites are left unconfigured by default and there are
many (badly programmed) robots roaming the Internet and indexing sites.

Last Monday, the 11th, DSA had to disable the 'manpages.debian.org' vhost service on
glinka.debian.org because it was continuously consuming a large amount of CPU
and affecting other services.

Upon investigation, we found that the service is being queried constantly
for the following pages: a2ensite, a2dissite, a2enmod, and a2dismod. The number
of daily queries used to range from 6,000 to 11,000 and, starting April 8th,
has spiked to between 93,000 and 141,000 daily queries!
(You can see the details in the attached text file.)

These queries are distributed: in a single day we identified at least 590
distinct hosts making them, based on at least 309 misconfigured web servers.
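
A minimal sketch of how figures like these could be derived from an access
log. It is not the command that was actually run: it assumes Apache's
"combined" log format (client host in field 1, Referer in field 11), it assumes
that the Referer identifies the web server whose page carried the link, and the
function name count_hosts_and_referers is made up for illustration.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints the number
# of distinct client hosts and distinct referers among "query=a2" requests.
# Assumes the "combined" LogFormat (host = field 1, referer = field 11).
count_hosts_and_referers() {
    awk '/query=a2/ {
             hosts[$1] = 1
             # Requests without a Referer header are logged as "-"; skip them.
             if ($11 != "\"-\"") refs[$11] = 1
         }
         END {
             nh = 0; for (h in hosts) nh++
             nr = 0; for (r in refs) nr++
             print nh " hosts, " nr " referers"
         }'
}

# Example use against one rotated, compressed log:
#   zcat manpages.debian.org-access.log-20160411.gz | count_hosts_and_referers
```

If the server logs in a different format (for example with a leading vhost
field), the field numbers would need to be adjusted accordingly.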

The culprit seems to be some strange script (programmed in Go, since the user
agent is 'Go-http-client/1.1') which looks for websites and traverses them.
When it hits sites like http://teplosnab24.ru/ it starts traversing all
URLs, including links to external sites.

We have enhanced the service configuration used so that we can withstand the
excess of (useless) queries for these manpages (as described in [1]).

The root of the issue is not the current apache2-data page itself, since it is
the scripts that are going awry, but this page is the trigger that has turned
their misbehaviour into a service problem.

Both DSA and I believe that the Apache2 default configuration should avoid
this misbehaviour by not including links to external sites.  Please find
attached a patch that removes those links from the index.html page which is
added by default to all Apache sites installed in Debian.

Alternatively, if you consider the manual pages to be useful, I would suggest
including them (in HTML format) in the apache2-data package itself instead of
linking to the external manpages.debian.org service.

This change will at least prevent our service from getting hammered by these
misconfigured robots.

Thanks for your help,


Javier Fernandez-Sanguino


[1] https://lists.debian.org/debian-doc/2016/04/msg00055.html



-- System Information:
Debian Release: stretch/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 4.4.0-1-686-pae (SMP w/4 CPU cores)
Locale: LANG=es_ES.utf8, LC_CTYPE=es_ES.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
--- index.html.orig	2016-04-17 16:41:46.000000000 +0200
+++ index.html	2016-04-17 16:42:41.000000000 +0200
@@ -293,17 +293,17 @@
                            *-available/ counterparts. These should be managed
                            by using our helpers
                            <tt>
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enmod">a2enmod</a>,
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dismod">a2dismod</a>,
+                                a2enmod,
+                                a2dismod,
                            </tt>
                            <tt>
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2ensite">a2ensite</a>,
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dissite">a2dissite</a>,
+                                a2ensite,
+                                a2dissite,
                             </tt>
                                 and
                            <tt>
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enconf">a2enconf</a>,
-                                <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2disconf">a2disconf</a>
+                                a2enconf,
+                                a2disconf
                            </tt>. See their respective man pages for detailed information.
                         </li>
 
Logs of queries to the manpages.debian.org service associated with Apache2
manual pages. 

The list below shows the number of accesses and the HTTP status codes returned.

This information has been extracted by running, in glinka's /var/log/apache2,
the following code:

( for i in manpages.debian.org-access.log-*gz; do
    echo -n "$i: "
    zgrep "query=a2" "$i" | wc -l
    zgrep "query=a2" "$i" | awk '{print $12" "$9}' | sort | uniq -c | sort -nr | head -5
  done ) 2>&1

---------------------------------------------------------------------------------------------------------------------
manpages.debian.org-access.log-20160331.gz: 9467
   9464 "-" 200
      3 "-" 304
manpages.debian.org-access.log-20160401.gz: 9582
   9578 "-" 200
      3 "-" 304
      1 "-" 206
manpages.debian.org-access.log-20160402.gz: 11784
  11783 "-" 200
      1 "-" 304
manpages.debian.org-access.log-20160403.gz: 15585
  15582 "-" 200
      2 "-" 304
      1 "-" 206
manpages.debian.org-access.log-20160404.gz: 6705
   6704 "-" 200
      1 "-" 304
manpages.debian.org-access.log-20160405.gz: 8657
   8652 "-" 200
      5 "-" 304
manpages.debian.org-access.log-20160406.gz: 9979
   9971 "-" 200
      8 "-" 304
manpages.debian.org-access.log-20160407.gz: 8334
   8330 "-" 200
      3 "-" 304
      1 mini.com/" 200
manpages.debian.org-access.log-20160408.gz: 93729
  93617 "-" 200
     90 "-" 500
     16 "-" 504
      2 "-" 304
      2 "-" 206
manpages.debian.org-access.log-20160409.gz: 141661
 141660 "-" 200
      1 "-" 304
manpages.debian.org-access.log-20160410.gz: 140425
 140423 "-" 200
      2 "-" 304
manpages.debian.org-access.log-20160411.gz: 140878
 138953 "-" 200
   1840 "-" 504
     82 "-" 500
      3 "-" 304
manpages.debian.org-access.log-20160412.gz: 73254
  73157 "-" 200
     68 "-" 504
     27 "-" 500
      2 "-" 304
manpages.debian.org-access.log-20160416.gz: 100905
  53898 "-" 301
  42356 "Go-http-client/1.1" 200
   2811 "-" 200
   1154 "Mozilla/5.0 200
    204 "Mozilla/4.0 200
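
Truncated entries such as `"Mozilla/5.0 200` in the listing above appear
because awk splits on whitespace by default, so field 12 holds only the first
word of the user agent string. A minimal sketch of a variant that keeps the
whole string follows. It assumes Apache's "combined" log format, and the
function name top_agents is made up for illustration.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints the five
# most frequent (user agent, status) pairs among "query=a2" requests.
# Assumes the "combined" LogFormat; splitting on double quotes makes
# field 3 " status bytes " and field 6 the complete user agent string.
top_agents() {
    awk -F'"' '/query=a2/ {
                   split($3, s, " ")      # s[1] = HTTP status code
                   print $6 " " s[1]
               }' | sort | uniq -c | sort -nr | head -5
}

# Example use against one rotated, compressed log:
#   zcat manpages.debian.org-access.log-20160416.gz | top_agents
```

Splitting on the double quote relies on the request line, Referer and user
agent being the only quoted fields, which holds for the "combined" format
unless a logged value itself contains an escaped quote.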
