Bug#707770: sosospider+gitweb caused apache memory use to balloon and not go back down

Package: apache2.2-common
Version: 2.2.22-13
Severity: normal

See attached graph.png. The 1+ GB memory plateau is due to apache, which
should normally be using more like 10 MB. I noticed this and restarted
it. A few hours later it happened again. At that point I was using
mpm-worker; I switched to mpm-prefork and limited each process to
serving 1000 requests. Shortly after, it happened again. Note the abrupt
slope; this is no slow leak.
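
For anyone wanting to put a number on the plateau, here is a rough sketch
that totals the resident memory of the apache processes. It assumes the
input is the output of "ps -e -o rss= -o comm=" (RSS in KiB, no header
line) and that the process name is "apache2", as on Debian. The function
name is made up for illustration. Summed RSS counts shared pages once
per process, so it overstates the true total.

```python
def sum_rss_kib(ps_output, name="apache2"):
    """Return (process_count, total_rss_kib) for processes named `name`.

    `ps_output` is expected to be the text printed by
    `ps -e -o rss= -o comm=`: one "RSS COMMAND" pair per line.
    """
    count = 0
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        rss, comm = parts
        if rss.isdigit() and comm.strip() == name:
            count += 1
            total += int(rss)
    return count, total
```

Feeding it the ps output every minute or so and logging the total would
show whether the growth is a step or a slope.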

My server only serves static files and runs a few CGI scripts. No PHP.

The problem turned out to be caused by "sosospider", a Chinese web
spider, which apparently ignores robots.txt[1]. It wandered into my gitweb
(which is of course blocked from being spidered by robots.txt),
and proceeded to try to download multiple tarball snapshots of
a 300 MB git repository at once.

git.kitenet.net - - [11/May/2013:01:22:40 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=4612ba9206187b86d1403b641dbc5fa00af19d93;sf=tgz HTTP/1.1" 200 142434304 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net - - [11/May/2013:01:23:10 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=70adc391bbf0e96b0b7ed021852817c372ca7b8f;sf=tgz HTTP/1.1" 200 - "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net - - [11/May/2013:01:23:01 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=61401a7e229fb16878be9602d38032da05db1f90;sf=tgz HTTP/1.1" 200 7839744 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net - - [11/May/2013:01:23:30 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=7fde76d445040bc8cd2313d283d63fd1a955963e;sf=tgz HTTP/1.1" 200 2695168 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"

This causes gitweb to be very active, but somehow it also makes
apache's memory use balloon. In top, I saw multiple apache processes
over 50 MB each. Prefork was a worse choice; the server became
unresponsive and I had to reboot it.
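
To see at a glance which client is pulling snapshots, the access log can
be tallied by user agent. The sketch below assumes log lines shaped like
the excerpt above (three leading fields, timestamp, request, status,
size, referer, user agent). The regex and the function name are mine,
not anything shipped with apache or gitweb. A size of "-" is counted as
zero bytes.

```python
import re
from collections import defaultdict

# Matches lines shaped like the excerpt in this report. An optional ';'
# after the referer field is tolerated.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)";? "(?P<agent>[^"]*)"$'
)


def tally_snapshots(lines):
    """Return {user_agent: (request_count, bytes_sent)} for gitweb
    snapshot requests (those with "a=snapshot" in the request target)."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or "a=snapshot" not in m.group("target"):
            continue
        size = m.group("size")
        entry = totals[m.group("agent")]
        entry[0] += 1
        entry[1] += 0 if size == "-" else int(size)
    return {agent: tuple(pair) for agent, pair in totals.items()}
```

Passing it an open file object for whichever access log the vhost writes
to returns one entry per user agent. For the four lines quoted above,
that is a single Sosospider entry.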

I have configured gitweb with $feature{'snapshot'}{'default'} = [];
and blacklisted this spider's address space, so I hope I will not
see this again. 

I don't understand why apache is using all that memory. Could it be
trying to buffer the CGI's output? If apache mallocs a lot of memory for
such a buffer, will it ever free it? Perhaps sosospider is doing
additional evil things beyond ignoring robots.txt that cause this.
IMHO gitweb should not come configured this way by default,
but the apache behavior is especially concerning.

[1] The spider's website claims
    but I get the impression from looking up this spider that they're lying
    or incompetent. My robots.txt file has been in place for 5 years.

-- Package-specific info:
List of enabled modules from 'apache2 -M':
  alias auth_basic authn_file authz_default authz_groupfile
  authz_host authz_user autoindex cgi deflate dir env expires include
  mime negotiation reqtimeout rewrite setenvif status userdir

-- System Information:
Debian Release: 7.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages apache2 depends on:
ii  apache2-mpm-prefork  2.2.22-13
ii  apache2.2-common     2.2.22-13

apache2 recommends no packages.

apache2 suggests no packages.

Versions of packages apache2.2-common depends on:
ii  apache2-utils  2.2.22-13
ii  apache2.2-bin  2.2.22-13
ii  lsb-base       4.1+Debian9
ii  mime-support   3.52-2
ii  perl           5.14.2-20
ii  procps         1:3.3.4-2

Versions of packages apache2.2-common recommends:
ii  ssl-cert  1.0.32

Versions of packages apache2.2-common suggests:
pn  apache2-doc                     <none>
ii  apache2-suexec                  2.2.22-13
ii  chromium [www-browser]          25.0.1364.160-1
ii  epiphany-browser [www-browser]  3.4.2-2.1
ii  iceweasel [www-browser]         10.0.12esr-1+nmu1
ii  konqueror [www-browser]         4:4.8.4-2
ii  lynx-cur [www-browser]          2.8.8dev.15-2
ii  w3m [www-browser]               0.5.3-8

-- no debconf information

see shy jo

