
Bug#707770: marked as done (sosospider+gitweb caused apache memory use to balloon and not go back down)



Your message dated Sat, 28 May 2016 17:27:00 +0200 (CEST)
with message-id <alpine.DEB.2.11.1605281726300.9946@eru.sfritsch.de>
and subject line Re: Bug#707770: sosospider+gitweb caused apache memory use to balloon and not go back down
has caused the Debian Bug report #707770,
regarding sosospider+gitweb caused apache memory use to balloon and not go back down
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
707770: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707770
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: apache2.2-common
Version: 2.2.22-13
Severity: normal

See attached graph.png. The 1+ GB memory plateau is due to apache, which
should normally be using more like 10 MB. I noticed this and restarted
it. A few hours later it happened again. At that point I was using
mpm-worker; I switched to mpm-prefork and limited each process to serving
1000 requests. Shortly after, it happened again. Note the abrupt slope;
this is no slow leak.

My server only serves static files and runs a few CGI scripts; no PHP
or the like.

The problem turned out to be caused by "sosospider", a Chinese web
spider, which apparently ignores robots.txt[1]. It wandered into my gitweb
(which robots.txt of course blocks from being spidered) and proceeded to
try to download several tarball snapshots of a 300 MB git repository at
once.

git.kitenet.net 123.151.139.212 - - [11/May/2013:01:22:40 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=4612ba9206187b86d1403b641dbc5fa00af19d93;sf=tgz HTTP/1.1" 200 142434304 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net 123.151.139.212 - - [11/May/2013:01:23:10 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=70adc391bbf0e96b0b7ed021852817c372ca7b8f;sf=tgz HTTP/1.1" 200 - "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net 123.151.139.212 - - [11/May/2013:01:23:01 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=61401a7e229fb16878be9602d38032da05db1f90;sf=tgz HTTP/1.1" 200 7839744 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
git.kitenet.net 123.151.139.212 - - [11/May/2013:01:23:30 -0400] "GET /?p=avianaquamiser.com;a=snapshot;h=7fde76d445040bc8cd2313d283d63fd1a955963e;sf=tgz HTTP/1.1" 200 2695168 "http://git.kitenet.net/" "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
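
To spot this kind of traffic, one can total the bytes served for gitweb
snapshot requests per user agent. The Python sketch below assumes the
vhost-prefixed "combined" log layout seen in the excerpt above; the
regular expression, the function name snapshot_bytes_by_agent, and the
sample list are illustrative and not part of any Apache or gitweb tooling.
The sample entries are three lines taken from that excerpt.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" format with a leading virtual-host field,
# as in the log excerpt:
#   vhost client ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def snapshot_bytes_by_agent(lines):
    """Total bytes sent for gitweb snapshot requests (a=snapshot), per user agent."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        if "a=snapshot" not in m["request"]:
            continue
        # Apache logs "-" instead of 0 when no response body bytes were sent.
        sent = 0 if m["bytes"] == "-" else int(m["bytes"])
        totals[m["agent"]] += sent
    return dict(totals)


SOSO_UA = "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"

# Three entries from the log excerpt.
SAMPLE_LINES = [
    'git.kitenet.net 123.151.139.212 - - [11/May/2013:01:22:40 -0400] '
    '"GET /?p=avianaquamiser.com;a=snapshot;h=4612ba9206187b86d1403b641dbc5fa00af19d93;sf=tgz HTTP/1.1" '
    '200 142434304 "http://git.kitenet.net/" "' + SOSO_UA + '"',
    'git.kitenet.net 123.151.139.212 - - [11/May/2013:01:23:10 -0400] '
    '"GET /?p=avianaquamiser.com;a=snapshot;h=70adc391bbf0e96b0b7ed021852817c372ca7b8f;sf=tgz HTTP/1.1" '
    '200 - "http://git.kitenet.net/" "' + SOSO_UA + '"',
    'git.kitenet.net 123.151.139.212 - - [11/May/2013:01:23:01 -0400] '
    '"GET /?p=avianaquamiser.com;a=snapshot;h=61401a7e229fb16878be9602d38032da05db1f90;sf=tgz HTTP/1.1" '
    '200 7839744 "http://git.kitenet.net/" "' + SOSO_UA + '"',
]

if __name__ == "__main__":
    for agent, total in snapshot_bytes_by_agent(SAMPLE_LINES).items():
        print(f"{total:>12} bytes  {agent}")
```

For these three entries the spider's snapshot total is 150274048 bytes
(142434304 + 0 + 7839744), most of it from a single tarball.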

This causes gitweb to be very active, but somehow it also makes
apache's memory use balloon. In top, I saw multiple apache processes
using over 50 MB each. Prefork was a worse choice: the server became
unresponsive and I had to reboot it.

I have configured gitweb with $feature{'snapshot'}{'default'} = [];
and blacklisted this spider's address space, so I hope I will not
see this again. 
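
The blacklist itself would normally live in the Apache or firewall
configuration; the Python sketch below only illustrates the matching
logic, checking a client address against CIDR ranges with the standard
ipaddress module and, as an additional assumption, checking the
User-Agent for a substring. The 123.151.139.0/24 range and the names
BLOCKED_NETWORKS and should_refuse are hypothetical; the range is based
only on the single client address in the log excerpt, since the report
does not state the spider's actual address space.

```python
import ipaddress

# Hypothetical blocklist: only the /24 around the client seen in the log.
# The spider's real address space is not given in the report.
BLOCKED_NETWORKS = [ipaddress.ip_network("123.151.139.0/24")]

# Hypothetical user-agent substrings, compared case-insensitively.
BLOCKED_AGENT_SUBSTRINGS = ["sosospider"]


def should_refuse(client_ip, user_agent):
    """Return True if the client is in a blocked network or has a blocked agent."""
    addr = ipaddress.ip_address(client_ip)
    # Membership test is CIDR-aware; mismatched IPv4/IPv6 simply yields False.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(s in ua for s in BLOCKED_AGENT_SUBSTRINGS)


if __name__ == "__main__":
    print(should_refuse("123.151.139.212", "curl/7.26.0"))
    print(should_refuse("192.0.2.1", "curl/7.26.0"))
```

Matching on the address range still works if the spider changes its
User-Agent, while the User-Agent check still works if it moves to a new
address; neither helps against a client that changes both.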

I don't understand why apache is using all that memory. Could it be
trying to buffer the CGI's output? If apache mallocs a lot of memory for
such a buffer, will it ever free it? Perhaps sosospider is doing other
evil things, beyond ignoring robots.txt, that cause this
behavior.
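
To make the buffering question concrete: a relay that reads a CGI's
entire output before sending it needs memory proportional to the
response (about 142 MB for the largest snapshot in the log excerpt),
while one that copies fixed-size chunks needs only one chunk at a time.
The Python sketch below illustrates that contrast only. It is not a
description of how Apache's mod_cgi actually handles output, and the
function names and the 8192-byte chunk size are arbitrary assumptions.

```python
import io

CHUNK_SIZE = 8192  # hypothetical copy-buffer size


def relay_buffered(src, dst):
    """Read all of src into memory, then write it; return the peak buffer size."""
    data = src.read()  # the buffer grows to the full response size
    dst.write(data)
    return len(data)


def relay_streaming(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks; return the peak buffer size."""
    peak = 0
    while True:
        chunk = src.read(chunk_size)  # never more than chunk_size bytes
        if not chunk:
            break
        peak = max(peak, len(chunk))
        dst.write(chunk)
    return peak


if __name__ == "__main__":
    payload = b"x" * 1_048_576  # stand-in for a large snapshot tarball
    print(relay_buffered(io.BytesIO(payload), io.BytesIO()))
    print(relay_streaming(io.BytesIO(payload), io.BytesIO()))
```

Both functions deliver the same bytes; only the peak memory held per
request differs, which is why several concurrent multi-hundred-megabyte
snapshot downloads would hurt far more under the buffered approach.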

IMHO gitweb should not ship with snapshots enabled by default,
but the apache behavior is especially concerning.

[1] The spider's website claims (translated from Chinese)
    "Once a rule forbidding access has been added to robots.txt,
    sosospider will follow it and stop crawling the corresponding
    pages/site," but I get the impression from looking up this spider
    that they're lying or incompetent. My robots.txt file has been in
    place for 5 years.

-- Package-specific info:
List of enabled modules from 'apache2 -M':
  alias auth_basic authn_file authz_default authz_groupfile
  authz_host authz_user autoindex cgi deflate dir env expires include
  mime negotiation reqtimeout rewrite setenvif status userdir

-- System Information:
Debian Release: 7.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages apache2 depends on:
ii  apache2-mpm-prefork  2.2.22-13
ii  apache2.2-common     2.2.22-13

apache2 recommends no packages.

apache2 suggests no packages.

Versions of packages apache2.2-common depends on:
ii  apache2-utils  2.2.22-13
ii  apache2.2-bin  2.2.22-13
ii  lsb-base       4.1+Debian9
ii  mime-support   3.52-2
ii  perl           5.14.2-20
ii  procps         1:3.3.4-2

Versions of packages apache2.2-common recommends:
ii  ssl-cert  1.0.32

Versions of packages apache2.2-common suggests:
pn  apache2-doc                     <none>
ii  apache2-suexec                  2.2.22-13
ii  chromium [www-browser]          25.0.1364.160-1
ii  epiphany-browser [www-browser]  3.4.2-2.1
ii  iceweasel [www-browser]         10.0.12esr-1+nmu1
ii  konqueror [www-browser]         4:4.8.4-2
ii  lynx-cur [www-browser]          2.8.8dev.15-2
ii  w3m [www-browser]               0.5.3-8

-- no debconf information

-- 
see shy jo

Attachment: graph.png
Description: PNG image

Attachment: signature.asc
Description: Digital signature


--- End Message ---
--- Begin Message ---
This is likely fixed in 2.4. Closing the bug.

--- End Message ---
