[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Apache goes haywire, NSCD fixes it. Odd (solved) problem.




On Mon, 6 Oct 2003, Tyler Morgan wrote:

> Hi,
>
> I have recently surmounted an odd problem involving Apache 1.3.27.0-2
> on Debian testing/unstable (with a 2.4.22 kernel) and thought it may be
> worthwhile to explain what happened to a group of people who might care,
> in hopes that others find the information useful.
>
> A few days ago my machine, which is a Dell P4 2.4 GHz box (pretty generic),
> started experiencing extremely high load averages and CPU usage. A couple of
> apache processes (anywhere from 1 to 5) had decided to eat 100% of CPU and
> 100% of RAM (1G physical 1G swap) between themselves (evenly distributed)
> for no apparent reason. Restarting apache would temporarily fix these
> runaway processes for anywhere from 30 seconds to 12 hours. Typically it
> would start up again within 10 or 15 minutes.
>
> Not being able to figure out what the hell these processes were doing by
> examining logs and the server-status page, I started stracing the runaway
> pids. The output looked like this:
>
> <-- strace output -->
> # strace -p <pid of a runaway apache>
> close(7)                                = 0
> mremap(0x44735000, 304750592, 304754688, MREMAP_MAYMOVE) = 0x44735000
> open("/etc/passwd", O_RDONLY)           = 7
> fcntl64(7, F_GETFD)                     = 0
> fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
> _llseek(7, 0, [0], SEEK_CUR)            = 0
> fstat64(7, {st_mode=S_IFREG|0644, st_size=4158, ...}) = 0
> mmap2(NULL, 4158, PROT_READ, MAP_SHARED, 7, 0) = 0x403f4000
> _llseek(7, 4158, [4158], SEEK_SET)      = 0
> fstat64(7, {st_mode=S_IFREG|0644, st_size=4158, ...}) = 0
> munmap(0x403f4000, 4158)                = 0
> close(7)                                = 0
> <repeat>
> <-- end strace -->
>
> That would happen over and over about a hundred thousand times until the
> system ran out of memory and the process was killed by the kernel.
>
> I was stuck at this point. Why the hell is apache freaking out and eating
> tons of CPU while attempting to open /etc/passwd? Who knows? You wouldn't
> believe the amount of troubleshooting I did. New kernels, new source
> compiled apache binaries with default configurations, confirmed md5 sums
> of all apache binaries and linked libraries against a problem-free machine,
> a complete hardware swap, and even a run with apache-ssl. I tried everything
> and this was still happening.
>
> About to give up hope, I started looking at strace output again. I didn't
> notice this before:
> <-- more strace -->
> connect(7, {sa_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 \
> ENOENT (No such file or directory)
> close(7)                                = 0
> open("/etc/passwd", O_RDONLY)           = 7
> <-- end strace -->
>
> Hmm.. It.s trying to connect to NSCD before trying (and failing) to open
> /etc/passwd. It only was trying NSCD about once every 50 attempts.
>
> Not knowing what NSCD even is I Googled around and found out it provides
> cached passwd lookups (among other things). That sounded promising!
> Installed NSCD and, well, look at the graphs yourself:
>
> http://discore.org/cpusum-day.png
> (Daily CPU usage % updated every 5 min)
>
> NSCD was installed just after 3PM and the problem magically disappeared.
>
> Bug? Maybe. Error on my part? More likely. Odd? I think so. Fun? Not at all.
>
> I still don't know why apache couldn't just open /etc/passwd to get the
> information that NSCD provides more readily. The permissions are fine
> and nothing but apache is having problems.
>
> I'd be available to do further testing of this if anyone is interested.
> Mostly the point of this post is just to provide information on something
> that happened to me and was fixed quite easily after far too much
> troubleshooting.
>
> Thanks for reading,
> Tyler Morgan
Is your apache serving static pages or dynamic? What modules is it using?
Can you think of anything that need authentication (checking a password)?
This may have fixed your problem but I don't thing it was the real
solution.
-K



Reply to: