
fork system call failing under "high" server load



Hello,

I'd like to submit a problem to you involving the fork system call.




I have a Samba 2.2.x server (official package from Debian Woody) which
hosts Windows user home directories. There can be around 100 clients
using the server at the same time.

Sometimes, usually when lots of people are using the Samba server, the
server refuses new connections. If I try to log in on the console at
the same time, I get the following error message:

bash: fork: Ressource temporairement indisponible
(in English: bash: fork: Resource temporarily unavailable)

and new clients cannot access the Samba home share. This happens
almost every day.

If I log in as root while the network activity is low and wait until the
server starts refusing new client connections again, I get the same kind
of errors:

# cat /proc/sys/fs/file-nr
2050    3       32000
# ps aux
bash: fork: Ressource temporairement indisponible
# ls
bash: fork: Ressource temporairement indisponible
# which
bash: fork: Ressource temporairement indisponible
# jobs
# kill
kill: usage: kill [-s sigspec | -n signum | -sigspec] [pid | job]... or
kill -l [sigspec]
# ls
bash: fork: Ressource temporairement indisponible


I had a look at the implementation of the fork syscall on Linux. In
kernel/fork.c, the function do_fork (line 633 in Linux 2.4.21) contains
this comment:
 * Check if we are over our maximum process limit, but be sure to
 * exclude root.

Since the errors above also occur as root, I assumed the problem was
related neither to pam_limits nor to ulimit process limits.
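
To double-check that assumption for the non-root Samba users, the number
of processes owned by one UID can be compared with the per-user limit
(what "ulimit -u" reports in bash). The following is only a minimal
sketch: count_procs_for_uid is a hypothetical helper name, and the
optional second argument (the proc root) is there for testing only. It
relies on shell builtins alone, so it should still work from a shell
that can no longer fork, provided it is called directly and not inside
$(...).

```shell
# Count processes whose real UID matches $1 by reading /proc/<pid>/status.
# Hypothetical helper; $2 (proc root) defaults to /proc.
# Only shell builtins are used, so no fork() is needed to run it.
count_procs_for_uid() {
    uid=$1
    root=${2:-/proc}
    n=0
    for f in "$root"/[0-9]*/status; do
        # Skip entries that vanished or did not match the glob at all.
        [ -r "$f" ] || continue
        while read -r key real _rest; do
            if [ "$key" = "Uid:" ]; then
                # First value on the "Uid:" line is the real UID.
                [ "$real" = "$uid" ] && n=$((n + 1))
                break
            fi
        done < "$f"
    done
    echo "$n"
}
```

For example, "count_procs_for_uid 0" prints the number of root-owned
processes, and the same call with a user's numeric UID can be compared
with that user's process limit.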


It seems that Samba starts a new root process and a new user process
for each new connection, and soon enough I have around 100 processes
on the server.

I already set the kernel file-max (integer) to 32760 with sysctl.

In the kernel source, I found that the fork system call can also fail
when nr_threads reaches max_threads. But on the server, the /proc value
for threads-max lies between 8000 and 9000.
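
The system-wide comparison can be sketched in the same way. This assumes
that the fourth field of /proc/loadavg has the form "running/total" and
that the total is the kernel's task count; tasks_vs_max is a hypothetical
helper name, and its two optional file arguments exist only for testing.
Again only builtins are used.

```shell
# Print "<total tasks> <threads-max>" so the two can be compared.
# Hypothetical helper; both paths can be overridden for testing.
tasks_vs_max() {
    loadavg=${1:-/proc/loadavg}
    maxfile=${2:-/proc/sys/kernel/threads-max}
    # /proc/loadavg looks like: 0.20 0.18 0.12 1/80 11206
    # The fourth field is "running/total".
    read -r _l1 _l5 _l15 tasks _lastpid < "$loadavg" || return 1
    read -r max < "$maxfile" || return 1
    # Strip everything up to the slash to keep only the total.
    echo "${tasks#*/} $max"
}
```

Called directly at the prompt while fork is failing, "tasks_vs_max"
would show whether the task count is anywhere near threads-max. With
around 100 processes against a limit of 8000 to 9000, it should not be.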

I had a cron job running on the server that logged the output of w,
free, vmstat and ps aux to a file every 5 minutes. I attached to this
email the log corresponding to the time interval 14:05 - 15:55. Between
14:15 and 14:30, at 14:35 and at 14:45, the script didn't write anything
to the log file, probably because it couldn't be started (fork: resource
unavailable). As you can see, there is still plenty of free memory
and there aren't any defunct / zombie Samba processes.

Also attached to this mail are the Samba logs corresponding to the
time interval in which the server was unavailable.




Do you have any idea what could make the fork fail? In which
direction would you investigate?


Thanking you very much in advance,


-- 
 Jean HAUSSER

Attachment: mail_samba.ll.log
Description: Cron job (running ps, w, free) output

Attachment: mail_samba.smbd.log
Description: Samba's smbd log file

