
Re: Too many open files



If you've got a quite reproducible error, where you can fire off a
process/program and see/trigger the error, but can't otherwise easily
isolate it, I'd suggest using strace(1) to gather more relevant information.

Here's a rather contrived example: I set the limit quite low, then go to
open a fair number of files, just to explicitly trigger such an error,
and capture it with strace. Below are (at least some of) the results; I
may not show it all, as such a capture can get quite long.

So ...

$ (n=5; strace -fv -eall -s2048 -o strace.out bash -c 'echo $(ulimit
-H -n; ulimit -S -n); ulimit -n '"$n"'; echo $(ulimit -H -n; ulimit -S
-n); 3>three 4>four 5>five 6>six 7>seven /bin/echo x; echo $?')
4096 1024
5 5
bash: line 1: five: Too many open files
1
$
And here, in part, is the captured strace data that I saved in strace.out.
For brevity I'll omit many lines, and may also truncate many lines;
my comments are on lines starting with "// ":
31194 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
// hard and soft limits of 1024 and 4096, we change them to 5:
31194 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=5, rlim_max=5}, NULL) = 0
// and retrieve that information:
31196 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=5, rlim_max=5}) = 0
31196 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=5, rlim_max=5}) = 0
31197 openat(AT_FDCWD, "three", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
31197 openat(AT_FDCWD, "four", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
// we have fds (file descriptors) 0,1,2 by default; we open 3 and 4
// okay for a total of 5,
// then fail when we try to open one more:
31197 openat(AT_FDCWD, "five", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1
EMFILE (Too many open files)
31197 write(2, "bash: line 1: five: Too many open files\n", 40) = 40
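
If you want to reproduce that EMFILE failure yourself, without strace, here's
a minimal sketch (assuming bash and a writable current directory; the file
names t3, t4, t5 are arbitrary scratch names of my choosing):

```shell
# Lower the soft fd limit to 5 in a child shell; with fds 0,1,2 already
# open, opening fds 3 and 4 succeeds, and opening fd 5 fails with EMFILE.
out=$(bash -c 'ulimit -n 5; exec 3>t3 4>t4 5>t5' 2>&1)
rc=$?
echo "$out"            # bash's "Too many open files" diagnostic
echo "exit status: $rc"
rm -f t3 t4 t5         # clean up the scratch files
```

Since the redirection fails in a non-interactive shell, the child bash exits
with a nonzero status, just as in the transcript above.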

So, if you have a program that quickly and quite reproducibly fails, you
could do something like:
$ ulimit -H -n; ulimit -S -n; strace -fv -eall -s2048 -o strace.out \
    your_program optionally_options_and_or_arguments
You could also launch that in the background by sticking & on the end.
Once you've seen the error, that will generally be enough, and it's
likely captured in the output of strace(1).
You can also greatly reduce strace's output by restricting it to the calls
of interest, e.g. rather than -eall, instead use
-edup,dup2,getrlimit,open,openat,prlimit64,setrlimit

It's the getrlimit/setrlimit/prlimit family of calls we're interested in
for getting/setting limits, and the dup/open family of calls for opening
files (or copying their descriptors, thus creating an additional file
descriptor), and most notably where they fail due to limits (or anything
else that may be quite unexpected). Including the write family of calls
can also be useful, though verbose/voluminous, notably as one can
generally use it to find the first occurrence of the error diagnostic
being written, then go backwards from there to look at the calls of more
particular interest.
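
That "find the diagnostic, then walk backwards" step can be sketched like so
(assuming GNU grep/sed; the two-line log built here is a fabricated stand-in
for a real capture written with strace -o):

```shell
# Build a tiny stand-in strace log (a real one comes from strace -o ...).
log=$(mktemp)
printf '%s\n' \
  '31197 openat(AT_FDCWD, "five", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 EMFILE (Too many open files)' \
  '31197 write(2, "bash: five: Too many open files\n", 32) = 32' > "$log"
# Find the line number of the first write() of the error text:
line=$(grep -n 'write(.*Too many open files' "$log" | head -n 1 | cut -d: -f1)
# Show that line plus up to 20 preceding lines (the calls leading to it):
start=$(( line > 20 ? line - 20 : 1 ))
ctx=$(sed -n "${start},${line}p" "$log")
echo "$ctx"
rm -f "$log"
```

On a real strace.out you'd widen the context window as needed and look for
the failing openat/dup/setrlimit calls just above the write.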

Anyway, I'm not sure what's happening in your case: it may be an excessive
number of descriptors/files open, a limit that somehow got set too low, or
possibly something else triggering that failure. In any case, the
answer(s) are somewhere in there to be found.
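
To check the "excessive number of descriptors open" possibility, here's a
sketch that ranks processes by open-fd count (unprivileged, you'll only see
your own processes' fd directories; run as root to survey everything):

```shell
# For each process, count entries in /proc/PID/fd and show the top consumers.
top=$(
  for d in /proc/[0-9]*; do
    n=$(ls "$d/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 0 ] && printf '%6d %s\n' "$n" \
      "$(tr '\0' ' ' < "$d/cmdline" 2>/dev/null | cut -c1-60)"
  done | sort -rn | head
)
echo "$top"
```

A process near the top whose count keeps climbing over repeated runs is a
good descriptor-leak suspect.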

On Mon, Aug 18, 2025 at 9:52 PM Ken Mankoff <mankoff@gmail.com> wrote:
>
> Hi Michael, List,
>
> I apologize for re-sending the message, and now responding to my own not your reply. My initial message was sent when I was not subscribed to the list. I was looking at the archive website and did not see it show up there (I was looking in the wrong place) so I subscribed and re-sent. Hence it appearing 2x. I'm now subscribed and should see replies and be able to reply to them properly.
>
> ulimit -Hn and -Sn both report 32768, so that's not it.
>
> > Peeking on my fairly busy 12 Bookworm host ...
> > # 2>>/dev/null ls -d /proc/[0-9]*/fd/* | wc -l
> > 2171
>
> $ 2>>/dev/null ls -d /proc/[0-9]*/fd/* | wc -l
> 5503
>
> $ cat /proc/sys/fs/file-max
> 1000000
>
> Also seen at
>
> $ cat /proc/sys/fs/file-nr
> 25312   0       1000000
>
> Seems reasonable so far, and nothing obvious (to me) in dmesg.
>
> journalctl shows tons of errors (OpenGL, X11 even though I'm on Wayland, Qt, etc.) and shows it occurring more than the few times I see it from some terminal commands that I run. But nothing is crashing because of it as far as I can tell.
>
> > $ dolphin . kf.solid.backends.fstab: Failed to acquire watch file
> > descriptor Too many open files
>
> Dolphin still opens.
>
> > $ tail -f some_big_file tail: inotify cannot be used, reverting to
> > polling: Too many open files
>
> Tail still works.
>
>
> Your suggestion to look at /proc/*/limits  shows:
> $ cat /proc/[0-9]*/limits 2>>/dev/null | sed -ne '1p;/^Max open files/p' | sort | uniq -c | sed -e 's/  *$//;s/  //' | sort -k 5bn
>     1 Limit                     Soft Limit           Hard Limit           Units
>     1 Max open files            50                   50                   files
>   268 Max open files            1024                 4096                 files
>    29 Max open files            1024                 524288               files
>   100 Max open files            4096                 524288               files
>     6 Max open files            8192                 524288               files
>    22 Max open files            32768                32768                files
>     1 Max open files            65535                524288               files
>    19 Max open files            65536                524288               files
>     3 Max open files            65536                65536                files
>     2 Max open files            131072               131072               files
>     5 Max open files            524287               524288               files
>    29 Max open files            524288               524288               files
>    17 Max open files            1000000              1000000              files
>     3 Max open files            1073741816           1073741816           files
>
> Looking at that last line:
>
> $ grep 1073741816 /proc/[0-9]*/limits
> /proc/1644/limits:Max open files            1073741816           1073741816           files
> /proc/1/limits:Max open files            1073741816           1073741816           files
> /proc/770/limits:Max open files            1073741816           1073741816           files
>
> $ ps aux | grep -E " 1644 | 770 "
> root         770  0.0  0.0  37164 11844 ?        Ss   Aug14   0:32 /usr/lib/systemd/systemd-udevd
> root        1644  0.0  0.0 2467416 53796 ?       Ssl  Aug14   3:13 /usr/bin/containerd
>
>
> But I'm not sure what to do with this information...
>
> Thanks for your help so far,
>
>   -k.
