Re: Too many open files
If you've got a quite reproducible error, where you can fire off a process/program
and see/trigger the error, but can't otherwise (easily) isolate it,
I'd suggest using strace(1) to gather more relevant information.
Here's a rather contrived example, where I set the limit(s) quite low,
then open a fair number of files just to explicitly trigger such an error,
capture it with strace, and show (at least some of) those results
(probably not all of it, as that capture can get quite long).
So ...
$ (n=5; strace -fv -eall -s2048 -o strace.out \
    bash -c 'echo $(ulimit -H -n; ulimit -S -n); ulimit -n '"$n"'; echo $(ulimit -H -n; ulimit -S -n); 3>three 4>four 5>five 6>six 7>seven /bin/echo x; echo $?')
4096 1024
5 5
bash: line 1: five: Too many open files
1
$
And here, in part, is the strace data captured in strace.out;
for brevity I'll omit many lines and may also truncate some,
with my comments on lines starting with "// ":
31194 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
// soft and hard limits of 1024 and 4096, which we change to 5:
31194 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=5, rlim_max=5}, NULL) = 0
// and retrieve that information:
31196 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=5, rlim_max=5}) = 0
31196 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=5, rlim_max=5}) = 0
31197 openat(AT_FDCWD, "three", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
31197 openat(AT_FDCWD, "four", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
// we have fds (file descriptors) 0,1,2 by default; we open 3 and 4
// okay for a total of 5,
// then fail when we try to open one more:
31197 openat(AT_FDCWD, "five", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1
EMFILE (Too many open files)
31197 write(2, "bash: line 1: five: Too many open files\n", 40) = 40
So, if you have a program that quickly and quite reproducibly fails, you
could do something like:
$ ulimit -H -n; ulimit -S -n; strace -fv -eall -s2048 -o strace.out \
    your_program optionally_options_and_or_arguments
Could also launch that in background by sticking & on the end.
Once you've seen the error, that will generally be enough, and it's
likely captured in the output of strace(1).
Can also greatly reduce strace's output by restricting it to the calls of
interest, e.g. rather than -eall, instead use
-edup,dup2,getrlimit,open,openat,prlimit64,setrlimit
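So e.g., a sketch of the same sort of invocation with that restricted
filter (your_program and its arguments again just being placeholders):
$ strace -fv -edup,dup2,getrlimit,open,openat,prlimit64,setrlimit -s2048 -o strace.out \
    your_program optionally_options_and_or_arguments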
It's the getrlimit/setrlimit/prlimit family of calls we're interested
in for getting/setting limits, and the dup/open family of calls for
opening files (or copying their descriptors, thus creating an additional
file descriptor), and most notably where they fail due to limits (or
anything else that may be quite unexpected). Including the write family
of calls can also be useful, though verbose/voluminous, notably as one
can generally use it to find the first occurrence of the error diagnostic
being written, then go backwards from there to look at the calls of more
particular interest.
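For instance, on our earlier strace.out, something roughly like the
following locates that first failure and the calls just before the
diagnostic (adjust the message text to whatever your program actually
reports):
$ grep -n EMFILE strace.out | head -n 1               # first failing call, with its line number
$ grep -B 20 'Too many open files' strace.out | less  # calls leading up to the diagnostic being written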
Anyway, not sure what's happening in your case - may be an excessive
number of descriptors/files open, or a limit that somehow got set too low,
or possibly something else triggering that failure. In any case, the
answer(s) are somewhere in there to be found.
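If some particular process does look suspect, a quick sanity check is to
compare its open descriptor count against its limits, e.g. (the pid here
being just a hypothetical placeholder):
$ pid=12345
$ ls /proc/"$pid"/fd | wc -l                   # how many fds it currently has open
$ grep '^Max open files' /proc/"$pid"/limits   # vs. its soft and hard limits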
On Mon, Aug 18, 2025 at 9:52 PM Ken Mankoff <mankoff@gmail.com> wrote:
>
> Hi Michael, List,
>
> I apologize for re-sending the message, and now responding to my own not your reply. My initial message was sent when I was not subscribed to the list. I was looking at the archive website and did not see it show up there (I was looking in the wrong place) so I subscribed and re-sent. Hence it appearing 2x. I'm now subscribed and should see replies and be able to reply to them properly.
>
> ulimit -Hn and -Sn both report 32768, so that's not it.
>
> > Peeking on my fairly busy 12 Bookworm host ...
> > # 2>>/dev/null ls -d /proc/[0-9]*/fd/* | wc -l
> > 2171
>
> $ 2>>/dev/null ls -d /proc/[0-9]*/fd/* | wc -l
> 5503
>
> $ cat /proc/sys/fs/file-max
> 1000000
>
> Also seen at
>
> $ cat /proc/sys/fs/file-nr
> 25312 0 1000000
>
> Seems reasonable so far, and nothing obvious (to me) in dmesg.
>
> journalctl shows tons of errors (OpenGL, X11 even though I'm on Wayland, Qt, etc.) and shows it occurring more than the few times I see it from some terminal commands that I run. But nothing is crashing because of it as far as I can tell.
>
> > $ dolphin . kf.solid.backends.fstab: Failed to acquire watch file
> > descriptor Too many open files
>
> Dolphin still opens.
>
> > $ tail -f some_big_file tail: inotify cannot be used, reverting to
> > polling: Too many open files
>
> Tail still works.
>
>
> Your suggestion to look at /proc/*/limits shows:
> $ cat /proc/[0-9]*/limits 2>>/dev/null | sed -ne '1p;/^Max open files/p' | sort | uniq -c | sed -e 's/ *$//;s/ //' | sort -k 5bn
> 1 Limit Soft Limit Hard Limit Units
> 1 Max open files 50 50 files
> 268 Max open files 1024 4096 files
> 29 Max open files 1024 524288 files
> 100 Max open files 4096 524288 files
> 6 Max open files 8192 524288 files
> 22 Max open files 32768 32768 files
> 1 Max open files 65535 524288 files
> 19 Max open files 65536 524288 files
> 3 Max open files 65536 65536 files
> 2 Max open files 131072 131072 files
> 5 Max open files 524287 524288 files
> 29 Max open files 524288 524288 files
> 17 Max open files 1000000 1000000 files
> 3 Max open files 1073741816 1073741816 files
>
> Looking at that last line:
>
> $ grep 1073741816 /proc/[0-9]*/limits
> /proc/1644/limits:Max open files 1073741816 1073741816 files
> /proc/1/limits:Max open files 1073741816 1073741816 files
> /proc/770/limits:Max open files 1073741816 1073741816 files
>
> $ ps aux | grep -E " 1644 | 770 "
> root 770 0.0 0.0 37164 11844 ? Ss Aug14 0:32 /usr/lib/systemd/systemd-udevd
> root 1644 0.0 0.0 2467416 53796 ? Ssl Aug14 3:13 /usr/bin/containerd
>
>
> But I'm not sure what to do with this information...
>
> Thanks for your help so far,
>
> -k.