Bug#382390: lpq -Pall fails for many printers when :filter is used
Package: lprng
Version: 3.8.28-2
(I suppose this is an upstream bug, but I havent't found an upstream
place to report this. That's why I use Debian's Bug Tracking System to
document it.)
I tried to trace a problem where "lpq -Pall" stops output before it
it has listed all printers. I made a trimmed-down way to reproduce the
problem:
Create the printcap - run mkprintcap >/etc/printcap
---mkprintcap---
#!/bin/sh
cat <<EOF
.default
# spool directory
:sd=/var/spool/lpd/%P
# (no idea what needs translate_format)
:translate_format=
# maximum job size: unlimited
:mx=0
# suppress headers and/or banner page
:sh
# maximum copies allowed
:mc=1
# preserving already printed jobs: no more than ...
:done_jobs=5
# ... and no longer than ...
# :done_jobs_max_age=7200
:done_jobs_max_age=43200
# stop when less than 10 GB disk space are free
:minfree=10000000
# time between jobs to allow printer recovery
:connect_grace=10
# time after which a job is reported as stalled
:stalled_time=600
EOF
for i in $(seq 1 10000)
do
echo p$i
echo " :tc=.default"
echo " :lp=h$i%9100"
echo " :filter=-$ /usr/local/bin/printer_filter gs hp2k3d"
echo
done
---end-of-mkprintcap
This creates a printcap with a default template (see above script) and
10000 numbered printers:
p1
:tc=.default
:lp=h1%9100
:filter=$ /usr/local/bin/printer_filter gs hp2k3d
p2
...
Then make an executable /usr/local/bin/printer_filter:
ln -s /bin/true /usr/local/bin/printer_filter
This is only needed to make checkpc -f happy.
Now run checkpc -f, this creates the queue directories.
Stop any existing lpd and run a new lpd in foreground:
lpd -F
In a second shell run some lpq, like:
lpq -v -Pall |grep "^Printer: "| wc -l
These outputs result:
# lpd -F
2006-08-10-17:49:13.824 printserv p1019: Make_temp_fd_in_dir: cannot create tempfile '/var/spool/lpd/p1019/temp00zS8ve8'
#
$ lpq -v -Pall |grep "^Printer: "| wc -l
1018
I tried some variations in order to learn more about the problem. It
looks like it doesn't matter whether you use -v, -s or no option for
lpq. The problem completely vanishes when you omit :filter in
above printcap entries.
lpd can give you some more debug info:
lpd -F -D 3 >lpd.log
The supposedly relevant output:
2006-08-10-18:07:29.110 printserv [9985] Job_status p1019: Checkread: file 'lpq.1'
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Checkread: cannot open 'lpq.1', Too many open files
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Checkread: 'lpq.1' fd -1, size 0
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Init_tempfile: temp file '/var/spool/lpd/p1019'
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: setstatus: msg 'p1019: Make_temp_fd_in_dir: cannot create tempfile '/var/spool/lpd/p1019/temp0
0sLselj''
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Trim_status_file: file 'status.pr' max 10, min 0
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Trim_status_file: 'status.pr' max 10, min 0, size 0
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Checkwrite: file 'status.pr', rw 0, create 0, nodelay 0
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Checkwrite: cannot open 'status.pr', Too many open files
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: Checkwrite: file 'status.pr' fd -1, inode 0x0, perms 00
2006-08-10-18:07:29.111 printserv [9985] Job_status (INFO) p1019: Make_temp_fd_in_dir: cannot create tempfile '/var/spool/lpd/p1019/temp00sLselj'
2006-08-10-18:07:29.111 printserv [9985] Job_status p1019: cleanup: signal No signal, Errorcode 32
2006-08-10-18:07:29.115 printserv [9985] Job_status p1019: Killchildren: pid 9985, signal Interrupt, count 0
Oh, up to now I didn't see the "Too many open files"... When I raise
the limit with "ulimit -n 10000" it works up to printer p9995.
Looking in lpd's /proc/<pid>/fd reveals the files that eat the fd budget:
...
lrwx------ 1 root root 64 Aug 10 18:10 994 -> /var/spool/lpd/p990/lpq
lrwx------ 1 root root 64 Aug 10 18:10 995 -> /var/spool/lpd/p991/lpq
lrwx------ 1 root root 64 Aug 10 18:10 996 -> /var/spool/lpd/p992/lpq
lrwx------ 1 root root 64 Aug 10 18:10 997 -> /var/spool/lpd/p993/lpq
lrwx------ 1 root root 64 Aug 10 18:10 998 -> /var/spool/lpd/p994/lpq
lrwx------ 1 root root 64 Aug 10 18:10 999 -> /var/spool/lpd/p995/lpq
(In order to make lpd run slow enough use -D 3.)
So the workaround is to increase the fd budget with ulimit.
On the other hand this is the place where I don't know how to make a
real fix: I guess the lpq file should be closed before the one for the
next printer is opened, but I couldn't find these parts in the LPRng
source. And it is still magic why this does not happen when the
:filter statements are omitted. And sometimes after the first
invocation of a fresh lpd the problem does not appear. No idea why.
Hope this helps when someone runs into the same problem.
Sven
Reply to: