Some confusion with flock.
I'm converting a (bash) script from using lockfiles to using flock but
I'm getting a strange effect that I don't understand.
FWIW, the old script used mkdir where I now call flock and rmdir where I
now close the fd. (I also used a trap to avoid stale lockfiles.)
queue_cmd()
{
    (
        local lck=/etc/network/lock/${1##*/}
        exec {FDQUEUE}>"$lck.queue"
        flock -n "${FDQUEUE}" || exit 0    # a run is already queued

        exec {FDLOCK}>"$lck.lock"
        while ! flock -n "${FDLOCK}"; do
            echo "Waiting for $lck.lock to be unlocked"
            sleep 10
        done
        exec {FDQUEUE}>&-                  # leave the queue, allow another to queue

        echo running: "$@"
        "$@"
        echo finished running: "$@"

        exec {FDLOCK}>&-
        echo release lock: "$@"
    ) &
}
The job of this function is to run a script or, if it's already running,
to queue it, with at most one run queued at a time. I want my firewall
script to run every time the network changes, but there's no benefit to
running it twice if the first run also picks up the changes from the
second network change.
But now it doesn't seem to unlock after "finished running". I didn't
think the explicit close was required anyway, but with or without it,
it's not doing what I expect. If I use flock -u instead, it does unlock.
This is what happens with the code exactly as written (or without the
explicit closing of FDLOCK):
=====
First call starts running /etc/firewall/firewall
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:29 $ exec
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall
Second call queues a running of /etc/firewall/firewall for when the
first completes.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:29 $ exec
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
Subsequent calls are a no-op while a run is queued.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats 5 more times)
Queued run checks if first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
More calls that are a no-op due to the queued run.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats 3 more times)
Second check if the first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
Yet more calls that are a no-op due to the queued run.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats one more time)
Third check if the first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
First run completes and lock is released...
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ exec
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall
another 7 checks if the first run has completed!!!!
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
(repeats 7 more times)
Finally it acquires the lock and the second run starts.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ exec
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall
=======
When using flock -u instead it's basically the same up until the first
"echo finished running: /etc/firewall/firewall" where it now does:
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ flock -u 11
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall
No more looping for 70s waiting for the lock to be released.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ flock -u 11
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall
======
I vaguely understand what is happening here. /etc/firewall/firewall returns but
continues running in the background for about another minute. For some
reason I don't understand, the flock of fd 11 isn't being released until
this has completed even though I explicitly close the fd.
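My only guess so far is fd inheritance. As far as I can tell from flock(2), the lock belongs to the open file description rather than to the fd number, so it is released only by an explicit unlock or once every inherited copy of the fd has been closed. Below is a minimal sketch to test that guess, not my real script: the probe function, the temp file and the "sleep 2" are stand-ins I made up for $lck.lock and for whatever /etc/firewall/firewall leaves running in the background.

```bash
#!/bin/bash
# Sketch only: the temp file stands in for $lck.lock, and `sleep 2` stands in
# for whatever /etc/firewall/firewall leaves running in the background.

# probe MODE: take the lock, start a background child, release the lock by
# MODE ("close" or "unlock"), then report whether the lock can be retaken.
probe() {
    local lock
    lock=$(mktemp)
    (
        exec {fd}>"$lock"
        flock -n "$fd" || exit 1
        sleep 2 >/dev/null 2>&1 &       # child inherits a copy of $fd
        if [ "$1" = unlock ]; then
            flock -u "$fd"              # explicit unlock
        fi
        exec {fd}>&-                    # closes this shell's copy only
    )
    # The subshell is gone, but the background sleep may still hold the file.
    if flock -n "$lock" true; then echo free; else echo held; fi
    rm -f "$lock"
}

after_close=$(probe close)
after_unlock=$(probe unlock)
echo "after close:  $after_close"
echo "after unlock: $after_unlock"
```

If the guess is right, the first line should report "held" and the second "free", which would match the two traces above.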
Can anyone explain this to me? Obviously, I have the fix but I don't
understand why my first attempt didn't work the way I wanted.