
Some confusion with flock.



I'm converting a (bash) script from using lockfiles to using flock but I'm getting a strange effect that I don't understand.

FWIW, the old script used mkdir where I now call flock and rmdir where I now close the fd. (I also used a trap to avoid stale lockfiles.)

queue_cmd()
{
  (
    local lck=/etc/network/lock/${1##*/}
    exec {FDQUEUE}>$lck.queue
    flock -n ${FDQUEUE} || exit 0

    exec {FDLOCK}>$lck.lock
    while ! flock -n ${FDLOCK}; do
        echo "Waiting for $lck.lock to be unlocked"
        sleep 10
    done
    exec {FDQUEUE}>&-

    echo running: $@
    "$@"
    echo finished running: $@

    exec {FDLOCK}>&-
    echo release lock: $@
  ) &
}

The job of this function is to run a script, or to queue it if it's already running, with at most one run queued at a time. I want my firewall to run every time the network changes, but there's no benefit to running it twice if the first run also picks up the network changes from the second change.

But now the lock doesn't seem to be released after "finished running". I didn't think the explicit close of the fd was required anyway, but with or without it the function isn't doing what I expect. If I use flock -u instead, it does unlock.

This is what happens with the code exactly as written (or without the explicit closing of FDLOCK):

=====

First call starts running /etc/firewall/firewall
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:29 $ exec
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall

Second call queues a running of /etc/firewall/firewall for when the first completes.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:29 $ exec
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10

Subsequent calls are a no-op while a run is queued.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats 5 more times)

Queued run checks if first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10

More calls that are a no-op due to the queued run.
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats 3 more times)

Second check if the first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10

Yet more calls that are a no-op due to the queued run
+/etc/network/scripts/functions:25 $ local lck=/etc/network/lock/firewall
+/etc/network/scripts/functions:26 $ exec
+/etc/network/scripts/functions:27 $ flock -n 10
+/etc/network/scripts/functions:27 $ exit 0
(repeats one more time)

Third check if the first run is completed.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10

First run completes and lock is released...
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ exec
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall

Another 7 checks if the first run has completed!
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:31 $ echo 'Waiting for /etc/network/lock/firewall.lock to be unlocked'
+/etc/network/scripts/functions:32 $ sleep 10
(repeats 7 more times)

Finally it acquires the lock and the second run starts.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ exec
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall

=======

When using flock -u instead it's basically the same up until the first "echo finished running: /etc/firewall/firewall" where it now does:

+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ flock -u 11
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall

No more looping for 70s waiting for the lock to be released.
+/etc/network/scripts/functions:30 $ flock -n 11
+/etc/network/scripts/functions:34 $ exec
+/etc/network/scripts/functions:36 $ echo running: /etc/firewall/firewall
+/etc/network/scripts/functions:37 $ /etc/firewall/firewall
+/etc/network/scripts/functions:38 $ echo finished running: /etc/firewall/firewall
+/etc/network/scripts/functions:40 $ flock -u 11
+/etc/network/scripts/functions:41 $ echo release lock: /etc/firewall/firewall

======

I vaguely understand what is happening here. /etc/firewall/firewall returns but continues running in the background for about another minute. For some reason I don't understand, the flock on fd 11 isn't released until that background work has completed, even though I explicitly close the fd.
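
A stripped-down sketch that should reproduce the effect outside the real setup. It rests on one documented property of flock(2): the lock belongs to the open file description, and a child process inherits that description along with the fd, so closing one copy of the fd does not release the lock while another copy is still open. Whether this is what happens with /etc/firewall/firewall is an assumption. The temp file, the variable names (lck, fd, first, second) and the background sleep are stand-ins, not part of the real script.

```bash
#!/bin/bash
# Hypothetical reproduction: a temp file stands in for the real lock file.
lck=$(mktemp)

(
  exec {fd}>"$lck"
  flock -n "$fd" || exit 1
  # Stand-in for a command that returns but leaves work running in the
  # background; the background process inherits every open fd, including $fd.
  sleep 2 &
  # Closes only this subshell's copy of the descriptor.
  exec {fd}>&-
)

# Probe the lock through a fresh open of the same file.
if flock -n "$lck" true; then first=free; else first=held; fi
sleep 3
if flock -n "$lck" true; then second=free; else second=held; fi

echo "right after close: $first; after the background job exits: $second"
rm -f "$lck"
```

On Linux with the util-linux flock I would expect the first probe to report "held" and the second "free". Replacing the close with flock -u "$fd" should make the first probe report "free" as well, since an explicit unlock applies to the shared open file description rather than to one copy of the fd.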


Can anyone explain this to me? Obviously, I have the fix but I don't understand why my first attempt didn't work the way I wanted.


