
Re: Re: ITP: hatools -- The halockrun program provides a simple way to implement locking in shell scripts.


> | On Jun 18, Igshaan Mesias <igshaan.mesias@gmail.com> wrote:
> | 
> | >   Description     : The halockrun program provides a simple way to
> | > implement locking in shell scripts.
> | 
> | What does this offer over lockfile (procmail package) or dotlockfile
> | (liblockfile1 package)?
Aren't those intended mainly for mailbox locking? The hatools package
offers locking for a wider range of applications.

> Or even flock(1), which is part of util-linux.

The major benefit of halockrun over these tools is that its
implementation cannot leave stale lock files behind.

It seems that many of the parameters of procmail's lockfile are just
different timeouts for handling stale lock files properly.

Since hatools comes from a high-availability background, this aspect is
very important. The idea (and also the name) of both halockrun and
hatimerun comes from the Sun Cluster high-availability product, and the
implementation takes reliability very seriously.
Judging from the dotlockfile manpage, it SEEMS to work in a similar way:
there are not as many timeout parameters, but it still seems to rely on
the existence of files, while halockrun actually locks the file using
kernel functions.

halockrun also works on NFS shares if lockd is running. The node hosting
the halockrun instance that holds the lock also makes sure the lock does
not go stale (again, the kernel does this, not user space). Without
having done any deeper checks, IMO this makes it more robust than
dotlockfile.

To point it out again: I think the strength of halockrun's
implementation is that lock cleanup is done by the kernel when the
process ends (no matter how it ended; it might even have dumped core),
not by a user-space procedure. This makes stale locks impossible.
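To illustrate the kernel-side cleanup described above, here is a minimal
Python sketch (Linux/POSIX only). It assumes that halockrun-style
locking means a POSIX fcntl() record lock, which is also the lock type
NFS lockd serves; it is not taken from hatools. A child process takes
the lock and is then killed with SIGKILL, so no user-space cleanup code
can run. The parent checks that the lock was held while the child lived
and is free again afterwards. The names lock_is_free() and demo() are
hypothetical.

```python
import errno
import fcntl
import os
import signal
import tempfile
import time


def lock_is_free(path: str) -> bool:
    """Probe: can an exclusive POSIX (fcntl) lock on path be taken right now?"""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False  # another process holds the lock
        raise
    finally:
        os.close(fd)  # closing the fd also drops any lock this probe took
    return True


def demo() -> tuple[bool, bool]:
    """A child takes the lock and is SIGKILLed; report lock state before/after."""
    path = os.path.join(tempfile.mkdtemp(), "job.lock")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: hold the lock, tell the parent, then "hang"
        try:
            os.close(r)
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(w, b"x")
            time.sleep(3600)
        finally:
            os._exit(1)  # never reached after SIGKILL, which is the point
    os.close(w)
    os.read(r, 1)  # block until the child holds the lock
    os.close(r)
    held_while_alive = not lock_is_free(path)
    os.kill(pid, signal.SIGKILL)  # no cleanup handler can run in the child
    os.waitpid(pid, 0)  # the child's fds (and locks) are gone once it is reaped
    freed_after_death = lock_is_free(path)
    return held_while_alive, freed_after_death


if __name__ == "__main__":
    print(demo())  # expected on Linux: (True, True)
```

No lock file has to be deleted and no timeout has to expire: the lock
simply disappears with the process that held it.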

It appears that both lockfile and dotlockfile are aimed at short-lived
locks. The dotlockfile manpage does not say how it handles stale locks,
but the lockfile_create(3) manpage does: it says that a lock file may be
considered stale after five minutes. I did not see how dotlockfile would
allow a longer timeout, nor do I know whether dotlockfile uses
lockfile_touch() as described in the lockfile_create(3) manpage.

halockrun was written to prevent multiple cron jobs from running
concurrently. For example, we had a cron job which runs every 5 minutes
and _usually_ finishes in less than 5 minutes. When it did not, we had
two jobs running, doing the same work, and each caused the other to
become slower (more resources required). That in turn decreased the
chance that the jobs would finish before cron started the next
instance. This was the first application of halockrun. I know of some
cases where it is used for even longer-running cron jobs. I am unsure
whether lockfile or dotlockfile are suitable for longer-running
processes that might not complete within an hour.
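The cron-job guard can be sketched the same way. This is an
illustration under assumptions, not halockrun's actual code or option
set: run_locked() (a hypothetical name) tries to take the lock without
waiting, exits with status 1 if a previous run still holds it, and
otherwise replaces itself with the job via exec(). The job process
itself then owns the lock, and the kernel releases it whenever the job
ends.

```python
import errno
import fcntl
import os
import sys
from typing import List, Optional


def try_lock(lock_path: str) -> Optional[int]:
    """Take an exclusive POSIX lock without waiting; return the fd, or None if busy."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


def run_locked(lock_path: str, argv: List[str]) -> int:
    """Replace this process with argv while it holds the lock; return 1 if busy."""
    fd = try_lock(lock_path)
    if fd is None:
        print(f"{lock_path}: previous run still active, skipping", file=sys.stderr)
        return 1  # hypothetical "skipped" exit status
    # POSIX locks belong to the process and survive exec() as long as the
    # descriptor stays open, so mark it inheritable (Python opens fds with
    # close-on-exec by default).
    os.set_inheritable(fd, True)
    os.execvp(argv[0], argv)  # does not return on success


if __name__ == "__main__":
    # e.g. crontab:  */5 * * * *  python3 run_locked.py /var/lock/myjob.lock myjob
    sys.exit(run_locked(sys.argv[1], sys.argv[2:]))
```

Because the lock is requested with LOCK_NB, an overlapping cron run is
skipped instead of starting alongside the slow one, which avoids the
slowdown spiral described above. The file name run_locked.py and the
crontab line are illustrative only.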

Furthermore, people are using modified start/stop scripts, based on
halockrun, for server processes such as apache or mysqld. They start
the process with halockrun and can check whether it is still running
with halockrun -t. If they want to send a signal to that process, they
can also use halockrun -t to obtain its PID and send the signal (e.g. to
stop it again). Again, this implementation cannot leave a stale lock:
the lock remains valid exactly as long as the process is running.
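For the start/stop-script use case, a tool can ask the kernel which
process holds the lock instead of trusting a PID file. The sketch below
uses fcntl(F_GETLK), which reports the PID of the process owning a
conflicting POSIX lock. Whether halockrun -t works exactly this way is
an assumption, the packed struct flock layout ("hhqqi") is only valid
for 64-bit Linux, and holder_pid() is a hypothetical helper.

```python
import fcntl
import os
import struct
from typing import Optional

# struct flock as laid out on 64-bit Linux (x86-64, aarch64), an assumption:
#   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;
FLOCK_FMT = "hhqqi"


def holder_pid(path: str) -> Optional[int]:
    """Return the PID holding a lock on path that blocks a write lock, else None."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return None  # no lock file, so nothing can hold the lock
    try:
        # Ask: "could I write-lock the whole file (start 0, len 0 = to EOF)?"
        query = struct.pack(FLOCK_FMT, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
        reply = fcntl.fcntl(fd, fcntl.F_GETLK, query)
    finally:
        os.close(fd)  # caution: this would also drop locks held by *this* process
    l_type, _whence, _start, _len, l_pid = struct.unpack(FLOCK_FMT, reply)
    return None if l_type == fcntl.F_UNLCK else l_pid
```

A stop script could then call os.kill(pid, signal.SIGTERM) whenever
holder_pid() returns a PID. Since the kernel drops the lock when the
process exits, the returned PID belonged to a live lock holder at the
moment of the query, unlike a leftover PID file. Note that F_GETLK does
not report locks held by the calling process itself.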

IMHO, the applications of halockrun are also wider than those of the
other two tools mentioned.

Also, lockfile and dotlockfile do not provide the functionality of
hatimerun. hatimerun was initially needed for the same cron-job problem:
if a job that runs every 5 minutes sometimes takes up to 10 minutes,
something is definitely wrong if it takes an hour. hatimerun was built
to kill such processes. It can send multiple signals, first asking the
process to quit by itself and later killing it forcefully. In an
environment with countless cron jobs, where some job occasionally just
hangs, hatimerun makes automatic recovery possible. The importance of
hatimerun comes from the fact that halockrun's locks are never
considered stale as long as the process is running. A hanging process
could therefore block everything, so a reliable timeout is a must.
lockfile and dotlockfile also seem to implement the concept of
timeouts, which might reduce the need for a similar mechanism. However,
halockrun and hatimerun together also make sure that the process
belonging to a "stale" lock is killed (cleaned up), so that the other
resources occupied by this process are freed as well.
Igshaan Mesias
