--- Begin Message ---
Package: lsb-base
Version: 3.2-23.2squeeze1
Severity: normal
Tags: patch
The specific problem I'm experiencing is with /etc/init.d/portmap, which
returns 4 when portmap isn't running. I can't find sufficient documentation on
correct behavior to be sure if init-functions is incorrect, or if portmap is
using it incorrectly, but I think it's the former.
This actually causes serious problems at least in managing portmap with
pacemaker, which requires init scripts to comply strictly with the LSB
specification [2]. Pacemaker will call the "status" action periodically to
monitor the service, and if the response is "unknown", the monitor action is
considered to have failed, which might get the node ejected from the cluster,
or at least prevent other things from running as they should.
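For reference, the exit codes that the LSB specification defines for the "status" action can be sketched as below. This is only an illustration: the function name lsb_status_meaning is made up for this example and is not part of init-functions or of pacemaker.

```shell
#!/bin/sh
# Hypothetical helper (not part of lsb-base): describe an LSB "status"
# exit code. Meanings follow the LSB init script "status" action table.
lsb_status_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "reserved or unrecognised code" ;;
    esac
}

# The two codes at issue in this report:
lsb_status_meaning 3
lsb_status_meaning 4
```

A monitor that treats 3 as a clean "stopped" state and 4 as a failure will react very differently to the two, which is why the distinction matters here.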
I think the crux of the issue is the implementation of pidofproc. The LSB
specification [1] says about pidofproc:
"If the -p pidfile option is specified and the named pidfile does not
exist, the functions shall assume that the daemon is not running."
At the end of pidofproc is this:
    if [ -x /bin/pidof -a ! "$specified" ]; then
        status="0"
        /bin/pidof -o %PPID -x $1 || status="$?"
        if [ "$status" = 1 ]; then
            return 3 # program is not running
        fi
        return 0
    fi
    return 4 # Unable to determine status
The way I read this, pidofproc can't return 3 if a pidfile is specified, which
I think is wrong according to the LSB specification. As I read the spec [1], if
the pidfile doesn't exist, and it was explicitly specified, then pidofproc
should return 3. No process table grepping or anything else allowed. In that
spirit, I propose this patch, which at least solves my problem:
--- init-functions.orig 2012-06-13 16:55:02.000000000 -0400
+++ init-functions 2012-06-13 17:02:58.000000000 -0400
@@ -77,6 +77,9 @@
pidfile="/var/run/$base.pid"
fi
+ if [ "$specified" -a -n "${pidfile:-}" -a ! -e "$pidfile" ]; then
+ return 3 # explicitly specified pidfile does not exist; must assume not running
+ fi
if [ -n "${pidfile:-}" -a -r "$pidfile" ]; then
read pid < "$pidfile"
if [ -n "${pid:-}" ]; then
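The check added by the patch can be tried in isolation with a sketch like the one below. It is not the real pidofproc: the function name check_specified_pidfile and its two positional arguments are assumptions made for the example, and "$specified" is assumed to be non-empty only when -p was given.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the check added by the patch.
# $1: non-empty if a pidfile was explicitly specified (assumed to mirror
#     "$specified" in pidofproc), empty otherwise
# $2: path of the pidfile
check_specified_pidfile() {
    specified="$1"
    pidfile="$2"
    if [ -n "$specified" ] && [ -n "$pidfile" ] && [ ! -e "$pidfile" ]; then
        return 3   # explicitly specified pidfile does not exist: not running
    fi
    return 0       # no verdict yet; the rest of pidofproc would go on from here
}

missing="/nonexistent-dir-for-demo/daemon.pid"

rc=0; check_specified_pidfile true "$missing" || rc=$?
echo "specified, missing pidfile: $rc"

rc=0; check_specified_pidfile "" "$missing" || rc=$?
echo "unspecified, missing pidfile: $rc"
```

Only the first call short-circuits to 3; in the second case the default pidfile path was merely guessed, so the function leaves the decision to the later checks.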
-- System Information:
Debian Release: 6.0.5
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages lsb-base depends on:
ii ncurses-bin 5.7+20100313-5 terminal-related programs and man
ii sed 4.2.1-7 The GNU sed stream editor
lsb-base recommends no packages.
lsb-base suggests no packages.
-- no debconf information
[1] http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html
[2] http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ap-lsb.html
--- End Message ---
--- Begin Message ---
Version: 3.2-25
Hi Phil, and thanks for your feedback,
On Monday, 18 June 2012 19.26:00, Phil Frost wrote:
> On 06/18/2012 12:21 PM, Didier 'OdyX' Raboud wrote:
> > As you reported this bug against the Debian stable release, lsb-base has
> > seen many updates since then and I suspect that your bug above has been
> > fixed by the resolution of bug #597628 in lsb 3.2-25. Can you verify
> > that any of
> >
> > a) the attached init-functions-a (to be put as /lib/lsb/init-functions)
> > file solves your bug (it's a file from stable + patches up to 3.2-25);
> > b) the attached init-functions-b (to be put as /lib/lsb/init-functions)
> > file solves your bug (it's a file from stable + all patches concerning
> > pidofproc in the current unstable);
> > c) the lsb-base package from the current testing or unstable do so;
> >
> > … solves your issue.
>
> It looks like both init-functions-a and init-functions-b solve my issue.
Hereby marking this bug as fixed in version 3.2-25 (scenario a above).
> I would say as a minor point that the change introduced in init-functions-a
> is correct in my particular case, but maybe not in others. For example, what
> if the pid file exists, but is not readable? The service could very well be
> running, but init-functions-a will return "not running", if I'm reading it
> correctly. I don't see anything in the specification that says what to do in
> this case, but "unknown" seems like a better answer than "not running" to
> me. I suppose a case could also be made to attempt a guess as if no pidfile
> were available, though personally I'd regard that as DWIM and avoid it.
>
> The deeply nested logic in init-functions-b is hard to read and
> understand, but seems to partially address this. As I read it, it will
> return "unknown" if the pid file exists but is not readable.
I admit init-functions-b is quite tough to read, but I think it does the right
thing; here it is rewritten as pseudo-code:
if pidfile name is non-empty          # if [ -n "${pidfile:-}" ]; then
    if pidfile exists                 # if [ -e "$pidfile" ]; then
        if pidfile is readable        # if [ -r "$pidfile" ]; then
            DO: read it and return 0 or 1 depending on the state of its content
        else
            DO: return 4 as pidfile name is non-empty, exists but is not readable,
                hence status is unknown.
        fi
    else
        # pidfile doesn't exist, try to find the pid nevertheless using pidof
        # If impossible to do, return 3 as without pidfile it's safe to assume it's
        # probably stopped.
    fi
fi
return 4 as we were unable to determine status
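The pseudo-code above could be turned into a runnable sketch roughly as follows. This is not the code from init-functions-b: the function name pidof_sketch and its arguments are invented for the example, liveness is tested with a plain "kill -0" (which also fails for live processes owned by another user), and a pidfile without a PID simply falls through to "unknown".

```shell
#!/bin/sh
# Hypothetical sketch of the decision tree; NOT the real pidofproc.
# $1: pidfile path (may be empty)
# $2: program name, used only for the pidof fallback
pidof_sketch() {
    pidfile="$1"
    prog="$2"
    if [ -n "$pidfile" ]; then
        if [ -e "$pidfile" ]; then
            if [ -r "$pidfile" ]; then
                pid=""
                read -r pid < "$pidfile" || true
                if [ -n "$pid" ]; then
                    # Simplification: kill -0 only works for processes we
                    # are allowed to signal.
                    if kill -0 "$pid" 2>/dev/null; then
                        echo "$pid"
                        return 0   # running
                    fi
                    return 1       # pidfile exists, process is dead
                fi
            else
                return 4           # exists but unreadable: unknown
            fi
        else
            # No pidfile: try pidof, otherwise assume stopped.
            if command -v pidof >/dev/null 2>&1 &&
               pidof -x "$prog" >/dev/null 2>&1; then
                return 0
            fi
            return 3
        fi
    fi
    return 4                       # unable to determine status
}

tmp=$(mktemp)
echo "$$" > "$tmp"
rc=0; pidof_sketch "$tmp" sh >/dev/null || rc=$?
echo "own PID in pidfile: $rc"
rc=0; pidof_sketch "" sh || rc=$?
echo "empty pidfile name: $rc"
rm -f "$tmp"
```

Keeping the "unreadable" branch separate from the "missing" branch is what lets the sketch answer "unknown" rather than "not running" when the pidfile exists but cannot be read.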
> However, I wonder under what conditions the last conditional could be true:
>
> if [ "$specified" ]; then
> return 3 # almost certain it's not running
> fi
Indeed, it's probably never-used code, but I'm not fluent enough in shell
scripting to be bold enough to remove it.
> I think the answer is only "weird edge cases". Like, if $pidfile='', or
> if the pidfile didn't contain a PID. Under these conditions, pidofproc
> returns "unknown" if the pidfile is unspecified; I don't know if
> explicitly specifying an invalid pidfile makes things any less unknown.
> From my perspective (cluster management) "unknown" is the only correct
> answer here, since in either case something is seriously wrong, and the
> right thing to do is fence the troublesome node. Of course, if you
> pretend nothing is wrong, then the cluster manager can't do that.
Yeah, probably that.
> The other possibility (saying the service isn't running when you actually
> aren't sure) leads to a situation where actually two mutually
> exclusive resources could be started by the cluster manager. For
> example, the same filesystem on a SAN could be mounted twice, or two
> DRBD nodes could be made primary.
>
> Again, these are minor points. Since I don't actually use LSB init
> scripts for mounting my filesystems or promoting my DRBD nodes, the
> horrible data corruption scenarios I give can't actually exist. Probably
> this is true of most clusters in practice, but still, the possibility
> exists that someone will be surprised by this overly-optimistic behavior
> someday. If you wanted to close this bug I wouldn't object, and I'll
> open another one when I can think of a real-world use case.
That's done, but with version-tracking: the bug is still marked as open in the
current stable release but marked as done from version 3.2-25 on.
Feel free to add more input either way, but I think that from lsb-base's point
of view, what could be done has been done.
Cheers,
OdyX
--- End Message ---