hurd init, translator upgrade
I just now got on the mailing list, and I'm catching up from the archives,
so I am answering questions from old threads.
update: There should be no `update' daemon at all on Hurd systems. The
filesystems themselves do it, and that is already turned on by default.
(Calling `sync()' periodically as on Unix works fine, but there is no need
for it.) The built-in periodic sync'ing feature is controlled via the
--sync=SECONDS switch to the filesystem translator; this can be changed in
a running filesystem with the `fsysopts' program, and changed permanently
by reinstalling a passive translator with the switch in the arguments using
`settrans', or for the boot-up root filesystem by editing the command line
in the boot script `/boot/servers.boot'.
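As a rough sketch of the two ways to change the sync interval described above (not a tested recipe): the function names, and the node, program and device values in the usage comment, are placeholders of my own. The second function uses -k (--keep-active) because a mounted filesystem already has an active translator running; that option is described further down.

```shell
#!/bin/sh
# set_sync_interval NODE SECONDS
# Ask the filesystem translator already running on NODE to sync every
# SECONDS seconds. This affects only the running translator.
set_sync_interval() {
    fsysopts "$1" --sync="$2"
}

# set_sync_permanent NODE SECONDS PROGRAM [ARGS...]
# Rewrite the passive translator setting on NODE so that PROGRAM is given
# --sync=SECONDS the next time it is started. -k leaves the currently
# running translator alone. PROGRAM and ARGS are supplied by the caller.
set_sync_permanent() {
    node=$1
    seconds=$2
    program=$3
    shift 3
    settrans -pk "$node" "$program" --sync="$seconds" "$@"
}

# Hypothetical usage (placeholder node, interval and device):
#   set_sync_interval /mnt 30
#   set_sync_permanent /mnt 30 /hurd/ext2fs /dev/hd0s2
```

Neither function touches the root filesystem's boot script; that still has to be edited by hand.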
init: We do not plan to change to a sysv-style init any time soon, and hope
to ultimately find a different solution that we like better. The Hurd init
does indeed have various Hurd-magical properties, and for the time being we
expect to just keep it as it is (BSD-style). That does not mean we cannot
support things like package-installed startup scripts, just that we won't
necessarily do it by going whole-hog to a sysv init setup. A lot of change
can be made in the /sbin/rc script (daemons/rc.sh in the hurd source tree),
which is now very skeletal.
installing new servers/translators: There are a few issues here. First,
some servers are special cases and the example being used (exec) is at
least a case with a caveat.
You should be able to *install* new binaries of any and all servers on a
running system without anything funny happening. That is, if you want to
leave whatever is running at the moment intact and only affect the next
reboot or translator startup or whatever, then that should be fine.
Moreover, you should not need to do anything special before/while
installing the new binaries regardless of what restarts you want to take
place; you just may need to do more magic after everything is installed.
A few servers are essential and can't be restarted in a running system.
For the foreseeable future, to upgrade to a new /hurd/proc or /hurd/auth or
the server program used for the root filesystem (e.g. /hurd/ext2fs.static),
you unavoidably must reboot the system for the new programs to be used.
(Again, it should be no problem to install the new binaries; they just
won't be used until you reboot.)
The example people were using, the exec server, is in theory a translator
like any other and upgrade-able as I'll describe below. But because it is
an essential server in executing any program (including the new /hurd/exec
binary you are upgrading to), there is special internal magic in dealing
with it. We haven't debugged that magic much, and I would be frankly quite
surprised if it worked right now to change the /servers/exec translator
without hosing the system. So don't be surprised if it doesn't actually
follow the rules I describe below; it is not a good example of a
generic translator needing restart after upgrade. (We will eventually make
this work smoothly, but it's not a priority.)
Now, about passive translators. First, recall that a passive translator is
stored as a program name and arguments (strings), so there is no inherent
need to reinstall a passive translator setting just because the program
binary has been reinstalled.
Second, for some translators installed in standard locations, the arguments
in the passive translator setting should be considered an element of
user/sysadmin configuration. For example, the /servers/socket/2 or
/servers/socket/inet node's passive translator is /hurd/pfinet, the IP
network server; the arguments in the passive translator setting configure
things like network interfaces and addresses. Though this particular
example will most likely change in the future because this is not the ideal
way to configure the network, the principle holds and it is in general true
that for some servers the primary method of configuration will be to change
the passive translator arguments. Ideally, whatever the package system
does to handle upgrading configuration files the user has modified should
be applied to passive translator settings too, and the "tell me what's
changed from the package contents" commands should compare passive
translator settings. The command `showtrans NODE' (e.g. `showtrans
/servers/socket/inet') shows the current setting of a passive translator,
if a script needs to compare or something. Note that the practice of
renaming the old config file to `FILE.oldconfig' or something and
installing a fresh `FILE' from the package will work in the case of
translated nodes, but might be confusing. To see their old options the
user would have to do `showtrans NODE.oldconfig', and attempting to open
`NODE.oldconfig' would result in starting the translator up with the old
options (if that even works with the new version of the program).
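A package script that wants to know whether the admin has changed a passive translator setting could compare the `showtrans' output against what the package would install. This is only a sketch: the function name is made up, and it assumes that `showtrans NODE' prints the program and arguments on one line and fails when there is no passive translator, which should be checked against the real tool.

```shell
#!/bin/sh
# passive_setting_changed NODE EXPECTED
# Exit 0 if the passive translator setting on NODE differs from EXPECTED
# (the command line the package would install), exit 1 if they match.
# Assumption: showtrans prints the setting as a single line of text.
passive_setting_changed() {
    node=$1
    expected=$2
    # Treat a showtrans failure (e.g. no passive translator) as "empty".
    current=$(showtrans "$node" 2>/dev/null) || current=
    [ "$current" != "$expected" ]
}

# Hypothetical usage:
#   if passive_setting_changed /servers/socket/inet "/hurd/pfinet"; then
#       echo "local changes to /servers/socket/inet; leaving it alone"
#   fi
```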
The only thing you should ever need to use to set up a translated node is
settrans. The -c (--create) option to settrans tells it to create the node
if it doesn't already exist, so `settrans -c /servers/foo /hurd/foo' is the
simplest way to set up a passive translator the very first time. Just
setting the passive translator does not cause the server to be immediately
run, and when it does run, it has no stderr or stdout on which to complain
about errors in the arguments or anything; programs using the translated
node will just get "Translator died" errors (EDIED) or strange lossage. So
if you want to ensure that the new translator program and arguments work
and diagnose errors so the installer can see them, you need to set the
active translator using the -a (--active) option to settrans. (When an
active translator is started up from the passive translator setting in the
normal way, its surroundings will be slightly different and so there might
be an obscure failure this doesn't detect, but it's still a good thing to
test.) You can set both the active and passive translator at once with
`settrans -ap /servers/foo /hurd/foo args', and settrans will start up the
program passing down its own stdout and stderr so that you can see any
messages the program writes out--only after the program starts up
successfully and begins the translator handshake will settrans install it
as the active and passive translator, so if there is a problem like bad
argument syntax, settrans will report the error and nothing will be changed
on disk or in the running system. (Just to test argument syntax, you can
usually run the translator program as a normal shell command and it will
parse its arguments and complain if they are improper before saying "Must
be started as a translator" and exiting without doing anything.)
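An install script following this advice might look roughly like the sketch below. The function name and the message texts are mine; only the settrans options -c, -a and -p come from the description above. Any error text from the translator itself reaches the caller's stderr, since settrans passes its own stderr down.

```shell
#!/bin/sh
# install_translator NODE PROGRAM [ARGS...]
# Create NODE if needed and set PROGRAM as both the active and the passive
# translator, so that startup errors are visible to whoever runs this.
install_translator() {
    node=$1
    shift
    if settrans -cap "$node" "$@"; then
        echo "installed: $node"
    else
        rc=$?
        echo "settrans failed for $node (exit $rc)" >&2
        return "$rc"
    fi
}

# Hypothetical usage:
#   install_translator /servers/foo /hurd/foo args
```

Note that with these options settrans will still refuse if an active translator is already running on the node.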
Leaving aside the advice I've just given for getting error messages, when
setting the passive translator there is the question of what to do about
the running active translator. settrans lets you choose any of the
possibilities:
1. By default, settrans will refuse to change the passive translator if
there is an active translator already running:

    methedrine 15 % settrans -cp /tmp/foo /hurd/null
    methedrine 16 % ls -l /tmp/foo
    crw-rw-rw- 1 roland wheel 0, 0 Aug 19 17:07 /tmp/foo
    methedrine 17 % settrans -cp /tmp/foo /hurd/null
    settrans: /tmp/foo: Device or resource busy
    [Exit 5]

The best way I can think of offhand to just ascertain whether an active
translator is running would be `settrans -ax /tmp/foo'; that will
produce the same error as above if there is an active translator running,
and otherwise will exit with 0 and have no effect on the node. (This
will only work for the owner of the node, or root. Future hurd-specific
flags to ls will be able to show you the st_mode flag bit S_IATRANS on
the underlying node that indicates this, which any user will be able to do on
any node he can look up; this requires an O_NOTRANS stat, i.e. something
hurd-specific that a normal stat or lstat does not get you.) Just about
anything you do with the node, aside from settrans and showtrans, will
start up the active translator, so if you stat the node before you
install or something, it will probably already be running.
2. The -k (--keep-active) option to settrans tells it to leave the existing
active translator running while changing the passive translator. (This
does not start the active translator up if it was not already running.)
This means that the old server program will continue to run, and the new
program will not run until the system is rebooted or the running program
dies for some reason. If you wanted to install a new setup for the next
reboot but leave the running system unchanged, this is in theory the way
to do it. Two caveats for that: if the old translator program was not
already up and running then you need to make it start up before you
install the new program; just any normal access to the translated node
(e.g. `ls -l NODE') will use the existing passive translator setting to
start up the active translator--but remember that a passive translator
program is stored by file name only, so you must do this while the old
translator program binary (e.g. /hurd/foo) is still installed. Also,
many translators have the behavior that if they have no clients for a
while (a few minutes) they will just exit of their own volition (this
saves having a process around doing nothing for long periods; the
assumption being that the same program will be restarted from the
passive translator setting next time it is needed); this behavior is not
currently configurable, but we will probably remedy that, at which time
some command like `fsysopts NODE --idle-timeout=0' would tweak an active
translator so that it would never decide to exit by itself (that,
incidentally, would suffice to start the active translator up if it
hadn't already been running). You might want to do that on existing
translated nodes if you want to have the old system running unchanged
until a reboot or explicit action to kill the old translator processes.
3. The -g (--goaway) option to settrans tells it to make the existing
active translator go away while setting a new passive and/or active
translator. There are several further options affecting exactly what
"make it go away" means when you're using -g:
a. By default (just -g), it is a friendly request to the active
translator that it should exit now if it likes that idea. It can
refuse by returning an error to the request, and usually that error
is EBUSY (Device or resource busy). Note that for translators
providing only a single file rather than a directory (all the
translators normally installed in /dev and /servers are such cases),
a normal unlink (rm) of the translated node has the effect of
making this same "friendly request" to the running active translator;
if the translator refuses to go away, the unlink fails with the error
it gives (usually EBUSY).
Most servers will refuse to go away if they have any clients that
would lose their connections to the server. For a filesystem server,
if any process has any open files or its working directory or root
directory in that filesystem, those are such client connections and
the server will return EBUSY to the goaway request. For the socket
servers (/hurd/pf*), if any process has any open sockets at all
(including listening sockets), those are such client connections.
b. The -f (--force) option to settrans makes it a stern request to go
away no matter what. (The server can in fact still return an error
if it chooses to; there is no way to forcefully detach an active
translator if it is uncooperative, short of finding the process with
ps and killing it. We should probably fix that.) Servers receiving
this request will quickly die, abandoning their current users.
Usually this will hose any user programs that were talking to the
server, though they can recover if they are prepared to. Abandoned
clients may have an operation aborted in a strange way, and further
attempts to use old connections to the dead server will usually
produce a SIGLOST signal (more or less equivalent to SIGPIPE).
For a filesystem server, open files or directories in that filesystem
will cease to work; if it's some process's root or working directory,
that process will probably be hopelessly screwed.
For a network server (e.g. /hurd/pfinet), all network connections
will be lost just as in a reboot. All open sockets (even listening
sockets) will become invalid and using those file descriptors in any
call other than `close' will produce SIGLOST or an error. A network
daemon prepared for this possibility could recreate its sockets by
closing the old file descriptors and making fresh `socket' calls,
etc., but all state in the network stack is gone.
Similarly for other socket servers. For the local-domain (aka unix
domain) socket server (/hurd/pflocal), all open pipes will be lost
just like network connections; new `pipe' calls will work fine.
A single-file filesystem such as a device file (/dev/*) is the same:
any open file descriptors to it will become invalid, and it will lose
all state (e.g. stty settings for /hurd/term).
In short, a --goaway --force resetting of the translator providing a
critical system service is likely to require at least restarting
everything from single-user mode, if not a reboot.
The goaway request under -g is part of the operation to set the
translator, and it never has "partial success": if the active translator
does not go away, then neither the passive translator setting on disk
nor the active translator attached in the running system changes; the
old settings remain in place and the active translator keeps running
normally.
Independent of -f (whether the active translator will abandon existing
clients or refuse to go away), there are two other orthogonal flags that
affect --goaway:
aa. The -S (--nosync) option tells the filesystem to die without
sync'ing pending writes (like halt -n). (Note this is independent of
-f, so just -S will not make the filesystem go away if it has
clients; you need -Sf for "go away right now and do nothing else".)
This basically exists just for `halt -n' (when the disk is on fire),
and I can't imagine wanting to use it in package installation/upgrade.
bb. The -R (--recursive) option tells the active translator that if it
is providing a filesystem and somewhere in there other active
translators are running on top of the nodes it provides (we call
these "child filesystems"), then they should get a request to go
away too. The child filesystems get the exact same request with the
same flags, so they recurse on to their children too; if any child
filesystem returns an error to the goaway call, then the parent
filesystem propagates that error code (almost always EBUSY) and
refuses to go away itself. If there are any active translators
running as child filesystems of this translator, and you do not use
-R, then even if there are no open files in the normal sense, those
active translators will probably count as live clients and this
translator will refuse to go away without -f.
Note that a translator does not need to provide a directory to have
child filesystems. A translator providing a single-file filesystem
can have another translator stacked on top of it (put there with
`settrans -L'). In such a case, the -R issues above all still apply
to this single child filesystem (and recursively to any children it
might have, and its children's children).
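To make the choices above concrete, here is one way a script could wrap them. This is a sketch, not a recommended policy: the function names and the policy names (strict, keep, goaway, recursive, force) are invented for illustration, and only the settrans option letters come from the description above.

```shell
#!/bin/sh
# has_active_translator NODE
# Exit 0 if `settrans -ax NODE' fails, which per the text above means an
# active translator is running; exit 1 if it succeeds. Caveat: any other
# failure (such as lacking permission on NODE) also counts as "active".
has_active_translator() {
    if settrans -ax "$1" 2>/dev/null; then
        return 1
    fi
    return 0
}

# settrans_flags POLICY
# Map a (hypothetical) policy name to the settrans options for changing
# the passive translator.
settrans_flags() {
    case $1 in
        strict)    echo "-p" ;;    # refuse if an active translator runs
        keep)      echo "-pk" ;;   # leave the active translator running
        goaway)    echo "-pg" ;;   # friendly request; may get EBUSY
        recursive) echo "-pgR" ;;  # also ask child filesystems to go away
        force)     echo "-pgf" ;;  # stern request; abandons clients
        *) echo "unknown policy: $1" >&2; return 2 ;;
    esac
}

# update_passive POLICY NODE PROGRAM [ARGS...]
# Set the passive translator on NODE according to POLICY.
update_passive() {
    flags=$(settrans_flags "$1") || return $?
    shift
    settrans "$flags" "$@"
}

# Hypothetical usage:
#   update_passive goaway /servers/foo /hurd/foo args ||
#       echo "old translator is busy; nothing was changed" >&2
```

Which policy a package should pick is exactly the question left open here; the wrapper only keeps the choice in one visible place.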
I have given a lot of detail, and no recipe for the "right way" to install
translators. My intent is to clearly describe the translator functionality
and its issues relevant to installation/upgrade and package maintenance. I
then leave it to the people who have experience with the package system to
devise packaging and installation approaches appropriate to this
functionality.
I hope I have been of some assistance, and I look forward to answering more
questions from Debian folks about packaging issues in the Hurd.
Roland