hurd init, translator upgrade

To: debian-hurd@lists.debian.org
Subject: hurd init, translator upgrade
From: Roland McGrath <roland@frob.com>
Date: Wed, 19 Aug 1998 19:24:10 -0400
Message-id: <199808192324.TAA05289@baalperazim.frob.com>

I just now got on the mailing list, and I'm catching up from the archives,
so answering questions from old threads.

update: There should be no `update' daemon at all on Hurd systems.  The
filesystems themselves do it, and that is already turned on by default.
(Calling `sync()' periodically as on Unix works fine, but there is no need
for it.)  The built-in periodic sync'ing feature is controlled via the
--sync=SECONDS switch to the filesystem translator; this can be changed in
a running filesystem with the `fsysopts' program, and changed permanently
by reinstalling a passive translator with the switch in the arguments using
`settrans', or for the boot-up root filesystem by editing the command line
in the boot script `/boot/servers.boot'.
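
A minimal sketch of the run-time change, assuming a filesystem translator
that accepts the --sync=SECONDS switch described above.  The helper name
`set_sync_interval' and the value 30 are only illustrative, not anything
shipped with the Hurd:

```shell
# Hypothetical helper (not part of the Hurd): change the periodic sync
# interval of a running filesystem translator through fsysopts.
set_sync_interval() {
    node=$1       # a node served by the filesystem, e.g. /
    seconds=$2    # interval for the built-in periodic sync
    case $seconds in
        ''|*[!0-9]*)
            echo "set_sync_interval: SECONDS must be a non-negative integer" >&2
            return 2 ;;
    esac
    # Only the --sync switch comes from the text above.
    fsysopts "$node" --sync="$seconds"
}
```

Something like `set_sync_interval / 30' would then adjust the running root
filesystem only; the passive translator setting or boot script still
decides what happens after the next startup.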

init: We do not plan to change to a sysv-style init any time soon, and hope
to ultimately find a different solution that we like better.  The Hurd init
does indeed have various Hurd-magical properties, and for the time being we
expect to just keep it as it is (BSD-style).  That does not mean we cannot
support things like package-installed startup scripts, just that we won't
necessarily do it by going whole-hog to a sysv init setup.  A lot of change
can be made in the /sbin/rc script (daemons/rc.sh in the hurd source tree),
which is now very skeletal.

installing new servers/translators: There are a few issues here.  First,
some servers are special cases and the example being used (exec) is at
least a case with a caveat.

You should be able to *install* new binaries of any and all servers on a
running system without anything funny happening.  That is, if you want to
leave whatever is running at the moment intact and only affect the next
reboot or translator startup or whatever, then that should be fine.
Moreover, you should not need to do anything special before/while
installing the new binaries regardless of what restarts you want to take
place; you just may need to do more magic after everything is installed.

A few servers are essential and can't be restarted in a running system.
For the foreseeable future, to upgrade to a new /hurd/proc or /hurd/auth or
the server program used for the root filesystem (e.g. /hurd/ext2fs.static),
you unavoidably must reboot the system for the new programs to be used.
(Again, it should be no problem to install the new binaries; they just
won't be used until you reboot.)

The example people were using, the exec server, is in theory a translator
like any other and upgrade-able as I'll describe below.  But because it is
an essential server in executing any program (including the new /hurd/exec
binary you are upgrading to), there is special internal magic in dealing
with it.  We haven't debugged that magic much, and I would be frankly quite
surprised if it worked right now to change the /servers/exec translator
without hosing the system.  So don't be surprised if it doesn't actually
follow the rules I describe below; it's not a good example of a generic
translator needing restart after upgrade.  (We will eventually make this
work smoothly, but it's not a priority.)

Now, about passive translators.  First, recall that a passive translator is
stored as a program name and arguments (strings), so there is no inherent
need to reinstall a passive translator setting just because the program
binary has been reinstalled.  

Second, for some translators installed in standard locations, the arguments
in the passive translator setting should be considered an element of
user/sysadmin configuration.  For example, the /servers/socket/2 or
/servers/socket/inet node's passive translator is /hurd/pfinet, the IP
network server; the arguments in the passive translator setting configure
things like network interfaces and addresses.  Though this particular
example will most likely change in the future because this is not the ideal
way to configure the network, the principle holds and it is in general true
that for some servers the primary method of configuration will be to change
the passive translator arguments.  Ideally, whatever the package system
does to handle upgrading configuration files the user has modified should
be applied to passive translator settings too, and the "tell me what's
changed from the package contents" commands should compare passive
translator settings.  The command `showtrans NODE' (e.g. `showtrans
/servers/socket/inet') shows the current setting of a passive translator,
if a script needs to compare or something.  Note that the practice of
renaming the old config file to `FILE.oldconfig' or something and
installing a fresh `FILE' from the package will work in the case of
translated nodes, but might be confusing.  To see their old options the
user would have to do `showtrans NODE.oldconfig', and attempting to open
`NODE.oldconfig' would result in starting the translator up with the old
options (if that even works with the new version of the program).
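
If a script needs to compare settings, a rough sketch might look like the
following.  It assumes that showtrans prints the passive setting as one
line of program name and arguments that can be compared as a plain string
(arguments containing spaces are not handled), and the function name is
made up for illustration:

```shell
# Hypothetical helper: succeed if NODE's passive translator setting, as
# printed by showtrans, differs from the setting the package would install.
passive_setting_differs() {
    node=$1
    packaged=$2   # program name plus the packaged arguments, as one string
    # If showtrans fails (e.g. no passive translator is set), treat the
    # current setting as empty, which then counts as "differs".
    current=$(showtrans "$node" 2>/dev/null) || current=
    [ "$current" != "$packaged" ]
}
```

A package script could call `passive_setting_differs /servers/socket/inet
"/hurd/pfinet ARGS"' and ask the sysadmin what to do when it succeeds,
much as it would for a modified configuration file.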

The only thing you should ever need to use to set up a translated node is
settrans.  The -c (--create) option to settrans tells it to create the node
if it doesn't already exist, so `settrans -c /servers/foo /hurd/foo' is the
simplest way to set up a passive translator the very first time.  Just
setting the passive translator does not cause the server to be immediately
run, and when it does run, it has no stderr or stdout on which to complain
about errors in the arguments or anything; programs using the translated
node will just get "Translator died" errors (EDIED) or strange lossage.  So
if you want to ensure that the new translator program and arguments work
and diagnose errors so the installer can see them, you need to set the
active translator using the -a (--active) option to settrans.  (When an
active translator is started up from the passive translator setting in the
normal way, its surroundings will be slightly different and so there might
be an obscure failure this doesn't detect, but it's still a good thing to
test.)  You can set both the active and passive translator at once with
`settrans -ap /servers/foo /hurd/foo args', and settrans will start up the
program passing down its own stdout and stderr so that you can see any
messages the program writes out.  Only after the program starts up
successfully and begins the translator handshake will settrans install it
as the active and passive translator, so if there is a problem like bad
argument syntax, settrans will report the error and nothing will be changed
on disk or in the running system.  (Just to test argument syntax, you can
usually run the translator program as a normal shell command and it will
parse its arguments and complain if they are improper before saying "Must
be started as a translator" and exiting without doing anything.)
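
As a sketch of how an installer might use this, the function below sets
the active and passive translator together and reports failure.  The
function name and messages are hypothetical; only `settrans -ap' comes
from the description above, and the node is assumed to exist already:

```shell
# Hypothetical installer step: set both the active and passive translator
# so that any startup errors show up on the installer's stderr.
install_translator() {
    node=$1; shift            # remaining arguments: program and its args
    if settrans -ap "$node" "$@"; then
        echo "installed translator on $node"
    else
        status=$?             # exit status of the failed settrans
        echo "settrans failed on $node (exit $status)" >&2
        return "$status"
    fi
}
```

For example, `install_translator /servers/foo /hurd/foo args' either
prints a confirmation or passes the settrans failure back to the caller,
which can then abort the upgrade.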

Leaving aside the advice I've just given for getting error messages, when
setting the passive translator there is the question of what to do about
the running active translator.  settrans lets you choose any of the
possibilities:

1. By default, settrans will refuse to change the passive translator if
   there is an active translator already running:
	methedrine 15 % settrans -cp /tmp/foo /hurd/null
	methedrine 16 % ls -l /tmp/foo
	crw-rw-rw-   1 roland   wheel      0,   0 Aug 19 17:07 /tmp/foo
	methedrine 17 % settrans -cp /tmp/foo /hurd/null
	settrans: /tmp/foo: Device or resource busy
	[Exit 5]
   The best way I can think of off hand to just ascertain if an active
   translator is running would be `settrans -ax /tmp/foo'; that will
   produce the same error as above if there is an active translator running,
   and otherwise will exit with 0 and have no effect on the node.  (This
   will only work for the owner of the node, or root.  Future hurd-specific
   flags to ls will be able to show you the st_mode flag bit S_IATRANS on
   the underlying node that indicates this, and any user can do that on any
   node he can look up; this requires an O_NOTRANS stat, i.e. something
   hurd-specific that a normal stat or lstat does not get you.)  Just about
   anything you do with the node, aside from settrans and showtrans, will
   start up the active translator, so if you stat the node before you
   install or something, it will probably already be running.

2. The -k (--keep-active) option to settrans tells it to leave the existing
   active translator running while changing the passive translator.  (This
   does not start the active translator up if it was not already running.)
   This means that the old server program will continue to run, and the new
   program will not run until the system is rebooted or the running program
   dies for some reason.  If you wanted to install a new setup for the next
   reboot but leave the running system unchanged, this is in theory the way
   to do it.  Two caveats for that: if the old translator program was not
   already up and running then you need to make it start up before you
   install the new program; just any normal access to the translated node
   (e.g. `ls -l NODE') will use the existing passive translator setting to
   start up the active translator--but remember that a passive translator
   program is stored by file name only, so you must do this while the old
   translator program binary (e.g. /hurd/foo) is still installed.  Also,
   many translators have the behavior that if they have no clients for a
   while (a few minutes) they will just exit of their own volition (this
   saves having a process around doing nothing for long periods; the
   assumption being that the same program will be restarted from the
   passive translator setting next time it is needed); this behavior is not
   currently configurable, but we will probably remedy that, at which time
   some command like `fsysopts NODE --idle-timeout=0' would tweak an active
   translator so that it would never decide to exit by itself (that,
   incidentally, would suffice to start the active translator up if it
   hadn't already been running).  You might want to do that on existing
   translated nodes if you want to have the old system running unchanged
   until a reboot or explicit action to kill the old translator processes.

3. The -g (--goaway) option to settrans tells it to make the existing
   active translator go away while setting a new passive and/or active
   translator.  There are several further options affecting exactly what
   "make it go away" means when you're using -g:

   a. By default (just -g), it is a friendly request to the active
      translator that it should exit now if it likes that idea.  It can
      refuse by returning an error to the request, and usually that error
      is EBUSY (Device or resource busy).  Note that for translators
      providing only a single file rather than a directory (all the
      translators normally installed in /dev and /servers are such cases),
      a normal unlink (rm) of the translated node has the effect of
      making this same "friendly request" to the running active translator;
      if the translator refuses to go away, the unlink fails with the error
      it gives (usually EBUSY).

      Most servers will refuse to go away if they have any clients that
      would lose their connections to the server.  For a filesystem server,
      if any process has any open files or its working directory or root
      directory in that filesystem, those are such client connections and
      the server will return EBUSY to the goaway request.  For the socket
      servers (/hurd/pf*), if any process has any open sockets at all
      (including listening sockets), those are such client connections.

   b. The -f (--force) option to settrans makes it a stern request to go
      away no matter what.  (The server can in fact still return an error
      if it chooses to; there is no way to forcefully detach an active
      translator if it is uncooperative, short of finding the process with
      ps and killing it.  We should probably fix that.)  Servers receiving
      this request will quickly die, abandoning their current users.
      Usually this will hose any user programs that were talking to the
      server, though they can recover if they are prepared to.  Abandoned
      clients may have an operation aborted in a strange way, and further
      attempts to use old connections to the dead server will usually
      produce a SIGLOST signal (more or less equivalent to SIGPIPE).

      For a filesystem server, open files or directories in that filesystem
      will cease to work; if it's some process's root or working directory,
      that process will probably be hopelessly screwed.

      For a network server (e.g. /hurd/pfinet), all network connections
      will be lost just as in a reboot.  All open sockets (even listening
      sockets) will become invalid and using those file descriptors in any
      call other than `close' will produce SIGLOST or an error.  A network
      daemon prepared for this possibility could recreate its sockets by
      closing the old file descriptors and making fresh `socket' calls,
      etc., but all state in the network stack is gone.

      Similarly for other socket servers.  For the local-domain (aka unix
      domain) socket server (/hurd/pflocal), all open pipes will be lost
      just like network connections; new `pipe' calls will work fine.

      A single-file filesystem such as a device file (/dev/*) is the same:
      any open file descriptors to it will become invalid, and it will lose
      all state (e.g. stty settings for /hurd/term).

      In short, a --goaway --force resetting of the translator providing a
      critical system service is likely to require at least restarting
      everything from single-user mode, if not a reboot.

   The goaway request under -g is part of the operation to set the
   translator, and it never has "partial success": if the active translator
   does not go away, then neither the passive translator setting on disk
   nor the active translator attached in the running system changes; the
   old settings remain in place and the active translator keeps running
   normally.

   Independent of -f (whether the active translator will abandon existing
   clients or refuse to go away), there are two other orthogonal flags that
   affect --goaway:

   aa. The -S (--nosync) option tells the filesystem to die without
       sync'ing pending writes (like halt -n).  (Note this is independent of
       -f, so just -S will not make the filesystem go away if it has
       clients; you need -Sf for "go away right now and do nothing else".)
       This basically exists just for `halt -n' (when the disk is on fire),
       and I can't imagine wanting to use it in package installation/upgrade.

   bb. The -R (--recursive) option tells the active translator that if it
       is providing a filesystem and somewhere in there other active
       translators are running on top of the nodes it provides (we call
       these "child filesystems"), then they should get a request to go
       away too.  The child filesystems get the exact same request with the
       same flags, so they recurse on to their children too; if any child
       filesystem returns an error to the goaway call, then the parent
       filesystem propagates that error code (almost always EBUSY) and
       refuses to go away itself.  If there are any active translators
       running as child filesystems of this translator, and you do not use
       -R, then even if there are no open files in the normal sense, those
       active translators will probably count as live clients and this
       translator will refuse to go away without -f.

       Note that a translator does not need to provide a directory to have
       child filesystems.  A translator providing a single-file filesystem
       can have another translator stacked on top of it (put there with
       `settrans -L').  In such a case, the -R issues above all still apply
       to this single child filesystem (and recursively to any children it
       might have, and its children's children).
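
The choices above can be sketched as a pair of shell helpers.  Only the
settrans flags (-a, -x, -p, -k, -g, -f) come from the description; the
function names and policy words are invented for illustration, and the
-S and -R flags are left out:

```shell
# Hypothetical helpers; not part of the Hurd.

# Succeeds if an active translator appears to be running on NODE, using
# the `settrans -ax' check from item 1.  Caveat: any settrans failure
# (e.g. a permission error) is also treated as "active" here.
has_active_translator() {
    ! settrans -ax "$1" 2>/dev/null
}

# replace_passive POLICY NODE PROGRAM [ARGS...]
replace_passive() {
    policy=$1; node=$2; shift 2
    case $policy in
        refuse) settrans -p   "$node" "$@" ;;  # item 1: fail if one is active
        keep)   settrans -pk  "$node" "$@" ;;  # item 2: leave active running
        goaway) settrans -pg  "$node" "$@" ;;  # item 3a: friendly request
        force)  settrans -pgf "$node" "$@" ;;  # item 3b: stern request
        *)  echo "replace_passive: unknown policy: $policy" >&2
            return 2 ;;
    esac
}
```

An upgrade script that wants the new program used only after a reboot
might call `replace_passive keep /servers/foo /hurd/foo', while one that
can tolerate a refusal might try `goaway' and fall back to `keep' when
settrans reports EBUSY.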


I have given a lot of detail, and no recipe for the "right way" to install
translators.  My intent is to clearly describe the translator functionality
and its issues relevant to installation/upgrade and package maintenance.  I
then leave it to the people who have experience with the package system to
devise packaging and installation approaches appropriate to this
functionality.

I hope I have been of some assistance, and I look forward to answering more
questions from Debian folks about packaging issues in the Hurd.


Roland