
Re: Questions about SIGHUP behavior



On 11/12/2013 07:35 AM, Steffen Dettmer wrote:
> Debian 7.2 with /bin/bash as login shell (via /etc/passwd), shopt
> huponexit off (as by default), bash run via SSH from other host.
>
> When closing shell with CTRL-D, "sleep &" continues to run. I had
> expected I had to use nohup, setsid, disown or a combination of them
> in order to keep background jobs running after ending a shell session.

Short answer: I doubt that this ever worked the way you think it did. If you're using a shell with job control and run programs in the background, it is the shell that has to deliver the HUP signal to them, and in bash that happens in one of two ways: huponexit is on, or bash itself receives a SIGHUP.

Long answer:

SIGHUP is not necessarily sent by the shell to background processes when it exits, but more often by the controlling tty's driver or line discipline, which on most Unixes (Linux included) is sadly a morass of cruft with multiple APIs that evolved separately and later merged.

Back in the bad old days, when one used a "real" terminal on RS232 and turned off the terminal, or logged in via a modem connected to the system via RS232 and "hung up" the phone, the carrier-detect (DCD) line would drop and both foreground and background processes would get SIGHUP (hang up) from the tty driver, because it was the "controlling tty" (a concept that still exists today even though real terminals are almost extinct). Keep in mind that, unless one was using the C shell, this was before "job control".

Fast-forward to today: bash by default uses job control except when executing a script, and in the case of SSH, a pseudo-tty is used to simulate the "real" device and its driver [details at pty(7)].

If you have a read of setpgid(2) [also of interest: tty_ioctl(4)], you'll see that (basically) on hangup of the tty device a SIGHUP is sent to the session leader (normally the shell), and that when the session leader exits, a SIGHUP is delivered to the "foreground process group of the controlling terminal". Without defining the "foreground process group" too carefully, suffice it to say that processes can be put in or out of it via system calls like setpgid(2) by the shell, various "daemon starting" programs, or themselves. More importantly, we can easily see which processes are in it by comparing the pgid and tpgid columns of ps(1)'s output: a process is in the foreground process group when its pgid equals the tpgid.
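
As a rough sketch of that pgid/tpgid comparison, the snippet below labels each process that ps(1) reports. The function names classify_pgrp and list_pgrps are made up for this illustration, and I'm assuming a ps that knows the pgid and tpgid columns (procps on Linux does) and that prints -1 as the tpgid of a process without a controlling terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from two ps(1) columns whether a process
# belongs to the foreground process group of its controlling terminal.
classify_pgrp() {
    local pgid=$1 tpgid=$2
    if [ "$tpgid" = "-1" ]; then
        echo "no-tty"        # no controlling terminal (daemon, orphaned job)
    elif [ "$pgid" = "$tpgid" ]; then
        echo "foreground"    # same group as the terminal's foreground group
    else
        echo "background"    # a separate group, e.g. a job started with '&'
    fi
}

# Hypothetical helper: annotate the processes ps selects by default.
# Each column gets its own -o so that the empty headers are portable.
list_pgrps() {
    ps -o pid= -o pgid= -o tpgid= -o args= |
    while read -r pid pgid tpgid args; do
        printf '%6s %6s %6s %-10s %s\n' \
            "$pid" "$pgid" "$tpgid" "$(classify_pgrp "$pgid" "$tpgid")" "$args"
    done
}
```

Running list_pgrps in an interactive shell after starting something with '&' should label the shell and the background job as "background" and the pipeline running ps as "foreground", although the exact rows depend on your system.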

For the final piece of the puzzle, check the relevant section of bash(1):
"The shell exits by default upon receipt of a SIGHUP. Before exiting, an interactive shell resends the SIGHUP to all jobs [...] If the huponexit shell option has been set with shopt, bash sends a SIGHUP to all jobs when an interactive login shell exits." [An 'interactive' shell means (basically) one that is running on a tty rather than reading a script from a file.]

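To see which way the huponexit option is set in a given shell, something like the following should do. The function name huponexit_state is only an illustration; shopt -q and shopt -s are the bash builtin's real options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "on" or "off" for the huponexit shell option.
huponexit_state() {
    if shopt -q huponexit; then
        echo on
    else
        echo off
    fi
}

huponexit_state

# To have bash send SIGHUP to all jobs when an interactive login shell
# exits, enable the option, for example in ~/.bash_profile:
#   shopt -s huponexit
```
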
A little investigation with ps will show why your sleep process didn't receive a SIGHUP: when job control is enabled, bash puts each background job in its own process group, outside the foreground process group; the jobs therefore won't receive a SIGHUP from the tty driver, and since (a) you are exiting bash via EOF and (b) you don't have huponexit set, bash doesn't send it to them either. Note that had bash exited because it received a SIGHUP *itself* (which would happen, e.g., if sshd died and released the pty), it would have resent the SIGHUP to all of its jobs, foreground and background. That is one reason to use commands like nohup, disown, etc. if you really want to be sure that your background commands keep running after you log out.
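
The protection that nohup gives can be checked without logging out at all. Here is a minimal, non-interactive sketch: it starts one plain sleep and one under nohup, sends both a SIGHUP, and reports what happened. The variable names and the 60-second duration are arbitrary choices of mine, and it assumes the script itself was not started with SIGHUP already ignored.

```shell
#!/usr/bin/env bash
# Start one unprotected and one nohup-protected background process.
sleep 60 &
plain=$!
nohup sleep 60 >/dev/null 2>&1 &
protected=$!

sleep 1    # give nohup time to set SIGHUP to "ignore" and exec sleep

kill -HUP "$plain" "$protected"

# Reap the unprotected process; a death by signal is reported as
# 128 + the signal number.
plain_status=0
wait "$plain" || plain_status=$?

# kill -0 delivers nothing; it only tests whether the process still exists.
if kill -0 "$protected" 2>/dev/null; then
    protected_alive=yes
else
    protected_alive=no
fi

echo "plain sleep exit status: $plain_status"
echo "nohup sleep still alive: $protected_alive"

# Clean up the survivor.
kill -TERM "$protected"
wait "$protected" 2>/dev/null || true
```

For a job that is already running, bash's "disown -h %1" tells the shell not to send that job a SIGHUP, and setsid(1) starts a command in a new session of its own.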

The following session log demonstrates all of this. I use 'sleep 1h' and 'sleep 2h' to make clearer in the output of 'ps' which command was run by 'nohup' (but notice also the 'ignored' column).


~ % # ===>
~ % # ===> First, let's run some commands in the background with job-control enabled.
~ % # ===>
~ % ssh localhost
  ...
~ % sleep 1h &
[1] 32187
~ % nohup sleep 2h &
[2] 32204
~ % nohup: ignoring input and appending output to `nohup.out'

~ % ps -o pid,ppid,pgid,sess,tpgid,tty,ignored,args
  PID  PPID  PGID  SESS TPGID TT       IGNORED COMMAND
31723 31722 31723 31723 32281 pts/21  00384004 -bash
32187 31723 32187 31723 32281 pts/21  00000000 sleep 1h
32204 31723 32204 31723 32281 pts/21  00000001 sleep 2h
32281 31723 32281 31723 32281 pts/21  00000000 ps -o pid,ppid,

~ % #Notice ^^^^^ that the jobs have different PGID's from bash and the TPGID.

~ % exit
logout
Connection to localhost closed.

~ % # Check that both 'sleep' processes are still running:
~ % ps -eo pid,ppid,pgid,sess,tpgid,tty,ignored,args |grep sleep |grep -v grep
32187     1 32187 31723    -1 ?   0000000000000000 sleep 1h
32204     1 32204 31723    -1 ?   0000000000000001 sleep 2h
~ %
~ % # Demonstrate the effects of nohup on one of them:
~ % kill -HUP 32187 32204
~ % ps -eo pid,ppid,pgid,sess,tpgid,tty,ignored,args |grep sleep |grep -v grep
32204     1 32204 31723    -1 ?   0000000000000001 sleep 2h
~ %
~ % # OK, now kill it too:
~ % kill -TERM 32204
~ % ps -eo pid,ppid,pgid,sess,tpgid,tty,ignored,args |grep sleep |grep -v grep
~ %


~ % # ===>
~ % # ===> Try the same thing again, with job control disabled.
~ % # ===>
~ % ssh localhost
  ...
~ % set +m
~ % sleep 1h &
[1] 677
~ % nohup sleep 2h &
[2] 706
~ % nohup: appending output to `nohup.out'

~ % ps -o pid,ppid,pgid,sess,tpgid,tty,ignored,args
  PID  PPID  PGID  SESS TPGID TT       IGNORED COMMAND
  677 32636 32636 32636 32636 pts/21  00000006 sleep 1h
  706 32636 32636 32636 32636 pts/21  00000007 sleep 2h
  765 32636 32636 32636 32636 pts/21  00000000 ps -o pid,ppid,
32636 32635 32636 32636 32636 pts/21  00384004 -bash

~ % # Notice ^^^^ that this time all processes share the same PGID, which is also the TPGID.

~ % exit
logout
Connection to localhost closed.

~ % # Now only the nohup-protected process remains:
~ % ps -eo pid,ppid,pgid,sess,tpgid,tty,ignored,args |grep sleep |grep -v grep
  706     1 32636 32636    -1 ?   0000000000000007 sleep 2h
~ % kill 706


~ % # ===>
~ % # ===> One more time, *with* job-control, but terminate bash via SIGHUP
~ % # ===> (simulating a turned-off terminal, lost network connection, etc.)
~ % # ===>
~ % ssh localhost
  [...]
~ % sleep 1h &
[1] 4643
~ % nohup sleep 2h &
[2] 4644
~ % nohup: ignoring input and appending output to `nohup.out'

~ % ps -o pid,ppid,pgid,sess,tpgid,tty,ignored,args
  PID  PPID  PGID  SESS TPGID TT               IGNORED COMMAND
 4580  4579  4580  4580  4646 pts/21  0000000000384004 -bash
 4643  4580  4643  4580  4646 pts/21  0000000000000000 sleep 1
 4644  4580  4644  4580  4646 pts/21  0000000000000001 sleep 2
 4646  4580  4646  4580  4646 pts/21  0000000000000000 ps -o p

~ % #Separate^^^^ pgid's this time: the tty won't deliver SIGHUP, it's up to the shell

~ % kill -HUP $$
Connection to localhost closed.
~ %
~ % # Non-nohup-protected process got SIGHUP; the other remains:
~ % ps -eo pid,ppid,pgid,sess,tpgid,tty,ignored,args |grep sleep |grep -v grep
 4644     1  4644  4580    -1 ?   0000000000000001 sleep 2h
~ % kill 4644


-- David

