
Re: Bash script problem



On Tue, May 17, 2005 at 02:29:41PM +0200, Markus.Grunwald@pruftechnik.com wrote:
> I have a problem with a bash script. The script (example) is very simple:
> 
> ----script.sh-----------------------
> #!/bin/bash
> 
> echo hello
> ssh PT-AGCMLX1 "while true; do date; sleep 10s; done" 
> ------------------------------------
> 
> When I start script.sh, look up its pid via ps and kill it, the ssh keeps 
> running:
> 
>  9311 pts/4    S+     0:00 /bin/bash ./script.sh
>  9312 pts/4    S+     0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; 
> done
> 
> > kill 9311
> 
>  9312 pts/4    S      0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; 
> done
> 
> How can I change my script so that it kills all its child processes, if it 
> is killed itself ? I tried to use the "trap" function of bash, but it 
> never used the correct pid...

You probably want to kill the process _group_, as follows:

$ /bin/kill -- -9311

Specifying the respective PID as a negative number tells kill to
interpret it as a process group  (for this to work, you also need
the '--', to prevent the minus sign from mistakenly being parsed as an
option identifier).  Also, you have to use the real /bin/kill, not the
respective bash builtin.
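
To make that concrete, here is a minimal sketch of a helper that signals
a whole process group.  The names kill_group and KILL_CMD are made up for
this example; KILL_CMD defaults to /bin/kill as recommended above, and the
argument check is merely a precaution of this sketch, not something kill
itself requires.

```bash
#!/bin/bash
# kill_group and KILL_CMD are hypothetical names used only in this sketch.
# Usage: kill_group PGID [SIGNAL]
kill_group() {
    local pgid=$1 sig=${2:-TERM}
    # Accept only a plain number greater than 1: "-0" and "-1" have
    # special meanings for kill and would hit far more than one group.
    case $pgid in
        ''|*[!0-9]*|0|1)
            echo "kill_group: not a usable process group: '$pgid'" >&2
            return 2
            ;;
    esac
    # The leading minus marks the number as a process group ID, and "--"
    # keeps it from being parsed as an option.
    "${KILL_CMD:-/bin/kill}" -s "$sig" -- "-$pgid"
}
```

With the numbers from the ps listing above, "kill_group 9311" would send
TERM to script.sh and to the ssh it started.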

You could, of course, put this in script.sh itself, by trapping EXIT:

#!/bin/bash
trap "/bin/kill -- -$$" EXIT
...

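(The process group will be equal to $$ here, as long as the script is
started directly from an interactive shell, which makes it the leader
of its own process group.)
Then, a normal "kill 9311" should do, because now, upon exit, the shell
will kill its own process group.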
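
A slightly more defensive variant of that trap could look like the
sketch below.  The names group_cleanup and KILL_CMD are made up for this
example.  Ignoring TERM inside the handler is an addition of this sketch,
meant to keep the shell from being hit by the very signal it sends to its
own group; treat it as an assumption to verify, not as established
practice.

```bash
#!/bin/bash
# group_cleanup and KILL_CMD are hypothetical names used only in this sketch.
group_cleanup() {
    trap - EXIT     # make sure the handler cannot run a second time
    trap '' TERM    # ignore the TERM that is about to reach this shell too
    # $$ is a valid process group ID only if this shell is the group
    # leader; otherwise kill fails and nothing is signalled.
    "${KILL_CMD:-/bin/kill}" -TERM -- "-$$" 2>/dev/null
}

# In script.sh, install it before starting ssh:
#   trap group_cleanup EXIT
```

If the script is not a process group leader, the kill simply fails and
the children are left alone, so this is no worse than having no trap.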

Alternatively, simply use

$ killall -g script.sh


So far, so good.  Unfortunately, there's still one problem remaining:
the command which is run on the remote machine will not be terminated.
Just to illustrate, after having started script.sh, you'd see something
like this (I'm simply using localhost in place of PT-AGCMLX1):

$ ps axf -o pid,ppid,pgrp,session,cmd
  PID  PPID  PGRP  SESS CMD
...
19468 28299 19468 28299          |       \_ /bin/bash ./script.sh
19469 19468 19468 28299          |           \_ ssh localhost while true; do date; sleep 10s; done
...
19472 19470   257   257      \_ /usr/sbin/sshd
19475 19472 19475 19475          \_ bash -c while true; do date; sleep 10s; done
19507 19475 19475 19475              \_ sleep 10s
...

After killing process group 19468, the remote-side processes are
still there.  Apparently they are not killed when sshd (the forked sshd
process handling that command, that is) terminates; they just get
reparented to init (note the PPID of 1):

...
19475     1 19475 19475 bash -c while true; do date; sleep 10s; done
19567 19475 19475 19475  \_ sleep 10s
...

I can't offer any simple solution for this :( -- this is something I'd
be interested in myself.  So, if anyone should know that magic formula,
which would make sshd kill its children, please let us know...
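
One candidate that might be worth trying, offered as an untested
assumption rather than a known fix: ask ssh for a remote pseudo-terminal
with -t.  The idea is that once the connection is gone, sshd closes the
pty, and the remote shell and its foreground children receive SIGHUP.
The wrapper name run_remote_with_tty is made up for this sketch.

```bash
#!/bin/bash
# run_remote_with_tty is a hypothetical wrapper name for this sketch.
# Usage: run_remote_with_tty HOST COMMAND...
run_remote_with_tty() {
    local host=$1
    shift
    # -t requests a pseudo-terminal on the remote side, so the remote
    # command runs with a controlling terminal that goes away together
    # with the connection.
    ssh -t "$host" "$@"
}
```

In script.sh, the ssh line would then read
run_remote_with_tty PT-AGCMLX1 "while true; do date; sleep 10s; done".
If the script itself has no terminal on stdin, ssh needs -tt instead of
-t to force the allocation.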


> On Tue, May 17, 2005 at 01:29:40PM +0200, Dennis Stosberg wrote:
> > 
> > Have you tried to use "exec ssh PT-..." instead? 
> 
> No, I didn't know that. It's interesting, but unfortunately I can't use 
> that. I need to be able to kill the script via "killall script.sh" and 
> after the exec, script.sh isn't there anymore. I could use 'exec -a 
> script.sh', but the killall command won't work either...

killall relies on /proc/<PID>/stat to find the appropriate processes,
but exec -a only modifies /proc/<PID>/cmdline.
In case you really need to use the name "script.sh" for killall, you
might want to create a symlink with "ln -s /usr/bin/ssh script.sh", and
then "exec ./script.sh ..." (of course, the former script.sh would then
have to have a different name...).
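
The difference between the two names can be inspected directly.  The
following is a Linux-only sketch that assumes /proc is mounted;
proc_names is a made-up helper name.

```bash
#!/bin/bash
# proc_names is a hypothetical helper name used only in this sketch.
# Prints the two names a process carries: the kernel's "comm" field
# (second field of /proc/PID/stat) and argv[0] from /proc/PID/cmdline.
proc_names() {
    local pid=$1 stat comm argv0
    stat=$(cat "/proc/$pid/stat") || return 1
    comm=${stat#*\(}    # drop everything up to the first "("
    comm=${comm%\)*}    # drop everything from the last ")" onwards
    # cmdline is NUL-separated; its first entry is argv[0]
    argv0=$(tr '\0' '\n' < "/proc/$pid/cmdline" | head -n 1)
    printf 'comm=%s argv0=%s\n' "$comm" "$argv0"
}
```

For a process started via "exec -a script.sh sleep 30", this should
report comm=sleep argv0=script.sh, which is why "killall script.sh"
does not match it.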

Cheers,
Almut


