Re: Bash script problem
On Tue, May 17, 2005 at 02:29:41PM +0200, Markus.Grunwald@pruftechnik.com wrote:
> I have a problem with a bash script. The script (example) is very simple:
>
> ----script.sh-----------------------
> #!/bin/bash
>
> echo hello
> ssh PT-AGCMLX1 "while true; do date; sleep 10s; done"
> ------------------------------------
>
> When I start script.sh, look up its pid via ps and kill it, the ssh keeps
> running:
>
> 9311 pts/4 S+ 0:00 /bin/bash ./script.sh
> 9312 pts/4 S+ 0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; done
>
> > kill 9311
>
> 9312 pts/4 S 0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; done
>
> How can I change my script so that it kills all its child processes, if it
> is killed itself ? I tried to use the "trap" function of bash, but it
> never used the correct pid...

You probably want to kill the process _group_, as follows:

  $ /bin/kill -- -9311

A negative number tells kill to treat it as a process group ID rather
than a PID. For this to work you also need the '--', which keeps the
minus sign from being parsed as the start of an option. Also, you have
to use the real /bin/kill, not the bash builtin.
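
To illustrate the negative-number form, here is a minimal sketch that can be run on its own. It makes several assumptions: `set -m` puts the background job into its own process group (as an interactive shell would do for script.sh), a logging shell function stands in for the ssh child, and the names `child`, `log` and `pgid` are made up for the demonstration. It uses bash's builtin kill, which as far as I know accepts the same `-- -PGID` form in current bash versions; substitute /bin/kill if yours does not.

```shell
#!/bin/bash
# Sketch: one signal to a negative PGID reaches every process in the group.

set -m                      # job control: each background job gets its own process group
log=$(mktemp); export log   # hypothetical log file, used only to observe the child

# Hypothetical stand-in for the ssh child: records that it was terminated.
child() {
  trap 'echo "child terminated" >> "$log"; exit 0' TERM
  sleep 30
  echo "child finished on its own" >> "$log"
}
export -f child

# Stand-in for script.sh: a shell that starts the child and waits for it.
bash -c 'child & wait' &
pgid=$!                     # the job's only process leads the group, so PID == PGID

sleep 1                     # let the child install its trap
kill -TERM -- "-$pgid"      # negative number: signal the whole group
wait "$pgid"
status=$?                   # exit status of the group leader
sleep 1                     # let the child finish writing its log line

result=$(cat "$log")
rm -f "$log"
echo "leader status: $status, child log: $result"
```

The group leader has no handler for SIGTERM, so its status reflects death by signal, while the child reports through the log that the same signal reached it.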
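
You could, of course, put this in script.sh itself, by trapping EXIT:

  #!/bin/bash
  trap "/bin/kill -- -$$" EXIT
  ...

(The process group ID will be equal to $$ here, because an interactive
shell with job control starts the script as the leader of a new process
group. If the script is started from another non-interactive script,
that is not necessarily the case.)

Then a normal "kill 9311" should do, because now, upon exit, the shell
kills its own process group.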
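
The EXIT trap can be tried out without ssh. The following is only a sketch under stated assumptions: the scratch directory, the `LOG` variable and the logging subshell (standing in for the ssh child) are invented for the demonstration, `set -m` is there so that the test script becomes a process group leader just as it would when started from an interactive shell, and the builtin kill is used in the trap instead of /bin/kill.

```shell
#!/bin/bash
# Sketch: a script whose EXIT trap signals its own process group.

set -m                          # job control, so the script below becomes a group leader
dir=$(mktemp -d)                # hypothetical scratch directory
export LOG="$dir/log"

cat > "$dir/script.sh" <<'EOF'
#!/bin/bash
trap 'kill -- -$$' EXIT         # on exit, signal our own process group
# Stand-in for the ssh child: a subshell that records being terminated.
(
  trap 'echo "child terminated" >> "$LOG"; exit 0' TERM
  sleep 30
  echo "child finished on its own" >> "$LOG"
)
EOF

bash "$dir/script.sh" &
pid=$!

sleep 1                         # let the child install its trap
kill "$pid"                     # plain kill: only the script itself is signalled
wait "$pid"
sleep 1                         # let the child write its log line

result=$(cat "$LOG")
rm -rf "$dir"
echo "child log: $result"
```

Only the script receives the initial signal; the child learns about it through the trap, which is the behaviour wanted for "kill 9311".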

Alternatively, simply use

  $ killall -g script.sh

So far, so good. Unfortunately, there's still one problem remaining:
the command which is run on the remote machine will not be terminated.
Just to illustrate, after having started script.sh, you'd see something
like this (I'm simply using localhost in place of PT-AGCMLX1):

  $ ps axf -o pid,ppid,pgrp,session,cmd
    PID  PPID  PGRP  SESS CMD
  ...
  19468 28299 19468 28299  |   \_ /bin/bash ./script.sh
  19469 19468 19468 28299  |       \_ ssh localhost while true; do date; sleep 10s; done
  ...
  19472 19470   257   257      \_ /usr/sbin/sshd
  19475 19472 19475 19475          \_ bash -c while true; do date; sleep 10s; done
  19507 19475 19475 19475              \_ sleep 10s
  ...

After process group 19468 has been killed, the remote-side processes
are still there. Apparently they are not killed when sshd (the forked
sshd process handling that command, that is) terminates; they are
simply reparented to init (PPID 1):

  ...
  19475     1 19475 19475 bash -c while true; do date; sleep 10s; done
  19567 19475 19475 19475  \_ sleep 10s
  ...

I can't offer any simple solution for this :( -- this is something I'd
be interested in myself. So, if anyone should know that magic formula,
which would make sshd kill its children, please let us know...
> On Tue, May 17, 2005 at 01:29:40PM +0200, Dennis Stosberg wrote:
> >
> > Have you tried to use "exec ssh PT-..." instead?
>
> No, I didn't know that. It's interesting, but unfortunately I can't use
> that. I need to be able to kill the script via "killall script.sh" and
> after the exec, script.sh isn't there anymore. I could use 'exec -a
> script.sh', but the killall command won't work either...

killall matches against the process name recorded in /proc/<PID>/stat,
whereas exec -a only changes argv[0], which is what shows up in
/proc/<PID>/cmdline.

In case you really need to use the name "script.sh" for killall, you
might want to create a link with "ln -s /usr/bin/ssh script.sh" and
then "exec ./script.sh ..." (of course, the former script.sh would then
have to have a different name...).
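
The difference between the two names can be checked directly in /proc. This is a Linux-only sketch with a few assumptions: bash stands in for ssh so that it runs without a remote host, and the helper functions `stat_name` and `argv0` as well as the scratch directory are made up for the demonstration.

```shell
#!/bin/bash
# Sketch (Linux only): compare the name in /proc/PID/stat with argv[0].

# Print the process name stored in parentheses in /proc/PID/stat.
stat_name() { sed 's/^[0-9]* (\(.*\)) .*/\1/' "/proc/$1/stat"; }
# Print argv[0], the first NUL-terminated entry of /proc/PID/cmdline.
argv0() { tr '\0' '\n' < "/proc/$1/cmdline" | head -n 1; }

dir=$(mktemp -d)                # hypothetical scratch directory
bash_path=$(command -v bash)

# Case 1: exec -a changes argv[0] only.
bash -c 'exec -a script.sh bash -c "sleep 5; exit 0"' &
pid_a=$!

# Case 2: run the same binary through a symlink named script.sh.
ln -s "$bash_path" "$dir/script.sh"
"$dir/script.sh" -c 'sleep 5; exit 0' &
pid_b=$!

sleep 1                         # give both processes time to exec
name_a=$(stat_name "$pid_a"); arg_a=$(argv0 "$pid_a")
name_b=$(stat_name "$pid_b"); arg_b=$(argv0 "$pid_b")
echo "exec -a: stat name=$name_a argv[0]=$arg_a"
echo "symlink: stat name=$name_b argv[0]=$arg_b"

wait                            # let both test processes finish
rm -rf "$dir"
```

In the first case only argv[0] carries the new name, while the symlink also changes the name in /proc/<PID>/stat, which is the one killall looks at.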
Cheers,
Almut