[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#5268: cron dies nightly



Okay,  more info...

Cron stopped running jobs at 04:01:07.  I had a job running every minute
that just did "date >>/tmp/cron.run".

By using strace to attach to the non-functioning cron, I have learned
the following:

 - Cron is still active.  Every minute it wakes and tries to do something.
 - Whatever it is trying to do is failing.
 - The core of the problem is out network config restarting syslog.


Every minute, strace gives the following:

--- SIGALRM (Alarm clock) ---
sigreturn()                             = ? (mask now [ALRM])
time(NULL)                              = 847462920
sigaction(SIGALRM, {SIG_DFL}, NULL)     = 0
alarm(0)                                = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
stat("crontabs", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=506, ...}) = 0
time(NULL)                              = 847462920
fork()                                  = 25476
[pid 15699] fork()                      = 25477
[pid 15699] time(NULL)                  = 847462920
[pid 15699] sigprocmask(SIG_BLOCK, [ALRM], []) = 0
[pid 15699] sigaction(SIGALRM, {0x40061c60, [], 0}, {SIG_DFL}) = 0
[pid 15699] time(NULL)                  = 847462920
[pid 15699] alarm(60)                   = 0
[pid 15699] sigsuspend([] <unfinished ...>
[pid 25476] close(3)                    = 0
[pid 25477] close(3)                    = 0
[pid 25476] sigaction(SIGCHLD, {SIG_DFL}, {SIG_IGN}) = 0
[pid 25477] sigaction(SIGCHLD, {SIG_DFL}, {SIG_IGN}) = 0
[pid 25476] pipe([3, 5])                = 0
[pid 25476] pipe([6, 7])                = 0
[pid 25476] fork()                      = 25478
[pid 25476] close(3)                    = 0
[pid 25476] close(7)                    = 0
[pid 25476] close(5)                    = 0
[pid 25476] fcntl(6, F_GETFL)           = 0 (flags O_RDONLY)
[pid 25476] fstat(6, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[pid 25476] lseek(6, 0, SEEK_CUR)       = -1 ESPIPE (Illegal seek)
[pid 25476] read(6,  <unfinished ...>
[pid 25477] pipe([3, 5])                = 0
[pid 25477] pipe([6, 7])                = 0
[pid 25477] fork()                      = 25479
[pid 25477] close(3)                    = 0
[pid 25477] close(7)                    = 0
[pid 25477] close(5)                    = 0
[pid 25477] fcntl(6, F_GETFL)           = 0 (flags O_RDONLY)
[pid 25477] fstat(6, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[pid 25477] lseek(6, 0, SEEK_CUR)       = -1 ESPIPE (Illegal seek)
[pid 25477] read(6,  <unfinished ...>
[pid 25478] getpid( <unfinished ...>
[pid 25479] getpid( <unfinished ...>
[pid 25478] <... getpid resumed> )      = 25478
[pid 25479] <... getpid resumed> )      = 25479
[pid 25478] time([847462920])           = 847462920
[pid 25479] time([847462921])           = 847462921
[pid 25478] getpid()                    = 25478
[pid 25478] write(4, "<78>Nov  8 09:22:00 /USR/SBIN/CR"..., 73) = -1 EPIPE (Broken pipe)
[pid 25478] --- SIGPIPE (Broken pipe) ---
[pid 25478] +++ killed by SIGPIPE +++
[pid 25476] <... read resumed> "", 4096) = 0
[pid 25476] --- SIGCHLD (Child exited) ---
[pid 25476] close(6)                    = 0
[pid 25476] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE], 0, NULL) = 25478
[pid 25476] wait4(-1, 0xbffff674, 0, NULL) = -1 ECHILD (No child processes)
[pid 25476] _exit(0)                    = ?
[pid 15699] --- SIGCHLD (Child exited) ---
[pid 25479] getpid()                    = 25479
[pid 25479] write(4, "<78>Nov  8 09:22:01 /USR/SBIN/CR"..., 81) = -1 EPIPE (Broken pipe)
[pid 25479] --- SIGPIPE (Broken pipe) ---
[pid 25479] +++ killed by SIGPIPE +++
[pid 25477] <... read resumed> "", 4096) = 0
[pid 25477] --- SIGCHLD (Child exited) ---
[pid 25477] close(6)                    = 0
[pid 25477] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE], 0, NULL) = 25479
[pid 25477] wait4(-1, 0xbffff674, 0, NULL) = -1 ECHILD (No child processes)
[pid 25477] _exit(0)                    = ?
--- SIGCHLD (Child exited) ---


It appears that the forked processes are dying because they cannot write to
the syslog pipe (the "Broken pipe") message.

The following happened when I edited one of the crontabs:

--- SIGALRM (Alarm clock) ---
sigreturn()                             = ? (mask now [ALRM])
time(NULL)                              = 847464420
sigaction(SIGALRM, {SIG_DFL}, NULL)     = 0
alarm(0)                                = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
stat("crontabs", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=506, ...}) = 0
open("/etc/crontab", O_RDONLY)          = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=506, ...}) = 0
close(5)                                = 0
stat("crontabs", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
open("crontabs", O_RDONLY)              = 5
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
getdents(5, /* 6 entries */, 4096)      = 100
open("/etc/passwd", O_RDONLY)           = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=1001, ...}) = 0
read(6, "root:frkR42moWjAIM:0:0:root:/hom"..., 4096) = 1001
lseek(6, -568, SEEK_CUR)                = 433
close(6)                                = 0
open("crontabs/mail", O_RDONLY)         = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=470, ...}) = 0
close(6)                                = 0
open("/etc/passwd", O_RDONLY)           = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=1001, ...}) = 0
read(6, "root:frkR42moWjAIM:0:0:root:/hom"..., 4096) = 1001
lseek(6, -952, SEEK_CUR)                = 49
close(6)                                = 0
open("crontabs/root", O_RDONLY)         = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=401, ...}) = 0
close(6)                                = 0
open("/etc/passwd", O_RDONLY)           = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=1001, ...}) = 0
read(6, "root:frkR42moWjAIM:0:0:root:/hom"..., 4096) = 1001
lseek(6, -162, SEEK_CUR)                = 839
close(6)                                = 0
open("crontabs/bcwhite", O_RDONLY)      = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=574, ...}) = 0
getpid()                                = 15699
time([847464420])                       = 847464420
getpid()                                = 15699
write(4, "<78>Nov  8 09:47:00 /usr/sbin/cr"..., 80) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
+++ killed by SIGPIPE +++


The same thing happened here, except that since the SIGPIPE was sent to
the main process, cron stopped completely.

Now, why is the pipe broken?  Because syslog is restarted nightly on
our system as part of the automatic network configuration we have.  We
have not changed this part of our network config for some time, so it
would seem that this only became a problem with a later version of cron.
We have removed this restart and hopefully cron will stop dying.

However, it would be nice if cron noticed the SIGPIPE and just reopened
the pipe to syslog instead of dying.

                                          Brian
                                 ( bcwhite@verisim.com )
                                             
-------------------------------------------------------------------------------
  Want to get it together?  We can help!  http://www.verisim.com/coordinator/

--
TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
debian-devel-REQUEST@lists.debian.org . Trouble? e-mail to Bruce@Pixar.com


Reply to: