rsnapshot is Throwing an Intermittent Error.
On most days and nights this runs perfectly, and then every so often cron mails me the following:
From: root@wb5agz (Cron Daemon)
Subject: Cron <root@wb5agz> /usr/local/etc/daily_backup
/bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/grub/i386-pc': Transport endpoint is not connected
/bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/libbind9.so.80.0.7': Transport endpoint is not connected
This can go on for hundreds of lines and always references a
different set of directories and files.
There can also be weeks of error-free backups in which
everything just works.
I think I am creating this mess by the way I do the backups since
it smells like some sort of race condition. The daily backup
script follows:
#!/bin/sh
#Do the halfday backup first.
#The halfday mounts and unmounts the backup media.
/usr/bin/rsnapshot halfday
#Mount backup media first since this next is just a rotation.
mount /rsnapshot1 >/dev/null 2>&1 ||exit 1
mount /rsnapshot2 >/dev/null 2>&1 ||exit 1
mhddfs /rsnapshot1,/rsnapshot2 /var/cache/rsnapshot -o mlimit=100M >/dev/null 2>&1
/usr/bin/rsnapshot daily
umount /var/cache/rsnapshot /rsnapshot2 /rsnapshot1
exit 0
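If the race is between two overlapping runs (say, this script and a separate cron entry for the halfday backup), a lock around the whole script would rule that out. rsnapshot's own lockfile covers only rsnapshot itself, not the mount and umount steps around it. What follows is a minimal sketch, not a tested fix; LOCKDIR, acquire_lock and release_lock are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: serialize runs with a lock directory. mkdir either creates the
# directory or fails, so only one process can hold the lock at a time.
# LOCKDIR and both function names are hypothetical, not part of rsnapshot.
LOCKDIR=${LOCKDIR:-/tmp/daily_backup.lock}

acquire_lock() {
    # Succeeds only if the directory did not already exist.
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    # Remove the lock directory so the next run can proceed.
    rmdir "$LOCKDIR" 2>/dev/null
}
```

Near the top of the script one would call "acquire_lock || exit 1" and then "trap release_lock EXIT", so the lock is dropped even when a mount fails and the script exits early.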
As far as I know, both mount and umount block until they reach
some sort of resolution, be it success or failure. I have even
seen umount hang for a perceptible amount of time when a large
number of blocks have changed and sync hasn't had time to catch
up, so I am curious as to what may be happening.
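"Transport endpoint is not connected" is the error commonly seen when a FUSE mount is still listed but the process behind it (here that would be mhddfs) has gone away, and the script above never checks whether the mhddfs mount succeeded. A check like the following could be run before the rotation. This is only a sketch, assuming mountpoint(1) from util-linux is installed; check_mount is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm a directory is a live mount point before using it.
# check_mount is a hypothetical helper, not part of rsnapshot or mhddfs.
check_mount() {
    dir=$1
    # mountpoint -q exits 0 only if dir is a mount point.
    mountpoint -q "$dir" || return 1
    # A dead FUSE mount can still be listed as mounted but fails on
    # access, so also try to list the directory.
    ls "$dir" >/dev/null 2>&1 || return 1
    return 0
}
```

In the script this would sit between the mhddfs line and the rotation, for example "check_mount /var/cache/rsnapshot || exit 1", so that rsnapshot daily never runs against a missing or dead mount.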
I was able to simply re-run the very same command later
for the daily backup and it worked without so much as a peep out
of anything. It just ran.
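Since a later re-run succeeds, the useful evidence is whatever the system looked like at the moment of failure. One way to capture that is to wrap each step so its start time, exit status and, on failure, the mount table get logged. A rough sketch follows; LOGFILE and run_step are hypothetical names, and reading /proc/mounts assumes Linux.

```shell
#!/bin/sh
# Sketch: run one step and record when it ran and how it exited.
# LOGFILE and run_step are hypothetical names for illustration.
LOGFILE=${LOGFILE:-/tmp/daily_backup.log}

run_step() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') start: $*" >>"$LOGFILE"
    "$@"
    rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') rc=$rc: $*" >>"$LOGFILE"
    # On failure, save the mount table as it was at that moment (Linux only).
    if [ "$rc" -ne 0 ] && [ -r /proc/mounts ]; then
        cat /proc/mounts >>"$LOGFILE"
    fi
    return "$rc"
}
```

Each line of the script would then become something like "run_step mount /rsnapshot1 || exit 1" or "run_step /usr/bin/rsnapshot daily", and the log from a bad night can be compared with one from a good night.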
The two backup drives in question passed fsck -f -y without
a single issue. In most of these big error spews, the files
are in places that don't change on a regular basis, so it's
not as if I caught a log file just as it was being backed up.
I don't remember ever seeing log files in the spew.
I always leave the backup drives unmounted unless I need
something off of them or the cron job is running since it would
seem to make sense to not mount them continuously. A couple of
weeks ago, we got 4 electrical power blinks in one day so you
don't want your backup media mounted any more than necessary.
When I do pull something off of the backups, it has so far been
good and not corrupted, so I am really curious as to what is
happening.
Martin WB5AGZ