On 09/03/2021 19:32, Chris Green wrote:
I run a daily backup on four systems. It's a bash script called 'backup' (original eh?) which is in /etc/cron.daily and thus gets run by Anacron in the early morning.
I need to check that the backup has run somehow as, very occasionally, the backup system's USB disk 'falls asleep' and as a result any attempt to access it just hangs. Thus the rsync backup job doesn't fail (and produce an error message) it just never returns.
So how can I check and get an error message if this happens? The next day is soon enough, it's not an urgent problem, I just need to know if my daily backups are working. However this needs to be 'no message means it has worked', message only if it has not worked.
Suggestions:
1) Don't run it through /etc/cron.daily. I don't think you have much control over log files, error files or output from jobs run from cron.daily. Is this a job you've put in there by hand, or is it set up via crontab? If it's the former, I'd suggest the latter.
2) I suggest investigating systemd timers, because it's the modern way to do it and I think you have much more control over when things run. I think you can also control logging and job restarting. However, it's a bit of a learning curve. Look in /etc/systemd/system/timers.target.wants for examples of jobs that are run on a schedule. In the [Timer] section of a .timer unit you can add the line OnCalendar=daily to run once a day, or OnCalendar=*-*-* 01:15:00 to set a start time.
3) Tweak your script. Create a manual .pid file at the start: either just use touch to create MyBackupHasStarted.pid (or any other name), or record the actual PID of the script (in bash, echo $$ >MyBackupHasStarted.pid does this; something crude like ps -Af | grep "sh.sh" would write whole ps lines, including the grep itself). I forget where pid files are supposed to go. Upon successful completion of the script, delete the .pid file, i.e. do it in the last line of your script. Then create a second script which checks for the presence of the .pid file at a time by which the backup should have finished. If the file still exists, send an error, perhaps by generating an email using one of these methods: https://tecadmin.net/ways-to-send-email-from-linux-command-line/
4) Find something you can do that produces an error if the USB disk has become non-responsive. There must be something, even if it means writing a Python script. Do this first, and if it errors, trap the error and respond to it.
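
For suggestion 2, here is a rough sketch of what a timer/service pair might look like. The unit names (backup.service, backup.timer), the script path /usr/local/sbin/backup and the 6h limit are all my assumptions, not anything from your setup. The function only writes the two files into a scratch directory so you can read them before copying anything into /etc/systemd/system.

```shell
#!/bin/bash
# Sketch: write a hypothetical backup.service / backup.timer pair.
# The target directory defaults to ./units, so nothing under /etc is touched.
write_backup_units() {
    local unit_dir="${1:-./units}"
    mkdir -p "$unit_dir" || return 1

    cat > "$unit_dir/backup.service" <<'EOF'
[Unit]
Description=Daily rsync backup (hypothetical example)

[Service]
Type=oneshot
# Assumed location of the existing backup script; adjust to suit.
ExecStart=/usr/local/sbin/backup
# Assumed limit: stop the job and mark it failed if it runs longer than this.
TimeoutStartSec=6h
EOF

    cat > "$unit_dir/backup.timer" <<'EOF'
[Unit]
Description=Run backup.service once a day (hypothetical example)

[Timer]
OnCalendar=*-*-* 01:15:00
# Catch up after boot if the machine was off at the scheduled time.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Once the files are in /etc/systemd/system, systemctl daemon-reload followed by systemctl enable --now backup.timer should activate the schedule, systemctl list-timers shows when it will next fire, and journalctl -u backup.service shows the output of each run.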
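
For suggestion 3, the marker-file idea could be sketched roughly like this. The path in MARKER and the three function names are made up for illustration; the first two functions would go in the backup script and the third in the separate checking script.

```shell
#!/bin/bash
# MARKER is a hypothetical path; any location both scripts can reach will do.
MARKER="${MARKER:-/var/tmp/backup-in-progress.pid}"

# Call at the top of the backup script: record our PID and the start time.
backup_started() {
    printf '%s %s\n' "$$" "$(date '+%Y-%m-%d %H:%M:%S')" > "$MARKER"
}

# Call as the very last line of the backup script, after rsync has returned.
backup_finished() {
    rm -f "$MARKER"
}

# Body of the second script, run later in the day.
# Silent and returns 0 when the marker is gone; otherwise prints a message
# and returns 1.
check_backup() {
    if [ -e "$MARKER" ]; then
        echo "Backup on ${HOSTNAME:-this host} did not finish; marker says: $(cat "$MARKER")"
        return 1
    fi
    return 0
}
```

If the checking script is run from a crontab entry and the machine can deliver local mail, cron should mail you whatever it prints, which gives the 'no message means it has worked' behaviour without any extra mail commands. Note that this only catches a backup that started and never finished, not one that never started at all.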
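
For suggestion 4, one candidate (instead of a Python script) is to wrap a trivial disk access in timeout from coreutils, which exits with status 124 if it had to stop the command. The function name probe_disk and the 30 second default are my own choices, and this assumes the stuck access can actually be interrupted by a signal, which I have not verified for a sleeping USB disk.

```shell
#!/bin/bash
# Probe a mount point, giving up after a time limit (in seconds).
# Prints nothing and returns 0 if the directory can be listed in time;
# prints a message on stderr and returns non-zero otherwise.
probe_disk() {
    local mount_point="$1" limit="${2:-30}" rc=0

    # timeout(1) exits with 124 if the command ran past the limit.
    timeout "$limit" ls "$mount_point" > /dev/null 2>&1 || rc=$?

    if [ "$rc" -eq 124 ]; then
        echo "Backup disk at $mount_point did not respond within ${limit}s" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "Backup disk at $mount_point could not be read (exit status $rc)" >&2
    fi
    return "$rc"
}
```

At the top of the backup script, something like probe_disk /mnt/backup || exit 1 (with your real mount point in place of /mnt/backup) would then skip the rsync and leave an error message behind instead of hanging.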
Good luck.
Steve