Re: [ALUG] How to check if a cron.daily job has run?

9 Mar 2021

On Tue, Mar 09, 2021 at 09:39:20PM +0000, steve-ALUG@hst.me.uk wrote:
...
On 09/03/2021 19:32, Chris Green wrote:
...
I run a daily backup on four systems.  It's a bash script called
'backup' (original eh?) which is in /etc/cron.daily and thus gets run
by Anacron in the early morning.
I need to check that the backup has run somehow as, very occasionally,
the backup system's USB disk 'falls asleep' and as a result any
attempt to access it just hangs.  Thus the rsync backup job doesn't
fail (and produce an error message); it just never returns.
So how can I check and get an error message if this happens?  The next
day is soon enough, it's not an urgent problem, I just need to know if
my daily backups are working.  However this needs to be 'no message
means it has worked', message only if it has not worked.
Suggestions:
1) Don't run it through /etc/cron.daily.  I don't think you have much
control over log files/error files/output from jobs run in cron.daily.  Is
this a job you've manually put in there, or is it set up via crontab?  If
it's the former, I'd suggest the latter.
It's manually put there by me, nothing to do with crontab (which
doesn't put things in cron.daily).
...
2) I suggest investigating systemd jobs because it's the modern way to do
it, and I think you have much more control over when things run. I think you
can control logs and job restarting.  However it's a bit of a learning
curve.  Look in /etc/systemd/system/timers.target.wants for examples of jobs
that are run on a schedule.  You can add a line
OnCalendar=daily
to run once a day, or
OnCalendar=*-*-* 01:15:00
to schedule a start time.
It is run by systemd.  The problem is that the job shows no sign of
failing, it just never completes.  I don't think systemd will know any
better than anything/anyone else that it has failed.
...
3) Tweak your script.  Create a manual .PID file.  Either just use touch to
create MyBackupHasStarted.pid (or any other name), or somehow work out the
actual pid of the script  (e.g. something crude like ps -Af | grep "sh.sh"
...
MyBackupHasStarted.pid)  I forget where pid files are supposed to go.  Upon
successful completion of the script, delete the .pid file, i.e. do it in the
last line of your script.  Then create a 2nd script which checks for the
presence of the pid file at a time by which the backup should have finished.
If it still exists, send an error - perhaps generate an email using one of
these methods
https://tecadmin.net/ways-to-send-email-from-linux-command-line/
Yes, this should do it, but how do you start a second script at a
controlled time after the first one?
...
4) Find something that you can do that produces an error if the usb disk has
become non-responsive.  There must be something!  Even if it means writing a
python script.  Do this first and if it errors, trap the error and respond
to it.
As far as I can tell nothing produces an error, all that ever happens
is a permanent hang.  It's messy, you'd have to fire off a sub-process
and have a timeout that checks if the sub-process terminates.  It
could be done but it is rather messy.  The other issue is that one
doesn't want to do it 'all the time' as it will keep the USB drive
permanently running.
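
For what it's worth, here is a rough sketch of that sub-process-with-timeout
idea in bash.  The function name finishes_within is made up for illustration,
and the final kill is best effort only: a process stuck waiting on a hung
disk may well ignore it, which is why the function polls rather than waits.

```shell
# Run a command in the background and give it a deadline (in seconds).
# Returns 0 if the command ended before the deadline, 1 if it was still
# running when the deadline expired.  The command's own exit status and
# output are deliberately ignored; only "did it return?" matters here.
finishes_within() {
    local deadline=$1
    shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    local waited=0
    # kill -0 sends no signal, it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            # Best effort: this may have no effect on a process blocked in I/O.
            kill "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It would be called once, just before the rsync, so it need not keep the
drive spinning, e.g. (with /mnt/backup standing in for wherever the disk is
actually mounted):

    finishes_within 30 ls /mnt/backup || echo "backup disk not responding"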

I think the best idea is something like 3).
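
A minimal sketch of how 3) might look, assuming bash.  The marker path and
the three function names are hypothetical; the marker is kept off the USB
disk so that checking it cannot itself hang.

```shell
# Hypothetical marker location; anywhere on a local disk would do.
MARKER="${MARKER:-/var/tmp/backup.started}"

# Call at the top of the backup script: record when the run began.
backup_begin() {
    date '+%F %T' > "$MARKER"
}

# Call as the last line of the backup script: only reached if rsync returned.
backup_end() {
    rm -f "$MARKER"
}

# The body of the second script.  Prints nothing when all is well and a
# one-line message when a marker is still lying around.
check_backup() {
    if [ -e "$MARKER" ]; then
        echo "Backup started at $(cat "$MARKER") has not finished"
        return 1
    fi
    return 0
}
```

The checker can then be started at a fixed time from an ordinary crontab
entry rather than from cron.daily, for example (the script path being
made up):

    0 9 * * * /usr/local/bin/check-backup

cron normally mails any output a job produces, given a working local mail
setup, so this stays silent unless the marker is still there.  Note that it
says nothing if the backup never started at all.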

-- 
Chris Green