Re: [ALUG] How to check if a cron.daily job has run?

12 Mar 2021

      On 10/03/2021 09:09, Chris Green wrote:
...
On Tue, Mar 09, 2021 at 10:27:43PM +0000, steve-ALUG@hst.me.uk wrote:
...
On 09/03/2021 22:08, Chris Green wrote:
...
On Tue, Mar 09, 2021 at 09:39:20PM +0000, steve-ALUG@hst.me.uk wrote:
...

I suggest investigating systemd jobs because it's the modern way to do
it, and I think you have much more control over when things run. I think you
can control logs and job restarting.  However it's a bit of a learning
curve.  Look in /etc/systemd/system/timers.target.wants for examples of jobs
that are run on a schedule.  You can add a line
OnCalendar=daily
to run once a day, or
OnCalendar=*-*-* 01:15:00
to schedule a start time.
It is run by systemd.  The problem is that the job shows no sign of
failing, it just never completes.  I don't think systemd will know any
better than anything/anyone else that it has failed.
I *think* that systemd can monitor jobs and restart them if hung, but I
could be wildly wrong about that.
Yes, but how will it know that it has failed? The only indication of
success is the process ending so one would have to tell systemd how
long to wait before deciding it has failed.  Normally systemd checks
if something is working by checking that the process is still there,
it will still be there, just not doing anything.
How will you know it's failed?  Because the PID file will be there a 
while after it should have finished.  That's what your other script checks.
...
...
...
...

Tweak your script.  Create a manual .PID file.  Either just use touch to
create MyBackupHasStarted.pid (or any other name), or somehow work out the
actual pid of the script  (e.g. something crude like ps -Af | grep "sh.sh"
...
MyBackupHasStarted.pid)  I forget where pid files are supposed to go.  Upon
successful completion of the script, delete the .pid file, i.e. do it in the
last line of your script.  Then create a 2nd script which checks for the
presence of the pid file at a time by when it should have finished.  If it
exists, send an error - perhaps generate an email using one of these methods
https://tecadmin.net/ways-to-send-email-from-linux-command-line/
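
A minimal Python sketch of the marker-file scheme above, for illustration only. The path /tmp/MyBackupHasStarted.pid and the function names are assumptions, not anything specified here. The age threshold is an extra: the scheme as described simply checks whether the file exists at a fixed time.

```python
import os
import time
from pathlib import Path

# Hypothetical location; where the marker should live is left open above.
MARKER = Path("/tmp/MyBackupHasStarted.pid")


def mark_started(marker=MARKER):
    """Record the current PID in the marker file at the start of the backup."""
    marker.write_text(f"{os.getpid()}\n")


def mark_finished(marker=MARKER):
    """Remove the marker; call this only after the backup has succeeded."""
    marker.unlink(missing_ok=True)  # missing_ok needs Python 3.8+


def backup_looks_stuck(marker=MARKER, max_age_s=3600.0, now=None):
    """Return True if the marker still exists and is older than max_age_s."""
    try:
        started = marker.stat().st_mtime
    except FileNotFoundError:
        return False  # no marker: backup finished, or never started
    current = time.time() if now is None else now
    return current - started > max_age_s
```

The backup script would call mark_started() first and mark_finished() as its last step. The second, separately scheduled script would call backup_looks_stuck() and send the email if it returns True.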
Yes, this should do it, but how do you start a second script at a
controlled time after the first one?
Start the backup at a scheduled time e.g. Midnight.  Start the checking 
job at a different scheduled time, e.g. 1am.
...
...
Don't run through Cron.daily, but instead run through crontab (or systemd
set to start at a particular time)
use
crontab -e
to set up a job as the current user
or
sudo crontab -e
to set up a job as root.
There are hundreds of crontab how-tos out there.  This one seems OK
https://opensource.com/article/17/11/how-use-cron-linux
How does that help?  Remember this is a backup that *might* take quite
a long time (minutes through to a significant chunk of an hour).  It
also has the *major* disadvantage of not working like anacron which
will run the job daily even on a system which is only turned on for a
short time each day, necessary for the laptop backup at least.
How does it help?
a) Running the jobs at a predictable time allows you to run the backup 
job and the checking job a certain distance apart, allowing you to check 
if the backup job has not finished, and thus, probably failed.
b) cron emails you any output a job produces (given a working local mail
setup), so errors get reported.
However, there are alternatives.  Run it in Systemd configured as a 
startup job, and also at a scheduled time.  Or if you're welded to using 
anacron then have a master script call a subsidiary backup script, and 
the checking script a while later
e.g.
runbackup.sh &
sleep 1h
runDeadBackupCheck &
The jobs will run in the background though and you'll lose any direct
error logging, unless you roll your own.
You could control things with a higher-level language, e.g. python.
You could use "at" instead of "sleep"
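
As a rough illustration of the "higher-level language" option, here is a Python sketch of a master script that starts the backup, keeps its output in a log file, and gives up after a deadline. The function name run_with_deadline, the status strings and the log path are all assumptions made for the example.

```python
import subprocess


def run_with_deadline(cmd, deadline_s, log_path):
    """Run cmd, appending its output to log_path.

    Returns 'ok' (exit status 0), 'failed' (non-zero exit status)
    or 'hung' (still running when the deadline passed).
    """
    with open(log_path, "ab") as log:
        child = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            code = child.wait(timeout=deadline_s)
        except subprocess.TimeoutExpired:
            # Deliberately no kill-and-wait here: a process blocked on a
            # dead drive may not die, and waiting for it would hang this
            # script as well.
            return "hung"
    return "ok" if code == 0 else "failed"
```

Something like run_with_deadline(["./runbackup.sh"], 3600, "/var/log/mybackup.log") would then replace the sleep-and-check pair, and the caller can send an email on anything other than 'ok'.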
Are you saying that anacron runs your backup on boot, and yet you're
having problems with the USB disk going to sleep, even though the
laptop has only been on for a short time each day?
If that's the case then there's something very weird with your machine.  
I would suggest checking the BIOS (new or old style), and also using
some tools (e.g. Ultimate Boot CD) to check if there are any weird
parameters set on your USB disk.
Google for how to keep a drive alive - e.g. 
https://unix.stackexchange.com/questions/5211/prevent-a-usb-external-hard-dr...
...
...
...
...

find something that you can do that produces an error if the usb disk has
become non-responsive.  There must be something!  Even if it means writing a
python script.  Do this first and if it errors, trap the error and respond
to it.
As far as I can tell nothing produces an error, all that ever happens
is a permanent hang.  It's messy, you'd have to fire off a sub-process
and have a timeout that checks if the sub-process terminates.  It
could be done but it is rather messy.  The other issue is that one
doesn't want to do it 'all the time' as it will keep the USB drive
permanently running.
This is why I suggested python.  I suspect that if you tried to open a file
somewhere on the USB drive, it would timeout and then you could exit with an
error message.  I'm sure there must be some simple linux command you could
use that would error rather than just hang.
You wouldn't do it all the time.  You'd do it at the beginning of the
script.  e.g. in pseudocode
(look at this to see if you can wake the USB drive https://askubuntu.com/questions/1060748/what-command-will-wake-up-a-sleeping...)
Write a PID File
Tool to try to wake USB drive
Tool to see if the USB drive is awake
If LastJob did not error
     Rsync backup
endif
Remove PID File
The trouble is that "LastJob" will never return, as I said if you try
and access the hung USB drive that's it, you can't get control back.
For example if you ssh to the Pi NAS and do 'ls /bak'  (/bak is the
root of the USB drive) then you can't get back to the command prompt,
CTRL/C does nothing.
This is why I said
"This is why I suggested python.  I suspect that if you tried to open a file
somewhere on the USB drive, it would timeout and then you could exit with an
error message.  I'm sure there must be some simple linux command you could
use that would error rather than just hang."
I also said
"4) find something that you can do that produces an error if the usb disk has
become non-responsive.  There must be something!  Even if it means writing a
python script.  Do this first and if it errors, trap the error and respond
to it."
...
So one would have to run two processes, one to
access the drive (and do something when it comes back) and a second
process to wait on the first process and, after a longish delay to
give up and report an error.
That is one way.
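
A minimal Python sketch of that two-process arrangement, offered as an untested-on-your-hardware illustration rather than a known fix. The name drive_responds, the 30 second default and the probe_cmd parameter are assumptions, and /bak is the mount point mentioned earlier in the thread.

```python
import subprocess


def drive_responds(path, timeout_s=30.0, probe_cmd=None):
    """Return True if a child process can list `path` within timeout_s.

    probe_cmd is a hypothetical hook for substituting another command.
    """
    cmd = probe_cmd if probe_cmd is not None else ["ls", path]
    child = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for it: a process stuck on a hung
        # drive may not exit, and waiting would hang this script too.
        child.kill()
        return False
```

The backup script would call drive_responds("/bak") once at the start and exit with an error message if it returns False, so only the throwaway child is exposed to the hang.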
Steve
