Run your backup script under timeout(1)?
S
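The timeout(1) suggestion can be approximated in Python. This is a minimal sketch, and the function name run_with_deadline is hypothetical. A process blocked on a hung USB disk is typically in uninterruptible sleep and may not die even on SIGKILL. For that reason the sketch kills the child as a best effort and then returns without waiting for it to exit. It avoids subprocess.run(timeout=...), which waits for the killed child, and with a hung disk that wait could itself hang.

```python
import subprocess
from typing import Optional, Sequence


def run_with_deadline(cmd: Sequence[str], seconds: float) -> Optional[int]:
    """Run cmd and return its exit status, or None if it overran the deadline."""
    proc = subprocess.Popen(cmd)
    try:
        # Popen.wait(timeout=...) raises TimeoutExpired but leaves the child alone.
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in uninterruptible disk I/O may
        # ignore even SIGKILL, so deliberately do not wait for it to exit.
        proc.kill()
        return None
```

For example, run_with_deadline(["/path/to/backup.sh"], 45 * 60) returning None would be the cue to send an alert. The script path and the 45-minute limit are placeholders.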
On Wed, 10 Mar 2021 at 09:10, Chris Green cl@isbd.net wrote:
On Tue, Mar 09, 2021 at 10:27:43PM +0000, steve-ALUG@hst.me.uk wrote:
On 09/03/2021 22:08, Chris Green wrote:
On Tue, Mar 09, 2021 at 09:39:20PM +0000, steve-ALUG@hst.me.uk wrote:
- I suggest investigating systemd jobs because it's the modern way to do it, and I think you have much more control over when things run. I think you can also control logging and job restarting. However, it's a bit of a learning curve. Look in /etc/systemd/system/timers.target.wants for examples of jobs that are run on a schedule. You can add a line OnCalendar=daily to the [Timer] section to run once a day, or OnCalendar=*-*-* 01:15:00 to schedule a start time.
It is run by systemd. The problem is that the job shows no sign of failing; it just never completes. I don't think systemd will know any better than anything (or anyone) else that it has failed.
I *think* that systemd can monitor jobs and restart them if hung, but I could be wildly wrong about that.
Yes, but how will it know that it has failed? The only indication of success is the process ending, so one would have to tell systemd how long to wait before deciding it has failed. Normally systemd checks whether something is working by checking that the process is still there; here it will still be there, just not doing anything.
- Tweak your script. Create a manual .PID file. Either just use touch to create MyBackupHasStarted.pid (or any other name), or somehow work out the actual pid of the script (e.g. something crude like ps -Af | grep "sh.sh" > MyBackupHasStarted.pid). I forget where pid files are supposed to go. Upon successful completion of the script, delete the .pid file, i.e. do it in the last line of your script. Then create a second script which checks for the presence of the pid file at a time by which the backup should have finished. If it exists, send an error, perhaps by generating an email using one of these methods: https://tecadmin.net/ways-to-send-email-from-linux-command-line/
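The marker-file idea can be sketched in Python as three small functions. The marker path and the function names are illustrative choices, as is judging the backup's age from the file's modification time. Because the checker compares the marker's age against a limit, it could be run at any later time (for example hourly, or at the next boot) rather than at one precisely timed moment.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical location for the marker; any writable path will do.
MARKER = Path("/var/tmp/MyBackupHasStarted.pid")


def start_marker(marker: Path = MARKER) -> None:
    """First step of the backup script: record this process's PID."""
    marker.write_text(f"{os.getpid()}\n")


def finish_marker(marker: Path = MARKER) -> None:
    """Last step of the backup script, reached only if the backup succeeded."""
    marker.unlink(missing_ok=True)


def check_marker(max_age_seconds: float, marker: Path = MARKER) -> Optional[str]:
    """Second script: return an error message if the backup looks stuck, else None."""
    try:
        age = time.time() - marker.stat().st_mtime
    except FileNotFoundError:
        return None  # no marker: the last backup finished (or never started)
    if age <= max_age_seconds:
        return None  # started recently; give it more time
    pid = marker.read_text().strip()
    return f"backup (pid {pid}) started {age / 60:.0f} minutes ago and has not finished"
```

The second script might then call check_marker(2 * 60 * 60) and, if it gets a message back, send it using one of the email methods linked above. The two-hour limit is only an example.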
Yes, this should do it, but how do you start a second script at a controlled time after the first one?
Don't run it through cron.daily; instead run it through crontab (or a systemd timer set to start at a particular time).
Use crontab -e to set up a job as the current user, or sudo crontab -e to set up a job as root.
There are hundreds of crontab how-tos out there. This one seems OK: https://opensource.com/article/17/11/how-use-cron-linux
How does that help? Remember this is a backup that *might* take quite a long time (minutes through to a significant chunk of an hour). It also has the *major* disadvantage of not working like anacron which will run the job daily even on a system which is only turned on for a short time each day, necessary for the laptop backup at least.
- Find something you can do that produces an error if the USB disk has become non-responsive. There must be something, even if it means writing a Python script. Do this first, and if it errors, trap the error and respond to it.
As far as I can tell nothing produces an error; all that ever happens is a permanent hang. You'd have to fire off a sub-process and have a timeout that checks whether the sub-process terminates. It could be done, but it is rather messy. The other issue is that one doesn't want to do it 'all the time', as it would keep the USB drive permanently running.
This is why I suggested Python. I suspect that if you tried to open a file somewhere on the USB drive, it would time out, and then you could exit with an error message. I'm sure there must be some simple Linux command you could use that would error rather than just hang.
You wouldn't do it all the time. You'd do it at the beginning of the script, e.g. in pseudocode:
(look at this to see if you can wake the USB drive https://askubuntu.com/questions/1060748/what-command-will-wake-up-a-sleeping...)
    Write a PID File
    Tool to try to wake USB drive
    Tool to see if the USB drive is awake
    If LastJob did not error
        Rsync backup
    endif
    Remove PID File
The trouble is that "LastJob" will never return. As I said, if you try to access the hung USB drive, that's it: you can't get control back. For example, if you ssh to the Pi NAS and do 'ls /bak' (/bak is the root of the USB drive), you can't get back to the command prompt; CTRL/C does nothing. So one would have to run two processes: one to access the drive (and do something when it comes back), and a second to wait on the first and, after a longish delay, give up and report an error.
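The pseudocode above, combined with the two-process approach, might look like the following minimal Python sketch. It assumes that an ls of the mount point both wakes the drive and is a fair test of whether it has hung, since that is what hangs in the 'ls /bak' example. The function names, the 120-second probe limit and the rsync flags are illustrative assumptions. Unlike the pseudocode, it leaves the marker in place when something fails, following the earlier suggestion to delete it only on success.

```python
import subprocess
import sys
from pathlib import Path


def drive_responds(path: str, seconds: float) -> bool:
    """List `path` in a child process; report False instead of hanging."""
    probe = subprocess.Popen(
        ["ls", path], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return probe.wait(timeout=seconds) == 0
    except subprocess.TimeoutExpired:
        probe.kill()  # may have no effect on a child stuck in disk I/O
        return False


def backup(mount: str, source: str, marker: Path, probe_seconds: float = 120) -> bool:
    """Write marker, probe the drive, rsync, and remove the marker on success."""
    marker.write_text("started\n")  # the "PID file" from the pseudocode
    # Assumption: listing the mount is enough to wake the drive as well.
    if not drive_responds(mount, probe_seconds):
        print(f"{mount} did not respond within {probe_seconds}s", file=sys.stderr)
        return False  # marker left in place, so a later check also sees the failure
    subprocess.run(["rsync", "-a", source, mount], check=True)
    marker.unlink()  # only reached if rsync succeeded
    return True
```

A call might look like backup("/bak", "/home/", Path("/var/tmp/MyBackupHasStarted.pid")), where the source directory and marker path are hypothetical. The rsync step could still hang on the same drive; wrapping it with the same wait-with-timeout pattern would cover that case too.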
-- Chris Green
main@lists.alug.org.uk http://www.alug.org.uk/ https://lists.alug.org.uk/mailman/listinfo/main Unsubscribe? See message headers or the web site above!