On Mon, Oct 20, 2014 at 09:38:58PM +0100, Adam Bower wrote:
On Mon, Oct 20, 2014 at 04:41:14PM +0100, Chris Green wrote:
Not to mention one more fundamental problem with obnam: it doesn't offer any obvious/easy way to automate daily/monthly/yearly backups.
If only there was a way on unix systems to run a command on a regular schedule without user intervention.
Strangely enough I do know about cron.
HOWEVER, a decent backup facility integrates with cron as part of its design; obnam's documentation doesn't show how one can do daily/weekly/monthly/yearly backups.
On Mon, Oct 20, 2014 at 11:44:12PM +0100, Chris Green wrote:
HOWEVER, a decent backup facility integrates with cron as part of its design; obnam's documentation doesn't show how one can do daily/weekly/monthly/yearly backups.
Within a few minutes of reading the docs it was clear how you would do this. The entire point is that you choose your schedule; the software is being flexible. If you can't work out how to create your own schedule then that's *your* fault, not the software's.
Adam
On Tue, Oct 21, 2014 at 10:49:24AM +0100, Adam Bower wrote:
On Mon, Oct 20, 2014 at 11:44:12PM +0100, Chris Green wrote:
HOWEVER, a decent backup facility integrates with cron as part of its design; obnam's documentation doesn't show how one can do daily/weekly/monthly/yearly backups.
Within a few minutes of reading the docs it was clear how you would do this. The entire point is that you choose your schedule; the software is being flexible. If you can't work out how to create your own schedule then that's *your* fault, not the software's.
I disagree:-
It isn't particularly obvious from the documentation how to schedule regular backups. Yes, we should know about cron, but it would be nice to know how the authors suggest scheduling things.
It's not at all obvious how to manage a tiered backup system with, say, 7 daily backups, 4 weekly backups, 12 monthly backups, etc. Other systems (such as rsnapshot) make this a fundamental part of the design.
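For the record, here's roughly what I mean by a tiered setup, sketched with rsnapshot's own retention settings plus cron entries to drive them (paths, times and the rsnapshot binary location are illustrative only):

  # /etc/rsnapshot.conf (fragment) -- fields must be separated by tabs;
  # older rsnapshot versions call 'retain' by its legacy name 'interval'
  retain  daily   7
  retain  weekly  4
  retain  monthly 12

  # crontab entries to drive the rotation
  0  2  *  *  *   /usr/bin/rsnapshot daily
  30 3  *  *  1   /usr/bin/rsnapshot weekly
  45 4  1  *  *   /usr/bin/rsnapshot monthly

rsnapshot then rotates daily.0 to daily.6 itself and promotes the oldest daily into weekly.0, and so on up the tiers.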
IMHO it was easier for me to write a backup system (using rsync and a 70 line shell script) than to work through the documentation for obnam and create the configuration file and cron schedules for that.
That last point was my original one - the complexity of most backup systems seems to me to be such that it's often much simpler (and you get *exactly* what you want) to D-I-Y. This isn't a particular criticism of obnam, it was my starting point when I asked if the algorithm I posted some days ago looked correct.
... anyway, this is getting a bit argumentative, not really what I was intending. Don't take me too seriously. :-)
On Tue, Oct 21, 2014 at 11:14:22AM +0100, Chris Green wrote:
... anyway, this is getting a bit argumentative, not really what I was
That's because once again you're asking for ideas and then rubbishing the suggestions that people make.
intending. Don't take me too seriously. :-)
I don't think there's any risk of that.
Adam
On 21/10/14 11:14, Chris Green wrote:
[SNIP]
That last point was my original one - the complexity of most backup systems seems to me to be such that it's often much simpler (and you get *exactly* what you want) to D-I-Y. This isn't a particular criticism of obnam, it was my starting point when I asked if the algorithm I posted some days ago looked correct.
[SNIP]
I'm not sure about the current level of support or development, but for many years we've been using rdiff-backup (www.nongnu.org/rdiff-backup). The most recent release was 2009, but it is stable and mature, and is, essentially, a front-end to rsync (well, it uses the python librsync library).
Here's the overview:
---- cut here ----
What is it?

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensical defaults.
---- cut here ----
Editorial aside: "sensical"?
It doesn't encrypt the backups, but it does use password-less keys in SSH to do the copies, can resume if the backup fails, does incrementals, and provides an instant copy of the latest version of a file.
Here's an example of the incantation using SSH, where "backup::" is specifically defined in the SSH config file:
rdiff-backup --print-statistics /usr/local backup::/backups/$CUST/$PC/$HOST/local >> /var/log/backups 2>&1
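For completeness, "backup" here is just a Host alias in ~/.ssh/config, something along these lines (host name, user and key path are invented for illustration):

  Host backup
      HostName backup.example.com
      User backups
      IdentityFile ~/.ssh/id_backup

The password-less key is what lets cron run the whole thing unattended.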
No problems yet, but it seems to have been left behind since duplicity came along. Worryingly, it is masked on Gentoo (actually, it always has been, now I think about it), now with the comment: "Dead upstream, has known dataloss bugs. Please use something more sane: rsnapshot, backuppc, obnam, ..."
I've never had any data loss, but it does crap out sometimes. Never had a problem repairing any damage either.
Looking at wikipedia, there seems to be a fair quantity of stuff using rsync. Something there must be useful:
http://en.wikipedia.org/wiki/Rsync
My current requirement is that I have two clients who want to provide reciprocal off-site backup facilities. That requires encryption. So, I'll probably use rdiff-backup for the local copy (for a virtually instant local restore, with increments as well), and duplicity for the encrypted remote copies.
Cheers, Laurie.
On Tue, Oct 21, 2014 at 12:30:58PM +0100, Laurie Brown wrote:
On 21/10/14 11:14, Chris Green wrote:
[SNIP]
That last point was my original one - the complexity of most backup systems seems to me to be such that it's often much simpler (and you get *exactly* what you want) to D-I-Y. This isn't a particular criticism of obnam, it was my starting point when I asked if the algorithm I posted some days ago looked correct.
[SNIP]
I'm not sure about the current level of support or development, but for many years we've been using rdiff-backup (www.nongnu.org/rdiff-backup). The most recent release was 2009, but it is stable and mature, and is, essentially, a front-end to rsync (well, it uses the python librsync library).
It was on my original shortlist a while back but I chose rsnapshot instead because rsnapshot does snapshots (i.e. each backup looks like a complete snapshot of everything you're backing up) whereas rdiff-backup does 'masters' plus diffs over a period.
This is a *big* distinction IMHO. With rdiff-backup you have to reconstruct any file you want to restore unless it happens to be unchanged since the last 'master' backup. Also, as I understand how rdiff-backup works, the diffs get more and more 'distant' as you make more and more backups.
With rsnapshot (or my more recent home-made system) hard links are used to save space where files haven't changed so every backup you make is a complete set of files, you can just copy the file back from the backup you select, no reconstruction needed.
I much prefer snapshots as they seem to me much safer and more robust.
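The trick behind such snapshots is rsync's --link-dest option; here's a stripped-down sketch of the sort of thing my script does (and, as I understand it, rsnapshot does internally), with paths invented for illustration:

  #!/bin/sh
  # Make today's snapshot; any file unchanged since the previous
  # snapshot becomes a hard link to its copy there, not a new copy.
  SRC=/home/chris/
  SNAP=/backups/snapshot-$(date +%Y-%m-%d)
  PREV=/backups/latest
  # On the very first run there is no previous snapshot, so rsync
  # just warns about the missing --link-dest directory and copies everything.
  rsync -a --delete --link-dest="$PREV" "$SRC" "$SNAP"
  ln -sfn "$SNAP" /backups/latest   # repoint 'latest' at the new snapshot

Restoring is then just an ordinary cp from whichever snapshot directory you want.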
On 21/10/14 13:38, Chris Green wrote:
[SNIP]
It was on my original shortlist a while back but I chose rsnapshot instead because rsnapshot does snapshots (i.e. each backup looks like a complete snapshot of everything you're backing up) whereas rdiff-backup does 'masters' plus diffs over a period.
This is a *big* distinction IMHO. With rdiff-backup you have to reconstruct any file you want to restore unless it happens to be unchanged since the last 'master' backup. Also, as I understand how rdiff-backup works, the diffs get more and more 'distant' as you make more and more backups.
With rsnapshot (or my more recent home-made system) hard links are used to save space where files haven't changed so every backup you make is a complete set of files, you can just copy the file back from the backup you select, no reconstruction needed.
I much prefer snapshots as they seem to me much safer and more robust.
So... Why ask about incrementals in the subject of the OP if you've already decided to use snapshots only?
The advantage of incrementals is the space saved. And, of course, you can always take an annual/quarterly/monthly/whatever snapshot of the incrementals, archive it to release space and then start again... In fact, that's a basic requirement for sensible file management.
Anyway, whatever.
Cheers, Laurie.
On Tue, Oct 21, 2014 at 03:34:44PM +0100, Laurie Brown wrote:
On 21/10/14 13:38, Chris Green wrote:
[SNIP]
It was on my original shortlist a while back but I chose rsnapshot instead because rsnapshot does snapshots (i.e. each backup looks like a complete snapshot of everything you're backing up) whereas rdiff-backup does 'masters' plus diffs over a period.
This is a *big* distinction IMHO. With rdiff-backup you have to reconstruct any file you want to restore unless it happens to be unchanged since the last 'master' backup. Also, as I understand how rdiff-backup works, the diffs get more and more 'distant' as you make more and more backups.
With rsnapshot (or my more recent home-made system) hard links are used to save space where files haven't changed so every backup you make is a complete set of files, you can just copy the file back from the backup you select, no reconstruction needed.
I much prefer snapshots as they seem to me much safer and more robust.
So... Why ask about incrementals in the subject of the OP if you've already decided to use snapshots only?
Because, to my mind and many others, incremental can mean differential or snapshot! :-)
The advantage of incrementals is the space saved. And, of course, you can always take an annual/quarterly/monthly/whatever snapshot of the incrementals, archive it to release space and then start again... In fact, that's a basic requirement for sensible file management.
I think you'll find that there's very little difference. In reality a lot more than 90% of what you save over the long term doesn't change so it's only the efficiency or otherwise of saving the changed bits that affects how your backups grow.
My snapshots save almost 200GB of files; each snapshot after the first one occupies between 300MB and 400MB, so I can save a lot of snapshots without using much space. Even a hundred snapshots would take only 40GB or so more.
Obviously if lots of files change then the snapshots start eating more space but that applies to differential backups too.
On 21/10/14 16:11, Chris Green wrote:
[SNIP]
I much prefer snapshots as they seem to me much safer and more robust.
So... Why ask about incrementals in the subject of the OP if you've already decided to use snapshots only?
Because, to my mind and many others, incremental can mean differential or snapshot! :-)
Er. No!
A snapshot is exactly that: it is a copy of a state at an exact moment in time, with no references to the past or future.
Incremental means - well, by increments. In the case of backups, it means that only those changes since the last backup are saved rather than the whole file. Each of those changes is by its nature incremental, so of course, it means rebuilding, using those increments, from the original outwards, in the correct order.
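Taking rdiff-backup from earlier in the thread as a concrete case (it actually stores *reverse* diffs, so it rebuilds backwards from the current mirror rather than forwards from the original, but the principle is the same), a restore asks the tool to apply the stored increments for you. Paths here are invented:

  # Reconstruct a file as it was three days ago
  rdiff-backup -r 3D backup::/backups/host/local/etc/fstab /tmp/fstab.3days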
They simply are not the same. If you ask for incremental backups, that's what you'll get, not some kind of random definition of a snapshot, massaged to become a differential. Which is what exactly in that context? Seriously? Those are rhetorical questions, BTW. I realise they don't have an answer. Not sensible ones anyway.
Sigh...
On Tue, Oct 21, 2014 at 04:40:48PM +0100, Laurie Brown wrote:
On 21/10/14 16:11, Chris Green wrote:
[SNIP]
I much prefer snapshots as they seem to me much safer and more robust.
So... Why ask about incrementals in the subject of the OP if you've already decided to use snapshots only?
Because, to my mind and many others, incremental can mean differential or snapshot! :-)
Er. No!
A snapshot is exactly that: it is a copy of a state at an exact moment in time, with no references to the past or future.
Incremental means - well, by increments. In the case of backups, it means that only those changes since the last backup are saved rather than the whole file. Each of those changes is by its nature incremental, so of course, it means rebuilding, using those increments, from the original outwards, in the correct order.
They simply are not the same. If you ask for incremental backups, that's what you'll get, not some kind of random definition of a snapshot, massaged to become a differential. Which is what exactly in that context? Seriously? Those are rhetorical questions, BTW. I realise they don't have an answer. Not sensible ones anyway.
Sigh...
Sigher! :-)
You do realise that snapshots use hard links to save space, don't you? Thus they do effectively have references to the past and future even though each is, as you say, a snapshot at the given time. A file that doesn't change has hard links from multiple snapshots to the same data on the disk, so there's only actually one copy of it.
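You can see this with ls or stat: an unchanged file shows the same inode number, and a link count greater than one, in every snapshot that contains it (rsnapshot-style directory names used for illustration):

  # Same inode number in both listings means one copy on disk, two names
  ls -li daily.0/photo.jpg daily.1/photo.jpg
  # Or check the hard-link count and inode directly (GNU stat)
  stat -c '%h links, inode %i' daily.0/photo.jpg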
An incremental backup doesn't have to have increments of *files*, though it may do. The increments can simply be the increments of the *file system* that are needed to build up the complete backup.
That's the problem in a way, the words don't really define things exactly enough.
Try doing a Google search for incremental or snapshot or differential backup and you will see they all tend to mean all things to all men.
On 21 October 2014 17:33, Chris Green <cl@isbd.net> wrote:
A file that doesn't change has hard links from multiple snapshots to the same data on the disk, so there's only actually one copy of it.
Actually I think it is important to think of rsnapshot as a bit of a hybrid, which takes incremental backups but presents them as multiple snapshots.
One of the limitations of taking "snapshots" but using hard links is that you only have one copy of each unique file version (that is the reason to do it, of course). That means that if you lose a few disk sectors on the backup drive, any file that gets borked is borked in all "copies" of the backup where that file was unchanged. In a "real" backup with multiple snapshots you could take the file from a different snapshot.
This is a "feature" that it shares with traditional incremental backups, where failure of a small part of the backup media can affect multiple backups. This can be mitigated against with RAID etc.
That said, as far as the topic of this thread is concerned: Backups *are* pretty simple, by their nature, but any solution will evolve to take into account the fact that real life is not. Ideally you'd just take a complete duplicate of everything onto 100% reliable media every day and never worry about it again, but disk capacities, transfer speeds, media reliability, etc. all result in trade-offs.

I'm pretty sure that rsnapshot will have started out as a simple script (along the lines of what Chris has now) and evolved to deal with the real world complications. Some won't affect Chris, some might in the future even if they haven't yet.

Trusting a backup means trusting the backup mechanism, and if Chris has more trust in his own solution than one that's been tweaked and tested by a larger group of people over some considerable time then that's fine, it's only his data that's at risk. Indeed, having a simpler system that he fully understands may well increase the trustworthiness. For my part I would prefer an established solution over a roll-my-own. But the bottom line is that far too many people simply don't have backups at all, and any of the options discussed here are likely better than that.
Mark
On Wed, Oct 22, 2014 at 11:00:09AM +0100, Mark Rogers wrote:
On 21 October 2014 17:33, Chris Green <cl@isbd.net> wrote:
A file that doesn't change has hard links from multiple snapshots to the same data on the disk, so there's only actually one copy of it.
Actually I think it is important to think of rsnapshot as a bit of a hybrid, which takes incremental backups but presents them as multiple snapshots.
One of the limitations of taking "snapshots" but using hard links is that you only have one copy of each unique file version (that is the reason to do it, of course). That means that if you lose a few disk sectors on the backup drive, any file that gets borked is borked in all "copies" of the backup where that file was unchanged. In a "real" backup with multiple snapshots you could take the file from a different snapshot.
This is a "feature" that it shares with traditional incremental backups, where failure of a small part of the backup media can affect multiple backups. This can be mitigated against with RAID etc.
Any backup (even multiple actual copies of a file) that goes to a single drive is basically vulnerable to a failure of that drive.
I do frequent (as in hourly/daily) hard-linked snapshot backups to another drive on my desktop machine and less frequent (as in daily, monthly, yearly) backups to my NAS in the garage (which is a longish way away from the house). So, hopefully, I've got most things covered. Of course there are particular cases that aren't covered but you can't do *everything*. For critical stuff (like company files, accounts, photo archive, etc.) I add other backups but they are just "as it is now" copies with no history.
That said, as far as the topic of this thread is concerned: Backups *are* pretty simple, by their nature, but any solution will evolve to take into account the fact that real life is not. Ideally you'd just take a complete duplicate of everything onto 100% reliable media every day and never worry about it again, but disk capacities, transfer speeds, media reliability, etc. all result in trade-offs. I'm pretty sure that rsnapshot will have started out as a simple script (along the lines of what Chris has now) and evolved to deal with the real world complications. Some won't affect Chris, some might in the future even if they haven't yet. Trusting a backup means trusting the backup mechanism, and if Chris has more trust in his own solution than one that's been tweaked and tested by a larger group of people over some considerable time then that's fine, it's only his data that's at risk.
The advantage is that it's simple because it only does what I want it to do rather than having the complexity needed to provide flexibility for multiple needs. It also means I can change it if/when I need to.
Indeed, having a simpler system that he fully understands may well increase the trustworthiness. For my part I would prefer an established solution over a roll-my-own. But the bottom line is that far too many people simply don't have backups at all, and any of the options discussed here are likely better than that.
Agreed! The backup system needs to 'just work' too, with no effort or interaction from the user.
On 21/10, Chris Green wrote:
It isn't particularly obvious from the documentation how to schedule regular backups. Yes, we should know about cron, but it would be nice to know how the authors suggest scheduling things.
I hate to bang the obvious drum (wait no, I love it) but Obnam is under the GPL so I'd suggest submitting a documentation change to the project to help both you and it along. With knowledge of cron and the needs of a backup solution, you'd be well placed to write a useful guide that could be included.
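As a seed for such a guide, a crontab along these lines is roughly the shape it might take; the repository path is invented, and the --keep policy syntax should be checked against the obnam manual:

  # Nightly backup of $HOME at 02:00
  0  2 * * * obnam backup --repository /srv/backups/repo $HOME
  # Sunday pruning to a tiered schedule: 7 daily, 4 weekly, 12 monthly
  30 3 * * 0 obnam forget --repository /srv/backups/repo --keep 7d,4w,12m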
Steve