I have two sites that are Internet connected via standard ADSL, and that want to have a simple backup server in each site that uses the other site to store an off-site redundant copy of the data.
I plan a Linux server at each site, each with a small OS drive and two pairs of large SATA drives using software RAID (one pair for each site). That combination should give a good idea of the budget available for this... Forget hot-swap SCSI drives on hardware RAID cards, etc.!
I can easily see how I could rsync the first raid array on site A ("A:md0") across to B:md0, and B:md1 to A:md1.
Where I need advice is picking the best backup approach for the users at each site to use to populate their respective backup storage in the first place. (Most of the users will of course be using Windows, and should be treated as untrained monkeys for the sake of this project.) Given the bandwidth-limited ADSL link between the two sites (but with high off-peak limits), I need a backup method which does not cause rsync to needlessly send unchanged data.
If I used something like Cobian Backup to create incremental backups, although the disk space usage would be relatively small, the periodic full backups would cause rsync to send full backups across the link even though 90% of the content of those backups would already have been synchronised previously. I've seen solutions that use hardlinks between identical files to provide rolling backups on Linux, but I need to support Windows.
Any suggestions?
On Wed, 2008-01-30 at 10:34 +0000, Mark Rogers wrote:
I have two sites that are Internet connected via standard ADSL, and that want to have a simple backup server in each site that uses the other site to store an off-site redundant copy of the data.
I plan a Linux server at each site, each with a small OS drive and two pairs of large SATA drives using software RAID (one pair for each site). That combination should give a good idea of the budget available for this... Forget hot-swap SCSI drives on hardware RAID cards, etc.!
To be honest, for smaller implementations of RAID, software RAID now beats hardware RAID on manageability. Also, unless you spend a small fortune on the controller, software RAID will in many cases have a performance edge.
Where I need advice is picking the best backup approach for the users at each site to use to populate their respective backup storage in the first place. (Most of the users will of course be using Windows, and should be treated as untrained monkeys for the sake of this project.) Given the bandwidth-limited ADSL link between the two sites (but with high off-peak limits), I need a backup method which does not cause rsync to needlessly send unchanged data.
You could look at Unison rather than rsync. However, rsync is actually pretty light in terms of sending unchanged data. By default, if the timestamps and file size match at both ends, it doesn't even bother computing checksums, so not much more than a file list is transferred for the initial scan.
If I used something like Cobian Backup to create incremental backups, although the disk space usage would be relatively small, the periodic full backups would cause rsync to send full backups across the link even though 90% of the content of those backups would already have been synchronised previously.
That will only be the case if you are planning on having multiple backup versions rather than a single state at each end. If you want multiple snapshots then maybe you should look towards something like dirvish.
rsync is pretty good at only transferring what it needs to. The only time a whole file will generally be moved is if it is new, or if the file is compressed (in which case rsync needs to sync the whole file from the first change onwards). Unison can be configured to be a bit cleverer about filename and path changes, however.
I've seen solutions that use hardlinks between identical files to provide rolling backups on Linux, but I need to support Windows.
We use BackupPC to good effect to back up 20 or so smaller companies and home workers over ADSL. It does as you say with hardlinks, but is overkill if there is only going to be one client at each end. There is a 2GB maximum file size limitation if the client end is Windows, however (this appears to be the case for anything involving rsync/Cygwin on Windows).
You could conceivably configure a copy of BackupPC at each end to locally back up the clients to its own pool, then configure BackupPC at each end to back up the BackupPC pool at the other site. Only byte-level changes would be transferred after the initial sync. You'd probably be best advised to perform the initial synchronisation via a hard drive in the post, although this in itself can be a little fiddly to get into the BackupPC storage pool in the correct format. The trick is to mount the external drive on a local machine and temporarily modify the backup client settings in BackupPC so that, for the initial sync, it is looking at a local machine rather than a remote one. But you have to get the paths etc. the same, so the directory trees look identical to BackupPC between the remote server and your initial hard-drive copy.
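The seeding trick might look roughly like this; the hostname, share path, and config location are illustrative, though $Conf{XferMethod}, $Conf{ClientNameAlias} and $Conf{RsyncShareName} are standard BackupPC per-host options:

```shell
#!/bin/sh
# Sketch of the "seed via a posted hard drive" trick: temporarily point
# the per-host BackupPC config at a locally mounted copy of the remote
# tree, run one backup, then restore the real settings.
set -e
confdir=$(mktemp -d)    # stands in for e.g. /etc/backuppc

cat > "$confdir/siteB-server.pl" <<'EOF'
# Temporary override for the initial seed only:
$Conf{XferMethod}      = 'rsync';
# Look at the local machine instead of the remote server...
$Conf{ClientNameAlias} = 'localhost';
# ...but keep the same share path the remote server will present later,
# so the directory trees look identical to BackupPC.
$Conf{RsyncShareName}  = ['/srv/backuppc-seed'];
EOF

cat "$confdir/siteB-server.pl"
```

After the first successful backup of the local copy, revert ClientNameAlias so subsequent runs go to the real remote host.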
For the local machines you can also configure BackupPC to use something other than rsync (like SMB mounts), in which case it does the rsync processing locally and therefore avoids the 2GB file limitation. However, I have never tried this.
Wayne Stallwood wrote:
To be honest, for smaller implementations of RAID, software RAID now beats hardware RAID on manageability. Also, unless you spend a small fortune on the controller, software RAID will in many cases have a performance edge.
I also prefer software RAID because it's much easier to take a disk out of a RAID configuration in one machine, put it into another machine with completely different hardware, and add it to a new array or otherwise manipulate the data.
You could look at Unison rather than rsync.
I didn't even think about Unison, which is odd since we use it daily for website maintenance tasks.
That will only be the case if you are planning on having multiple backup versions rather than a single state at each end. If you want multiple snapshots then maybe you should look towards something like dirvish.
I think keeping multiple versions would be a good idea, not least because it provides an "undo" feature to the end user that acts as an incentive to buy into the backup process. People tend to rebel against anything which makes their life slightly harder unless they can see an imminent payback (well, my untrained monkeys do anyway :-)
rsync is pretty good at only transferring what it needs to. The only time a whole file will generally be moved is if it is new, or if the file is compressed (in which case rsync needs to sync the whole file from the first change onwards).
That was my point about using a "conventional" backup tool like Cobian that creates complete and incremental backup sets. If, for the sake of argument, I take a full backup every Sunday then incrementals every day, the Sunday backups will be completely "different" as far as rsync is concerned, even though their contents will be substantially the same as the previous backup's. The problem isn't rsync, it's the backup method being used to put the files onto the server in the first place.
I've looked at Dirvish before (but forgotten about it, so thanks for the reminder); I'm not sure how it interacts with Windows clients, though. I guess I'd let each client back up to a Samba share (just file copies, no compressed backup sets), then use Dirvish to keep the rolling backups there. I would only need to rsync the latest backup to the remote server.
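A dirvish vault over that Samba staging area might be configured roughly as below; the paths and retention are assumptions, with the key names following dirvish's per-vault default.conf conventions:

```shell
#!/bin/sh
# Sketch of a dirvish vault: clients drop plain file copies on the
# Samba share, dirvish keeps the hardlinked rolling history, and only
# the latest image needs rsyncing off-site.
set -e
vault=$(mktemp -d)      # stands in for e.g. /srv/backup/dirvish/office
mkdir -p "$vault/dirvish"

cat > "$vault/dirvish/default.conf" <<'EOF'
client: localhost
tree: /srv/samba/backups
xdev: 1
index: gzip
image-default: %Y%m%d
exclude:
	*.tmp
	Thumbs.db
expire: +30 days
EOF

cat "$vault/dirvish/default.conf"
```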
We use BackupPC to good effect to back up 20 or so smaller companies and home workers over ADSL. It does as you say with hardlinks, but is overkill if there is only going to be one client at each end. There is a 2GB maximum file size limitation if the client end is Windows, however (this appears to be the case for anything involving rsync/Cygwin on Windows).
BackupPC looks like a good option, thanks.
If anyone has a >2GB file to back up I don't really want it on my server anyway; rsyncing that to the remote site will be a pain :-)
You could conceivably configure a copy of BackupPC at each end to locally back up the clients to its own pool, then configure BackupPC at each end to back up the BackupPC pool at the other site. Only byte-level changes would be transferred after the initial sync. You'd probably be best advised to perform the initial synchronisation via a hard drive in the post, although this in itself can be a little fiddly to get into the BackupPC storage pool in the correct format.
I would imagine that after getting backuppc to work locally, rsync would still be the simplest way to keep the two sites in sync? I'd be interested to know if there's a reason you discounted that.
I had planned on getting each site up and running for a week or so without remote backups, getting people to backup important data. I'd then go to each site (one of them is my office, which helps), and rsync from one set of disks to the other in the same machine, then take those disks to the other site and swap them. Agreed that the initial transfer online would be a pain.
I'll set up backuppc and play.
On Wed, 2008-01-30 at 15:32 +0000, Mark Rogers wrote:
I would imagine that after getting backuppc to work locally, rsync would still be the simplest way to keep the two sites in sync? I'd be interested to know if there's a reason you discounted that.
None specifically; I just thought that having the status of the server-to-server backups in the same monitoring screens as the rest of the backups may be an advantage. But other than that, there is no reason why you couldn't do it with rsync.
Wayne Stallwood wrote:
None specifically; I just thought that having the status of the server-to-server backups in the same monitoring screens as the rest of the backups may be an advantage.
That's a good point, and probably worth taking into account. Thanks.