On Wed, 2008-01-30 at 10:34 +0000, Mark Rogers wrote:
> I have two sites that are Internet connected via standard ADSL, and that want to have a simple backup server in each site that uses the other site to store an off-site redundant copy of the data.
> I plan a Linux server at each site, each with a small o/s drive and two pairs of large SATA drives using software RAID (one pair for each site). That combination should give a good idea of the budget available for this... Forget hot-swap SCSI drives on hardware RAID cards etc!
To be honest, for smaller implementations of RAID, software RAID now beats hardware RAID on manageability. Also, unless you spend a small fortune on the controller, software RAID will in many cases have a performance edge.
> Where I need advice is picking the best backup approach for the users at each site to use to populate their respective backup storage in the first place. (Most of the users will of course be using Windows, and should be treated as untrained monkeys for the sake of this project.) Given the bandwidth-limited ADSL link between the two sites (but with high off-peak limits) I need a backup method which does not cause rsync to needlessly send unchanged data.
You could look at Unison rather than rsync. However, rsync is actually pretty light in terms of sending unchanged data. By default, if the timestamps and file size match at both ends it doesn't even bother processing checksums, so not much more than a file list is transferred for the initial scan.
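A quick local illustration of that quick-check behaviour (all paths here are made up for the demo): the second pass transfers nothing, because size and mtime already match at both ends.

```shell
# Demo of rsync's default "quick check", using two throwaway local dirs.
mkdir -p /tmp/rsdemo/src /tmp/rsdemo/dst
echo "hello" > /tmp/rsdemo/src/report.txt

# First pass: report.txt is new at the destination, so it is copied.
rsync -a /tmp/rsdemo/src/ /tmp/rsdemo/dst/

# Second pass: size and mtime match at both ends, so rsync skips the
# file without checksumming it; -i (--itemize-changes) lists nothing.
rsync -ai /tmp/rsdemo/src/ /tmp/rsdemo/dst/
```

(Adding --checksum would force a full-file comparison at both ends, which is exactly what you don't want over ADSL.)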
> If I used something like Cobian Backup to create incremental backups, although the disk space usage would be relatively small the periodic full backups will cause rsync to send full backups across the link even though 90% of the content of those backups will already have been synchronised previously.
That will only be the case if you are planning on keeping multiple backup versions rather than a single state at each end. If you want multiple snapshots then maybe you should look towards something like dirvish.
rsync is pretty good at only transferring what it needs to. The only time a whole file will generally be moved is if it is new, or if the file is compressed (in which case rsync needs to resend everything from the first changed byte onwards). Unison can be configured to be a bit cleverer about filename and path changes, however.
> I've seen solutions that use hardlinks between identical files to provide rolling backups on Linux, but I need to support Windows.
We use BackupPC to good effect to back up 20 or so smaller companies and home workers over ADSL. It uses hardlinks as you say, but is overkill if there is only going to be one client at each end. There is a 2GB maximum file size limitation if the client end is Windows, however (this appears to be the case for anything involving rsync/Cygwin on Windows).
You could conceivably configure a copy of BackupPC at each end to locally back up the clients to its own pool, then configure BackupPC at each end to back up the BackupPC pool at the other site. Only byte-level changes would be transferred after the initial sync. You'd probably be best advised to perform the initial synchronisation via a hard drive in the post, and this in itself can be a little bit fiddly to get into the BackupPC storage pool in the correct format. The trick is to mount the external drive on a local machine and temporarily modify the client settings in BackupPC so that for the initial sync it is looking at a local machine rather than a remote one. But you have to get paths etc. the same, so the directory trees look identical to BackupPC between the remote server and your hard-drive initial copy.
For the local machines you can also configure BackupPC to use something other than rsync (such as SMB mounts), from which it does the rsync processing locally and therefore avoids the 2GB file limitation. However, I have never tried this.