I have a number of large[ish] directory trees that I would like to keep synchronised across various systems. There are basically three systems involved: desktop, laptop and a remote server. They all run Linux (Xubuntu and Ubuntu).
I don't need real-time synchronisation like that provided by btsync, syncthing or pulse. The directory trees are really too big for this sort of thing anyway.
On the other hand I don't want a manual system like Unison, which anyway seems beset by problems with different versions at each end.
All I want (all?) is a way to keep the directories in sync when changes occur at either end. Since the changes will be mostly generated by me it will be very rare for changes to occur in two places at the same time. If they do then I'm quite happy for the system to require manual intervention. I just want the general case to be automatic. Updating once an hour or so would be quite satisfactory.
I can do much of it using rsync with the --update option run first one way and then the other; this does nearly everything I want, but there is one gotcha: files that have been deleted will re-appear. File deletions will not be frequent but they will occur occasionally.
Adding a --delete option to rsync doesn't necessarily help: it could also mean that newly added files get deleted. It all depends on the relative timing of additions and deletions and the copying each way.
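For the sake of illustration, the sort of thing I mean is a pair of cron-driven commands like this (the host and paths are made up here):-

    rsync -az --update /home/chris/tree/ server:tree/
    rsync -az --update server:tree/ /home/chris/tree/

The first pass pushes anything newer from here, the second pulls anything newer from the server - and a file deleted at either end simply comes back on the next pass.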
Nearly all the 'directory synchronisation' tools I can find are one-way and do very little more than one can do easily oneself using rsync.
Has anyone come across anything that might do something like I want?
On 5 April 2016 at 14:26, Chris Green cl@isbd.net wrote:
> I don't need real-time synchronisation like that provided by btsync, syncthing or pulse. The directory trees are really too big for this sort of thing anyway.
How big are they?
I have some very large trees synced using btsync.
I would prefer to use syncthing but it had a number of limitations last I checked and btsync worked far better for me. That said, if you were to use syncthing it (by default, last I checked) doesn't use inotify to detect file changes but scans the directory tree periodically, and that period can be changed, so I think it could do the once-per-hour update that you've suggested.
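If I remember rightly the rescan interval is a per-folder setting in syncthing's config.xml, something like this (the attribute name is from memory, so check the current docs):-

    <folder id="wiki" path="/home/chris/tree" rescanIntervalS="3600">
        ...
    </folder>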
Its biggest downside is that, from memory, it doesn't do an rsync-style transfer of differences but transfers the whole file - although I might be wrong or out of date on that. I recall this being one of the reasons I ended up with btsync.
> On the other hand I don't want a manual system like Unison, which anyway seems beset by problems with different versions at each end.
The version issue is a nuisance but can be managed. It's been a while since I used it in anger but I routinely copied binaries between systems and rarely had an issue, although a better solution would be to compile from source on each machine and keep them in sync that way.
> I can do much of it using rsync with the --update option run first one way and then the other; this does nearly everything I want, but there is one gotcha: files that have been deleted will re-appear. File deletions will not be frequent but they will occur occasionally.
This description is pretty much why unison was created. In other words, your use case is exactly what unison was designed for.
> Has anyone come across anything that might do something like I want?
In the absence of alternatives that others might suggest, I'd go with btsync or unison. The limitations of either are probably less difficult to work around than struggling along with rsync or scripting something of your own.
On 6 April 2016 at 09:12, Mark Rogers mark@more-solutions.co.uk wrote:
> I have some very large trees synced using btsync.
To add to that, since one man's large is another man's tiny: one of the trees I sync with btsync has 1.1TB in 365432 files (across 22027 directories) as of this morning.
Mark
On Wed, Apr 06, 2016 at 09:15:26AM +0100, Mark Rogers wrote:
> On 6 April 2016 at 09:12, Mark Rogers mark@more-solutions.co.uk wrote:
>> I have some very large trees synced using btsync.
> To add to that, since one man's large is another man's tiny: one of the trees I sync with btsync has 1.1TB in 365432 files (across 22027 directories) as of this morning.
Well (as you can see from my reply to your first response) that's somewhat bigger than I'm talking about. However mine will be across a slow internet link; one direction will only be 0.5Mb/s or so.
On 6 April 2016 at 09:47, Chris Green cl@isbd.net wrote:
> Well (as you can see from my reply to your first response) that's somewhat bigger than I'm talking about. However mine will be across a slow internet link; one direction will only be 0.5Mb/s or so.
Speed shouldn't really be an issue aside from the transfer itself, so it's a question of how to reduce that. My instinct would be that btsync will be the most efficient - it comes from BitTorrent, which was always designed for transferring files over a wide variety of connections, including those with slow upstream. Most of the "work" is done at each end in terms of hashing all the files, and content changes are transferred incrementally. So if you start with pretty much synced folders (i.e. an initial copy via a different means) then the volume of actual data transferred over the link will be pretty minimal.
From my understanding of rsync, because it doesn't store its state between uses it'll have to send the metadata of every single file across the link on each invocation in order to detect changes. Unison does store its state, so it should be better, although I'd still bet on btsync. Syncthing (and its fork, Pulse) suffers from the lack of incremental change handling (assuming I have that right), but as you say that's probably not an issue for you with small files.
(My use case is also over the Internet, but the link is a good deal faster than yours. However it wasn't always thus, and I used it for smaller folders quite happily back then too.)
For small text files I'd also consider git/svn/etc
On Wed, Apr 06, 2016 at 10:07:06AM +0100, Mark Rogers wrote:
> On 6 April 2016 at 09:47, Chris Green cl@isbd.net wrote:
>> Well (as you can see from my reply to your first response) that's somewhat bigger than I'm talking about. However mine will be across a slow internet link; one direction will only be 0.5Mb/s or so.
> Speed shouldn't really be an issue aside from the transfer itself, so it's a question of how to reduce that. My instinct would be that btsync will be the most efficient - it comes from BitTorrent, which was always designed for transferring files over a wide variety of connections, including those with slow upstream. Most of the "work" is done at each end in terms of hashing all the files, and content changes are transferred incrementally. So if you start with pretty much synced folders (i.e. an initial copy via a different means) then the volume of actual data transferred over the link will be pretty minimal.
Yes, the 'ends' are already approximately in sync. I can run rsync manually to get them exactly in sync anyway because I know which 'end' is the right/up-to-date version at any particular time.
> From my understanding of rsync, because it doesn't store its state between uses it'll have to send the metadata of every single file across the link on each invocation in order to detect changes. Unison does store its state, so it should be better, although I'd still bet on btsync. Syncthing (and its fork, Pulse) suffers from the lack of incremental change handling (assuming I have that right), but as you say that's probably not an issue for you with small files.
On my data, if nothing much has changed, rsync takes between 10 and 20 seconds.
> (My use case is also over the Internet, but the link is a good deal faster than yours. However it wasn't always thus, and I used it for smaller folders quite happily back then too.)
> For small text files I'd also consider git/svn/etc
But you can't automate it, can you? The files that get changed are wiki files from Dokuwiki, which keeps all its pages as text files. The changes to synchronise can thus happen when I (or someone else, rarely) change something in the wiki using a browser. You can't hook that into a source code control system, can you?
On 6 April 2016 at 10:32, Chris Green cl@isbd.net wrote:
> On my data, if nothing much has changed, rsync takes between 10 and 20 seconds.
I'd say that's quite a long time for "do nothing", and whilst still not excessive I'd be surprised if any of the alternative solutions were worse, as that should really be a worst case.
>> For small text files I'd also consider git/svn/etc
> You can't hook that into a source code control system, can you?
I'm sure it can be done, even if only via an hourly automated update/commit.
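Something along these lines as an hourly cron job might do it (a rough sketch, untested, with made-up paths):-

    #!/bin/sh
    # commit any local changes, then merge in the remote's
    cd /home/chris/wiki || exit 1
    git add -A                                        # stages additions, changes and deletions
    git commit -m "auto $(date -u +%FT%TZ)" || true   # no-op if nothing changed
    git pull --no-edit origin master
    git push origin master

A genuine conflict (the same file edited at both ends between runs) would stop the pull and need manual attention, which sounds like what you said you'd be happy with anyway.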
The disadvantage is that, as a version control system, you'll be keeping a copy of all previous versions at each end as well. The advantage is exactly the same: you'll have a version history if you want to roll back a change at some point.
For your use case as described I'd still be edging towards btsync though, or a recommendation to use syncthing if you can get it to work properly, as it's open source.
On Wed, Apr 06, 2016 at 11:29:19AM +0100, Mark Rogers wrote:
> On 6 April 2016 at 10:32, Chris Green cl@isbd.net wrote:
>> On my data, if nothing much has changed, rsync takes between 10 and 20 seconds.
> I'd say that's quite a long time for "do nothing", and whilst still not excessive I'd be surprised if any of the alternative solutions were worse, as that should really be a worst case.
>>> For small text files I'd also consider git/svn/etc
>> You can't hook that into a source code control system, can you?
> I'm sure it can be done, even if only via an hourly automated update/commit.
It would be difficult though: it doesn't just have to cope with file updates (that's simple with rsync anyway), it also has to manage file additions and deletions (the bit that's difficult with rsync).
> The disadvantage is that, as a version control system, you'll be keeping a copy of all previous versions at each end as well. The advantage is exactly the same: you'll have a version history if you want to roll back a change at some point.
Dokuwiki keeps a history anyway, so more would be overkill, though it wouldn't actually matter - space isn't an issue.
> For your use case as described I'd still be edging towards btsync though, or a recommendation to use syncthing if you can get it to work properly, as it's open source.
Yes, I think you're right. I'll take a look at syncthing because not only is it open source but it also has a proper Ubuntu/Debian PPA that I can hook into, so I don't have to think (too much) about keeping things up to date. If at all possible I use software with a PPA so that all I have to do is run 'apt-get update; apt-get upgrade' every so often. (Or on my desktop and laptop it asks me politely of course, even better!)
On Wed, Apr 06, 2016 at 09:12:21AM +0100, Mark Rogers wrote:
> On 5 April 2016 at 14:26, Chris Green cl@isbd.net wrote:
>> I don't need real-time synchronisation like that provided by btsync, syncthing or pulse. The directory trees are really too big for this sort of thing anyway.
> How big are they?
Well the 'obvious' tree to do it to is about 300MB and 4000 files, however I could actually cut it down to 36MB and 1400 files if really necessary.
> I have some very large trees synced using btsync.
> I would prefer to use syncthing but it had a number of limitations last I checked and btsync worked far better for me. That said, if you were to use syncthing it (by default, last I checked) doesn't use inotify to detect file changes but scans the directory tree periodically, and that period can be changed, so I think it could do the once-per-hour update that you've suggested.
That sounds a reasonable approach, as in my case I'd be editing files either at one end or the other, and it's very unlikely I'd switch from one to the other for quite a long period.
> Its biggest downside is that, from memory, it doesn't do an rsync-style transfer of differences but transfers the whole file - although I might be wrong or out of date on that. I recall this being one of the reasons I ended up with btsync.
That probably isn't a big issue for me; the files that will change are mostly small text files (a few kB).
>> I can do much of it using rsync with the --update option run first one way and then the other; this does nearly everything I want, but there is one gotcha: files that have been deleted will re-appear. File deletions will not be frequent but they will occur occasionally.
> This description is pretty much why unison was created. In other words, your use case is exactly what unison was designed for.
Hmm, it doesn't seem as well suited to automating (scripts run from cron) as rsync is. I will take a longer look though.
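(From a quick look it seems unison can be run non-interactively though, something like:-

    unison -batch /home/chris/tree ssh://server//home/chris/tree

so maybe cron isn't out of the question - I haven't actually tried it.)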
>> Has anyone come across anything that might do something like I want?
> In the absence of alternatives that others might suggest, I'd go with btsync or unison. The limitations of either are probably less difficult to work around than struggling along with rsync or scripting something of your own.
Yes, that's why I've been looking. Thanks for the helpful feedback.
Just a report on this (file synchronization).
I've installed syncthing on the three systems where I want to synchronize files and, so far, it seems to be doing what I need.
It has a nice simple easy to understand web GUI.
It doesn't "automate" very well, it's designed more for a single user sort of scenario. There's no ready-to-go set-up for running it as a server and the documentation says add it to the GUI "run at startup" settings but this of course means it will only run when you actually log in to your working desktop. If the system gets rebooted remotely for any reason you won't have syncthing running.
I have thus added it to /etc/rc.local; however even this isn't particularly tidy, as it needs to run as the user (not root), so you have to do:-
su -c '/usr/bin/syncthing -no-browser -home="/home/chris/.config/syncthing" -logfile="/home/chris/tmp/syncthing.log"' chris &
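(An @reboot line in my own crontab might be slightly tidier and would avoid the su dance - something like this, untested:-

    @reboot /usr/bin/syncthing -no-browser -home="/home/chris/.config/syncthing" -logfile="/home/chris/tmp/syncthing.log"

assuming cron on all three systems supports @reboot.)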
It's also almost impossible to run without the GUI, so to manage the remote syncthing on my web server I *have* to allow external access to it.
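(Though possibly an ssh tunnel would do instead of opening it up to the world, something like:-

    ssh -L 8385:localhost:8384 server

and then pointing a local browser at http://localhost:8385/ - assuming the GUI is still listening on its default port 8384.)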
However, as I said, it does seem to do a good job of synchronizing files across my three systems.
> However, as I said, it does seem to do a good job of synchronizing files across my three systems.
... and continues to do so; so far I can't fault it. It has synchronised the directories I've asked it to between laptop and desktop, with the laptop roaming across all sorts of rather flaky connections.