(First time reply using FairEmail on my phone, which I think solves the plain text issue but I can't see how to do inline quotes properly)
The issue with standard dedup tools is that I have a lot of files, some of them quite large, and a dedup tool won't take into account all the features of the file structure that save us from having to, in effect, check every file against every other.
Also, with what I know now, they'd have failed horribly at detecting files which are identical except for whitespace differences.
My script evolved to take the first 20k of message data after the header, remove whitespace, hash the first 10k of what was left, and compare those hashes, but only against files in the same associated subdirectory. This was quite a quick way to find the things I needed to find (where a message had failed to transfer from one server to another) but is highly unlikely to be much use in the general case.
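For anyone curious, here is a rough Python sketch of that approach. It is not my actual script. It assumes one message per file, that the header ends at the first blank line, that "20k" and "10k" mean bytes, and that the two servers' copies live in two local trees with matching subdirectory names. The function names and the choice of SHA-256 are made up for the illustration.

```python
import hashlib
import os

BODY_BYTES = 20_000  # assumed meaning of "first 20k" of body data
HASH_BYTES = 10_000  # assumed meaning of "first 10k" after stripping


def message_body(raw: bytes) -> bytes:
    """Return everything after the first blank line (end of the header)."""
    hits = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, size) for pos, size in hits if pos != -1]
    if not hits:
        return b""  # no blank line found, so treat the body as empty
    pos, size = min(hits)  # earliest separator wins
    return raw[pos + size:]


def fingerprint(raw: bytes) -> str:
    """Hash the start of the body with all whitespace removed."""
    body = message_body(raw)[:BODY_BYTES]
    stripped = b"".join(body.split())  # drops spaces, tabs, CR and LF
    return hashlib.sha256(stripped[:HASH_BYTES]).hexdigest()


def dir_fingerprints(path: str) -> dict:
    """Map fingerprint -> first file in one directory (not recursive)."""
    result = {}
    if not os.path.isdir(path):
        return result
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            # Reads the whole file for simplicity; a real script might
            # read only as much as it needs from large files.
            with open(full, "rb") as fh:
                result.setdefault(fingerprint(fh.read()), full)
    return result


def missing_messages(src_root: str, dst_root: str) -> list:
    """List source files with no match in the same subdirectory of dst."""
    missing = []
    for dirpath, _dirs, _files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        src = dir_fingerprints(dirpath)
        dst = dir_fingerprints(os.path.join(dst_root, rel))
        missing.extend(p for digest, p in src.items() if digest not in dst)
    return sorted(missing)
```

Calling missing_messages("old-server", "new-server") (both paths hypothetical) would list the messages in the first tree that have no whitespace-insensitive match in the corresponding folder of the second. Because each folder is only compared with its counterpart, the work stays roughly proportional to the number of files rather than every file being checked against every other.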
Incidentally, Google have now said that legacy Workspace accounts which are used for private (non-commercial) purposes only can be retained, so the exercise has proved to be unnecessary after all the effort, but such is life!
23 May 2022 08:47:11 Peter peter.northerly@gmail.com:
A query, not a suggestion, because you are all far, far better informed than me about this stuff! I was curious and looked up what tools are available for this, and came on:
rdfind, fdupes, dupeGuru
which all seem to find dupes by content. I am sure Mark is well aware of these and more. So why can't one of them be used to find the dupes? Also, is any of them significantly better than the others?
Different but related issue, the app I have used to sync folders in the past is Unison, which I found very simple and effective. Also Grsync. Do people use these? Or is something else preferable?
Peter
On Sun, 22 May 2022 12:00:03 +0100 main-request@lists.alug.org.uk wrote: