Mark Rogers wrote:
Does anyone make use of any tools which can search for duplicate files and (assuming they're on the same filesystem) replace the duplicates with hard links to the first copy of the file?
There seem to be a few scripts around but since it'll be messing with my files I'd rather go by recommendation if I can. (The plan is to free up some space on a server which has lots of duplicate images.)
Also, any "gotchas" that should put me off trying this would be appreciated.
fdupes will do the "find the duplicate files" bit. It doesn't care if the filenames are different, as it checksums the contents. That is better than a straight search on filename, since you may have several versions of a file with the same name but different contents.
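To make the "compare contents, not names" idea concrete, here is a minimal Python sketch. It is not how fdupes itself is implemented; the function names (find_duplicates, file_digest) and the choice of SHA-256 are my own assumptions. It groups files by size first, then by a hash of their contents.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes first means only files that could possibly match get read and hashed, which matters on a server full of images.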
I know fdupes has a mode where it can delete duplicates, but I'd imagine it is a fairly trivial exercise in scripting to get it to produce hard links to a master copy instead.
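As a rough idea of what such a script might look like, here is a Python sketch. It assumes the duplicate listing is plain text with one path per line and a blank line between groups (the layout fdupes prints by default, as far as I know), and it treats the first path in each group as the master. The function names and the ".hardlink-tmp" suffix are made up for illustration, and it trusts the listing rather than re-checking the contents.

```python
import os


def parse_groups(listing):
    """Split a blank-line-separated listing into groups of paths."""
    groups, current = [], []
    for line in listing.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def replace_with_hardlinks(groups, dry_run=True):
    """Hard-link every later file in each group to the first one.

    Returns the (duplicate, master) pairs that were, or would be, linked.
    """
    linked = []
    for master, *duplicates in groups:
        master_stat = os.stat(master)
        for dup in duplicates:
            dup_stat = os.stat(dup)
            if dup_stat.st_dev != master_stat.st_dev:
                continue  # hard links cannot cross filesystems
            if dup_stat.st_ino == master_stat.st_ino:
                continue  # already the same file
            linked.append((dup, master))
            if dry_run:
                continue
            tmp = dup + ".hardlink-tmp"  # hypothetical temporary name
            os.link(master, tmp)
            os.replace(tmp, dup)  # atomic rename over the duplicate
    return linked
```

It defaults to a dry run so you can read the list of pairs before anything is touched. Linking to a temporary name and renaming over the duplicate means there is never a moment when the duplicate's path is missing. One gotcha: the linked names share a single inode, so the duplicate's own owner, permissions and timestamps are lost.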
Do take heed of Steve's comment, though: if duplicates exist, you have to ask yourself why. On a fileserver, perhaps someone wanted a local copy they could edit without affecting everyone else. Start replacing those copies with hard links and you are going to upset people.
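To illustrate why, here is a small Python demonstration (the file names and the function name are invented for the example). Once two names are hard links to the same file, an in-place edit through one name shows up under the other.

```python
import os
import tempfile


def edit_through_hardlink():
    """Show that an in-place edit via one name is visible via the other."""
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "shared.txt")
        local = os.path.join(d, "local.txt")
        with open(shared, "w") as f:
            f.write("original\n")
        os.link(shared, local)  # what deduplication would have done
        with open(local, "a") as f:  # user edits their "own" copy in place
            f.write("my private change\n")
        with open(shared) as f:
            return f.read()
```

The text read back from shared.txt includes the private change. Some programs save by writing a new file and renaming it over the old one, which breaks the link rather than changing the shared copy, so what users actually see depends on the tool they edit with.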