Replacing duplicate files with hard links


Mark Rogers

24 Nov 2009

5:09 p.m.

Does anyone make use of any tools which can search for duplicate files and (assuming they're on the same filesystem) replace the duplicates with hard links to the first copy of the file?

There seem to be a few scripts around but since it'll be messing with my files I'd rather go by recommendation if I can. (The plan is to free up some space on a server which has lots of duplicate images.)

Also, any "gotchas" that should put me off trying this would be appreciated.

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) @ 13 Clarke Rd, Milton Keynes, MK1 1LG


Steve Fosdick

25 Nov

1:16 a.m.

On Tue, 2009-11-24 at 17:09 +0000, Mark Rogers wrote:

> Does anyone make use of any tools which can search for duplicate files and (assuming they're on the same filesystem) replace the duplicates with hard links to the first copy of the file?
>
> There seem to be a few scripts around but since it'll be messing with my files I'd rather go by recommendation if I can. (The plan is to free up some space on a server which has lots of duplicate images.)
>
> Also, any "gotchas" that should put me off trying this would be appreciated.

The 'gotcha' is that hard-linked files are not copy-on-write, i.e. if a program opens the file under one of its names and re-writes the contents (or appends to the file) the change happens to the single underlying file under all of its names. As long as that is what you are expecting to happen then all is well.
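To make the point above concrete, here is a minimal sketch in Python. It is an illustration added for this thread, not part of Steve's program; the file names and the function name demo_shared_write are made up for the example. It creates a file, gives it a second name with os.link, appends through the second name, and reads the result back through the first.

```python
import os
import tempfile


def demo_shared_write():
    """Write through one hard-linked name and read back through the other.

    Returns (content_seen_via_original_name, names_share_one_inode).
    File names are hypothetical and live in a throwaway temp directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        linked = os.path.join(tmp, "linked.txt")

        with open(original, "w") as f:
            f.write("v1\n")

        # Create a second directory entry pointing at the same inode.
        os.link(original, linked)

        # Appending via the second name modifies the single underlying file.
        with open(linked, "a") as f:
            f.write("appended via linked name\n")

        with open(original) as f:
            seen_via_original = f.read()

        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
        return seen_via_original, same_inode


if __name__ == "__main__":
    content, same_inode = demo_shared_write()
    print(repr(content), same_inode)
```

The text appended through linked.txt shows up when reading original.txt, because both names refer to one file. Nothing is copied at write time.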

When I am satisfied that this will not cause a problem, I use a home-grown program. I tried attaching it, but that caused this message to be held for approval, so instead, if you are interested, it is available at: http://pelvoux.gotadsl.co.uk/dupfind.c

Regards, Steve.

Wayne Stallwood

1:47 a.m.

Mark Rogers wrote:

> Does anyone make use of any tools which can search for duplicate files and (assuming they're on the same filesystem) replace the duplicates with hard links to the first copy of the file?
>
> There seem to be a few scripts around but since it'll be messing with my files I'd rather go by recommendation if I can. (The plan is to free up some space on a server which has lots of duplicate images.)
>
> Also, any "gotchas" that should put me off trying this would be appreciated.

fdupes will do the "find the duplicate files" bit. It doesn't care if the filenames are different, as it checksums the contents. That is better than a straight search on filename, since you may have multiple versions of a file with the same name but different contents.

I know fdupes has a mode where it can delete duplicates...but I'd imagine it is a fairly trivial exercise in scripting to get it to produce links to a master instead.
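As a rough illustration of the idea, the following Python sketch finds files with identical contents and replaces the duplicates with hard links to the first copy. It does not call fdupes and is not Steve's dupfind.c; the function names, the ".hardlink-tmp" suffix, and the choice of SHA-256 are all assumptions made for this example. It defaults to a dry run that only reports what it would do.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical.

    Files are first grouped by (device, size) so that only plausible
    candidates are hashed, and so links never cross filesystems.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, special files and empty files.
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[(st.st_dev, st.st_size)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def link_duplicates(root, dry_run=True):
    """Replace duplicates with hard links to the first path in each group.

    Returns a list of (duplicate, master) pairs. With dry_run=True nothing
    on disk is changed. Note that a replaced duplicate takes on the
    master's ownership, permissions and timestamps.
    """
    actions = []
    for group in find_duplicates(root):
        master = group[0]
        for dup in group[1:]:
            if os.path.samefile(master, dup):
                continue  # already the same inode
            actions.append((dup, master))
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the duplicate is never missing.
                tmp = dup + ".hardlink-tmp"
                os.link(master, tmp)
                os.replace(tmp, dup)
    return actions
```

Running link_duplicates(some_directory) first and reading the returned pairs before repeating with dry_run=False seems a sensible precaution on a live server.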

Although take heed of Steve's comment: if duplicates exist, you have to ask yourself why. Perhaps, if we are talking about a fileserver, it is because someone wanted a local copy they can edit without impacting everyone else. Start replacing these with hard links and you are going to upset people.

Mark Rogers

2:47 p.m.

Wayne Stallwood wrote:

> Although take heed of Steve's comment: if duplicates exist, you have to ask yourself why. Perhaps, if we are talking about a fileserver, it is because someone wanted a local copy they can edit without impacting everyone else. Start replacing these with hard links and you are going to upset people.

Steve's comment was very relevant. I did know that but had largely forgotten it, so part of my reason for wanting to do this is shot to pieces!

That said, it's not uncommon around these parts for someone to duplicate a website in order to work on it, where a large volume of product images is included that are never edited, and that directory can represent 90+% of the total size of the copy. There are better alternatives to making the copy, of course, and I think maybe I should look at those before I start linking copies of images.
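One possible way of making such a copy, sketched here in Python, is to hard-link the image directory at copy time and copy everything else normally. This is only a guess at the kind of alternative meant above; the function name copy_site and the default directory name "images" are assumptions for the example. The earlier caveat still applies: anything under a linked directory is shared with the original, so this only suits files that are never edited in place.

```python
import os
import shutil


def copy_site(src, dst, link_dirs=("images",)):
    """Copy the tree at src to dst, hard-linking files under link_dirs.

    link_dirs names top-level directories of src whose files are assumed
    never to be edited in place. src and dst must be on the same
    filesystem for the links to succeed, and dst must not already exist.
    """
    src = os.path.abspath(src)

    def copy_or_link(src_file, dst_file):
        # Decide per file, based on its top-level directory within src.
        top = os.path.relpath(src_file, src).split(os.sep)[0]
        if top in link_dirs:
            os.link(src_file, dst_file)  # shares data with the original
        else:
            shutil.copy2(src_file, dst_file)  # independent, editable copy
        return dst_file

    shutil.copytree(src, dst, copy_function=copy_or_link)
```

With this approach the working copy costs little extra space for the images, while the pages and scripts remain independent copies that can be edited freely.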


main@lists.alug.org.uk
