I'm shortly aiming to merge a load of mail messages which are stored in various different mbox hierarchies. It should be pretty easy to write a little script to merge all the files in the same place in the hierarchy together, but then I'll need to remove duplicate messages, as I know quite large chunks will be the same messages stored in different places.
Are there any tools out there which will take a mbox file and remove duplicate messages? Removal on the basis of Message-Id: would be fine.
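For what it's worth, the "little script" for the merging step could look something like the following. This is only a sketch using Python's standard mailbox module: the function name merge_hierarchies is made up for illustration, and it assumes every regular file under each source tree is an mbox and that nothing else is writing to the files while it runs (no locking is done).

```python
import mailbox
from pathlib import Path


def merge_hierarchies(sources, dest):
    """Append each mbox found under the source trees to the file at the
    same relative path under dest (hypothetical helper, not a real tool)."""
    dest = Path(dest)
    for root in map(Path, sources):
        # Assumption: every regular file in the tree is an mbox file.
        for src in sorted(p for p in root.rglob("*") if p.is_file()):
            target = dest / src.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            out = mailbox.mbox(str(target))            # created if missing
            inbox = mailbox.mbox(str(src), create=False)
            try:
                for msg in inbox:
                    out.add(msg)                       # duplicates are kept here
                out.flush()
            finally:
                out.close()
                inbox.close()
```

Called as merge_hierarchies(["old-mail", "backup-mail"], "merged"), this would leave the duplicates in place, so the removal step is still needed afterwards.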
Chris Green asked:
Are there any tools out there which will take a mbox file and remove duplicate messages? Removal on the basis of Message-Id: would be fine.
formail (which I think comes with procmail or delivermail, depending on your distribution) can do that for you.
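If formail isn't to hand, removal on the basis of Message-Id can also be sketched with Python's standard mailbox module. This is not how formail itself works, just an illustration of the same idea; the name dedupe_mbox is invented here, and messages without a Message-Id header are assumed to be worth keeping.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first message seen for
    each Message-Id. Returns the number of messages dropped."""
    seen = set()
    dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            # Header lookup is case-insensitive, so Message-ID matches too.
            msg_id = str(msg.get("Message-Id", "")).strip()
            if msg_id and msg_id in seen:
                dropped += 1
                continue
            if msg_id:
                seen.add(msg_id)
            dst.add(msg)   # assumption: messages with no Message-Id are kept
        dst.flush()
    finally:
        dst.close()
        src.close()
    return dropped
```

Writing to a separate output file rather than editing in place means the original mbox is untouched if anything goes wrong.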
On Tue, Jun 07, 2005 at 02:34:35PM +0100, MJ Ray wrote:
formail (which I think comes with procmail or delivermail, depending on your distribution) can do that for you.
Ah, thanks, I should have thought of that.
On Tue, Jun 07, 2005 at 01:10:01PM +0100, Chris Green wrote:
Are there any tools out there which will take a mbox file and remove duplicate messages? Removal on the basis of Message-Id: would be fine.
Yup, and you used it to send the original message. Mutt can select duplicates using tags and patterns.
See
http://www.mutt.org/doc/manual/manual-4.html#patterns and http://www.mutt.org/doc/manual/manual-4.html#ss4.3
for more info.
Adam
On Tue, Jun 07, 2005 at 03:51:07PM +0100, Adam Bower wrote:
Yup, and you used it to send the original message. Mutt can select duplicates using tags and patterns.
See
http://www.mutt.org/doc/manual/manual-4.html#patterns and http://www.mutt.org/doc/manual/manual-4.html#ss4.3
for more info.
... er, yes, and having tagged the duplicate messages how does one delete all but one of them, automatically by script?
I think formail is a better answer! :-)
On Tue, Jun 07, 2005 at 04:25:20PM +0100, Chris Green wrote:
... er, yes, and having tagged the duplicate messages how does one delete all but one of them, automatically by script?
I think formail is a better answer! :-)
Oh sorry, reading your original mail I was under the impression you had lots of mbox files you wanted to turn into 1 *big* mbox file and then remove the duplicates.
To do this (the quick and dirty way) I would cat all the mbox files into 1 *big* mbox file and then in mutt do a
T ~=
(to select all the duplicates)
and then a
;d
to delete the duplicates.
I can't recall why I had to do this recently but I did, it was probably after I broke my mail system at some point :)
Adam
On Tue, Jun 07, 2005 at 05:06:22PM +0100, Adam Bower wrote:
Oh sorry, reading your original mail I was under the impression you had lots of mbox files you wanted to turn into 1 *big* mbox file and then remove the duplicates.
Well I do, but I want one of the set of duplicates left behind.
To do this (the quick and dirty way) I would cat all the mbox files into 1 *big* mbox file and then in mutt do a
T ~=
(to select all the duplicates)
and then a
;d
to delete the duplicates.
.... but didn't you want one of the duplicates left behind?
On Tue, Jun 07, 2005 at 11:06:18PM +0100, Chris Green wrote:
to delete the duplicates.
.... but didn't you want one of the duplicates left behind?
That's what this does... I suppose I should have written that it selects all the duplicates except 1 copy of each message. Try it, then you will see what I mean :)
Adam
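To make the "all the duplicates except 1 copy" behaviour concrete, here is a tiny Python sketch of that selection rule as described in this thread. It is not mutt's code, and duplicate_indices is an invented name: it simply reports the positions of every copy after the first one.

```python
def duplicate_indices(message_ids):
    """Return the positions of every message whose Message-Id has already
    been seen earlier in the list (i.e. all copies except the first)."""
    seen = set()
    dupes = []
    for index, msg_id in enumerate(message_ids):
        if msg_id in seen:
            dupes.append(index)
        else:
            seen.add(msg_id)
    return dupes


ids = ["<a@x>", "<b@x>", "<a@x>", "<c@x>", "<a@x>"]
print(duplicate_indices(ids))  # prints [2, 4]
```

Deleting the messages at the reported positions leaves exactly one copy of each message behind.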
On Tue, Jun 07, 2005 at 11:28:01PM +0100, Adam Bower wrote:
That's what this does... I suppose I should have written that it selects all the duplicates except 1 copy of each message. Try it, then you will see what I mean :)
Aha! I didn't realise that. It could well be, then, that mutt will be quite useful in doing any final clearing up. I could even be clever and use it in a script; I've done that before for Maildir <--> Mbox copying scripts. I call mutt with a few commands 'push'[ed]. It means that the script can do the file handling type bits and mutt can do the mailbox merging etc.
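A rough idea of what calling mutt with pushed commands might look like from a Python script is below. Treat it as an untested sketch under several assumptions: the helper names are invented, the key sequence is the "T ~=" and ";d" steps from earlier in the thread written as mutt function names, mutt still needs a terminal to run in, any confirmation prompts on sync or quit depend on your own configuration, and the ~= pattern may depend on how the mailbox is sorted and on settings such as $duplicate_threads. Try it on a copy of the mailbox first.

```python
import subprocess


def mutt_dedupe_command(mbox_path):
    """Build (but do not run) a mutt command line that pushes the
    tag-duplicates-and-delete key sequence. Hypothetical helper."""
    keys = (
        "<tag-pattern>~=<enter>"         # T ~=  : tag the duplicates
        "<tag-prefix><delete-message>"   # ;d    : delete the tagged messages
        "<sync-mailbox><quit>"           # write the changes and leave
    )
    return ["mutt", "-f", mbox_path, "-e", 'push "%s"' % keys]


def run_mutt_dedupe(mbox_path):
    """Run the command above; assumes mutt is installed and on PATH."""
    subprocess.run(mutt_dedupe_command(mbox_path), check=True)
```

Keeping the command construction separate from the function that runs it makes it easy to print and check the command before letting it loose on real mail.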