Help!
I need to go through a text file, and for all lines similar to this: X-Mozilla-Status: 0009 .. I need to unset the 4th bit (so the above becomes 0001).
Not all X-Mozilla-Status: lines which have the bit set have status 0009, however. I have 0008, 000b, 0018, and so on.
Suggestions?
Background: Somehow I managed to delete a load of email in Thunderbird which I need. It ended up in Trash so I moved it back to my Inbox, but for some reason it didn't appear in my Inbox, but did get deleted out of Trash. However by "deleted" I mean it set the 4th bit of the status word, so until I compact the folder I can recover it by unsetting that bit.
I still need to work out why it didn't go back into my Inbox, I think I may have discovered a bug there.
On 23/09/10 11:00, Mark Rogers wrote:
I need to go through a text file, and for all lines similar to this: X-Mozilla-Status: 0009 .. I need to unset the 4th bit (so the above becomes 0001).
Just to follow-up on this:
I didn't find a good solution, but did find out that I could remove all the X-Mozilla-Status: lines and Thunderbird was happy with the resulting file, it just had everything marked unread and undeleted.
To remove the lines I used:
    grep -v 'X-Mozilla-Status:' mailbox > mailbox.tmp
    mv mailbox.tmp mailbox
Mark
On 30 Sep 17:39, Mark Rogers wrote:
To remove the lines I used:
    grep -v 'X-Mozilla-Status:' mailbox > mailbox.tmp
    mv mailbox.tmp mailbox
Why couldn't you have just done:
    sed -i -e 's#^X-Mozilla-Status: 0009$#X-Mozilla-Status: 0001#;' mailbox
An in-place replacement of the line with the fixed line.
Cheers,
On 30-Sep-10 16:46:52, Brett Parker wrote:
Why couldn't you have just done:
    sed -i -e 's#^X-Mozilla-Status: 0009$#X-Mozilla-Status: 0001#;' mailbox
An in-place replacement of the line with the fixed line.
Cheers,
Brett Parker http://www.sommitrealweird.co.uk/
It seems Mark wants to set bit 4 to 0, presumably whatever the first 3 bits may be. Is that correct?
Also, Mark, how many of the bits would be 1 in this? Since bit 4 can be 1, and bits 1, 2, 3 could be 0 or 1, you could be up to 0015 on those 4 bits alone. Will you be using higher-order bits as well? E.g. with bit 5 in play, you can go up to 0031, with bit 6 up to 0063, etc.
Or is that 4-digit number really in hex? So it could have value 000F, 00BF, 0ABF, ...
In principle, your task can be done in 'awk', but the details would depend on the answers to these questions!
Ted.
-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 30-Sep-10 Time: 18:41:17 ------------------------------ XFMail ------------------------------
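Ted's decimal-or-hex question matters because the mask only makes sense once the string is parsed in the right base. Here is a small Python sketch of the arithmetic. It assumes the field is a 4-digit hex word (the 000b value in the original post suggests as much) and that "the 4th bit" is the one with value 0x0008. The names DELETED_BIT and clear_bit4 are made up for illustration.

```python
# Assumption: "the 4th bit" counts from 1 at the low-order end, i.e. 0x0008.
DELETED_BIT = 0x0008


def clear_bit4(status: str) -> str:
    """Parse a 4-digit hex status word, clear bit 4, return 4 hex digits."""
    value = int(status, 16)           # base 16, so "000b" parses as 11
    return "%04x" % (value & ~DELETED_BIT)


# The values mentioned in the original post, plus one with the bit already clear.
for status in ("0009", "0008", "000b", "0018", "0001"):
    print(status, "->", clear_bit4(status))
```

Parsing as decimal would reject 000b outright, and would treat 0018 as eighteen (binary 10010), whose bit 4 is not set, so the base has to be settled first.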
On 30/09/10 18:41, Ted Harding wrote:
It seems Mark wants to set bit 4 to 0, presumably whatever the first 3 bits may be. Is that correct?
That is correct.
The 0009 -> 0001 replacement is fine as far as it goes but it misses all the ones that have other bits set; deleting the status means that there's more rubbish to sort through but at least avoids missing anything.
Also, Mark, how many of the bits would be 1 in this? Since bit 4 can be 1, and bits 1, 2, 3 could be 0 or 1, you could be up to 0015 on those 4 bits alone. Will you be using higher-order bits as well? E.g. with bit 5 in play, you can go up to 0031, with bit 6 up to 0063, etc.
In principle any could be set, although I did grep | sort | uniq the X-Mozilla-Status line and (from memory) there were only about 12 variations in total.
Or is that 4-digit number really in hex? So it could have value 000F, 00BF, 0ABF, ...
Yes, it would be hex.
In principle, your task can be done in 'awk', but the details would depend on the answers to these questions!
I'd be interested in how to do it in awk purely as an exercise to improve my awk skills, but the actual need has passed by deleting all the status lines. I do still have a backup file to test any theories on though!
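With the field confirmed as hex, the whole job can be treated as a line filter: parse the status word, mask off 0x0008, and write everything else through untouched. The following is a minimal Python sketch of that idea, not a tested recovery tool. The function names and the choice of latin-1 (used only so that arbitrary bytes survive the round trip) are assumptions made for illustration.

```python
import re

# Matches only the exact header followed by 4 hex digits; the trailing group
# keeps whatever line ending (LF or CRLF) the file already uses.
STATUS_RE = re.compile(r"^(X-Mozilla-Status: )([0-9A-Fa-f]{4})(\s*)$")


def undelete_line(line: str) -> str:
    """Return the line with bit 0x0008 cleared if it is a status header."""
    match = STATUS_RE.match(line)
    if match is None:
        return line                      # not a status line: pass through
    value = int(match.group(2), 16) & ~0x0008
    return "%s%04x%s" % (match.group(1), value, match.group(3))


def undelete_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, clearing the bit on every status line."""
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(undelete_line(line))
```

This would be run against a backup copy, for example undelete_file("mailbox", "mailbox.tmp"), followed by the same mv as before. Like the sed approach, it cannot tell a real header from a message body line that happens to start with the same text.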
On 01 Oct 08:55, Mark Rogers wrote:
I'd be interested in how to do it in awk purely as an exercise to improve my awk skills, but the actual need has passed by deleting all the status lines. I do still have a backup file to test any theories on though!
Was this an mbox format mailbox perchance? If so then the following would do the job...
--- Begin Python Script ---
#!/usr/bin/env python3

import mailbox

# open and lock the mailbox...
mb = mailbox.mbox("/path/to/mbox")
mb.lock()

# iterate through the messages
for (key, message) in mb.iteritems():
    if 'X-Mozilla-Status' in message:
        old_status = message['X-Mozilla-Status']
        try:
            # parse the hex status in to a value we can bitmask
            old_status_int = int(old_status, 16)
        except ValueError:
            print("Failed parsing %s" % (old_status))
            continue
        del message['X-Mozilla-Status']
        # bit mask off the 4th bit.
        new_status_int = old_status_int & 0xfff7
        # make it a 'hex' string without the leading 0x
        new_status = "%04x" % (new_status_int)
        message['X-Mozilla-Status'] = str(new_status)
        # write the new message
        mb.update(((key, message),))
        print("Updated message index: %d" % (key))

mb.flush()
mb.unlock()
mb.close()
--- End Python Script ---
All of the stuff there is in the default libraries of any recent python.
Cheers,
On 01-Oct-10 07:55:02, Mark Rogers wrote:
I'd be interested in how to do it in awk purely as an exercise to improve my awk skills, but the actual need has passed by deleting all the status lines. I do still have a backup file to test any theories on though!
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450
The following shows the principle of the thing. There is no built-in function in standard 'awk' to handle hex arithmetic, so I've had to fudge it. Also, the following code only deals with the 4th hex digit in the final field and leaves positions 1,2,3 unchanged. It could of course be extended so as to perform required modifications of all of them!
Paste the following (between the "###..." lines) into the command line:
#############################################################
cat << EOT | awk '/^X-Mozilla-Status:/{
  temp1=substr($2,1,3) ; temp2=toupper(substr($2,4,1))
  # default: digits 0-7 already have bit 4 clear, so keep them
  temp3=temp2
  if(temp2=="8"){temp3="0"}
  if(temp2=="9"){temp3="1"}
  if(temp2=="A"){temp3="2"}
  if(temp2=="B"){temp3="3"}
  if(temp2=="C"){temp3="4"}
  if(temp2=="D"){temp3="5"}
  if(temp2=="E"){temp3="6"}
  if(temp2=="F"){temp3="7"}
  $2=temp1 temp3
  print $1 " " $2
  next
}
{print $0}'
This is garbage
X-Mozilla-Status: 0008
This is more garbage
X-Mozilla-Status: 1329
And still more
X-Mozilla-Status: 236A
And more again
X-Mozilla-Status: A15B
X-Mozilla-Status: 2C4C
X-Mozilla-Status: FE3D
X-Mozilla-Status: BD2E
X-Mozilla-Status: 231F
And this is the last garbage
EOT
#############################################################
and you should see the following output:
This is garbage
X-Mozilla-Status: 0000
This is more garbage
X-Mozilla-Status: 1321
And still more
X-Mozilla-Status: 2362
And more again
X-Mozilla-Status: A153
X-Mozilla-Status: 2C44
X-Mozilla-Status: FE35
X-Mozilla-Status: BD26
X-Mozilla-Status: 2317
And this is the last garbage
Showing that the "garbage" lines (whatever they happen to be) come through unchanged, while the lines matching "^X-Mozilla-Status:" have their second field modified so that bit 4 of the 4th hex digit is set to 0.
Ted.
-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 01-Oct-10 Time: 11:07:31 ------------------------------ XFMail ------------------------------
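Ted's digit-by-digit table amounts to masking the last hex digit with 0x7. As a cross-check, here is a short Python sketch that applies that mask to his sample values and compares against the output he lists. The function name is hypothetical; the before/after pairs are copied from the thread.

```python
def clear_top_bit_of_last_digit(status: str) -> str:
    """Leave hex digits 1-3 alone and clear the 8s bit of the 4th digit."""
    head, last = status[:3], status[3]
    return head + "%X" % (int(last, 16) & 0x7)


# Input -> expected output, taken from the awk demonstration in the thread.
samples = {
    "0008": "0000", "1329": "1321", "236A": "2362", "A15B": "A153",
    "2C4C": "2C44", "FE3D": "FE35", "BD2E": "BD26", "231F": "2317",
}

for before, after in samples.items():
    assert clear_top_bit_of_last_digit(before) == after
print("all", len(samples), "samples match")
```

Because int(last, 16) accepts both cases, lower-case values such as 000b (which Mark's mailbox contains) go through the same path.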