I'm looking for tools (if there are any) for processing a text file line by line sequentially.
As it goes through the file it needs to make decisions based on the contents of the line(s) of text and change its state as it goes. The decisions it makes depend on the state it's in.
Basically I'm processing some (fairly) fixed-format messages from a forum: removing some matched header and trailer lines, modifying and outputting a few other matched lines, and simply outputting the body of the message.
The (most) difficult bit is removing blank lines that come before a particular line.
E.g. we have a message that starts:-
A new topic has been created on the forum
Message Subject : weed webinar 31 January

Category : Waterways Continental Europe

Posted by : Fred Bloggs
I want to delete everything up to and including the blank line after 'Message Subject', then keep (i.e. output) the 'Category' and 'Posted by' lines without the blank lines in between.
I can't delete all blank lines because I want to retain spacing in the message body later. So I need to be able to do things like deleting blank lines unless I am in the message body.
Are there specific tools for doing this sort of thing or should I just write a program (probably in Python) that reads lines, does actions as required and remembers its state as it goes?
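To make the Python option concrete, here is a minimal sketch of the kind of state machine I have in mind. The state names and the function name filter_message are made up for illustration, and it assumes the message body starts straight after the 'Posted by' line, which may not hold for every message.

```python
def filter_message(lines):
    """Yield the lines to keep, tracking which part of the message we are in."""
    # Hypothetical state names: preamble -> after_subject -> header -> body
    state = "preamble"
    for line in lines:
        text = line.rstrip("\n")
        if state == "preamble":
            # Discard everything up to and including 'Message Subject'.
            if text.startswith("Message Subject"):
                state = "after_subject"
        elif state == "after_subject":
            # Also discard up to and including the blank line that follows it.
            if text.strip() == "":
                state = "header"
        elif state == "header":
            # In the header block, drop blank lines but keep everything else.
            if text.strip() == "":
                continue
            yield text
            # Assumption: the body begins after the 'Posted by' line.
            if text.startswith("Posted by"):
                state = "body"
        else:
            # In the body, pass every line through, blank or not.
            yield text
```

It would be used by looping over filter_message(sys.stdin) and printing each result. The decision about a blank line depends only on the current state, which is the part I found awkward to express in sed.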
I got some of the way using sed but it's very difficult to 'delete the line before XXXX' with sed. It *might* be that awk would be better but I don't see it handling the state/sequential bit any better than sed.
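For comparison, this is roughly how I would expect 'delete the blank line(s) before XXXX' to look in Python: hold blank lines back until the next non-blank line shows whether they should be kept. It is only a sketch; the function name drop_blanks_before and the pattern argument are hypothetical, not taken from any existing tool.

```python
import re


def drop_blanks_before(lines, pattern):
    """Yield lines, omitting any run of blank lines directly before a match."""
    matcher = re.compile(pattern)
    pending = []  # blank lines held back until we see what follows them
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "":
            pending.append(text)
            continue
        if not matcher.search(text):
            # The held blanks were not followed by a match, so keep them.
            yield from pending
        # Either way the held blanks have now been dealt with.
        pending.clear()
        yield text
    # Blank lines at the end of the input were not before a match either.
    yield from pending
```

Calling it with the pattern '^Posted by' would remove the blank lines in front of the 'Posted by' line and leave all other spacing alone.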
Any/all advice would be very welcome.