Personally I'd do it in either C or PHP; even BASIC would do. You like Python, so go for it!
It's been over 40 years since I used ready-built tools to do that sort of thing, so lex and yacc are probably not how I remember them.
Nev
On 20/01/2022 15:02, Chris Green wrote:
I'm looking for tools (if there are any) for processing a text file line by line sequentially.
As it goes through the file it needs to make decisions based on the contents of the line(s) of text and change its state as it goes. The decisions it makes depend on the state it's in.
Basically I'm processing some (fairly) fixed format messages from a forum to remove some matched header and trailer lines, modify and output a few other matched lines and simply output the body of the message.
The (most) difficult bit is removing blank lines before something.
E.g. we have a message that starts:-
A new topic has been created on the forum
Message Subject : weed webinar 31 January

Category : Waterways Continental Europe

Posted by : Fred Bloggs
I want to delete everything up to and including the blank line after 'Message Subject' then keep (i.e. output) the 'Category' line and the 'Posted by' lines without the blank lines in between.
I can't delete all blank lines because I want to retain spacing in the message body later. So I need to be able to do things like deleting blank lines unless I am in the message body.
Are there specific tools for doing this sort of thing or should I just write a program (probably in Python) that reads lines, does actions as required and remembers its state as it goes?
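For what it's worth, the "read lines and remember the state" approach might look something like the sketch below. It is only a minimal illustration, not a finished tool: the names State and filter_message are made up for the example, and it assumes the header always has the 'Message Subject', 'Category' and 'Posted by' lines in that order, with the body starting straight after 'Posted by'. Trailer removal is left out because the trailer format isn't shown above; it would probably be one more state.

```python
from enum import Enum, auto


class State(Enum):
    PREAMBLE = auto()   # before and including the 'Message Subject' line
    HEADER = auto()     # 'Category' / 'Posted by' lines; blank lines dropped
    BODY = auto()       # message body; everything is kept, blanks included


def filter_message(lines):
    """Yield the lines to keep from one message (an iterable of strings)."""
    state = State.PREAMBLE
    for raw in lines:
        line = raw.rstrip("\n")
        if state is State.PREAMBLE:
            # Discard everything up to and including 'Message Subject'.
            if line.startswith("Message Subject"):
                state = State.HEADER
        elif state is State.HEADER:
            if not line.strip():
                continue  # drop blank lines between header fields
            yield line
            if line.startswith("Posted by"):
                state = State.BODY  # assumption: body starts after this line
        else:  # State.BODY
            yield line  # keep everything, including blank lines
```

It could be run as a filter with something like `for line in filter_message(sys.stdin): print(line)` after an `import sys`. Each decision depends on both the current line and the current state, which is the part that is awkward to express in sed.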
I got some of the way using sed but it's very difficult to 'delete the line before XXXX' with sed. It *might* be that awk would be better but I don't see it handling the state/sequential bit any better than sed.
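The 'delete the line before XXXX' problem usually gets easier if blank lines are held back rather than output straight away, and the decision is made once the next non-blank line is known. A rough Python sketch of that idea follows; the function name drop_blanks_before and the pattern in the usage note are invented for illustration, and it assumes that only blank lines ever need deleting before a match.

```python
import re


def drop_blanks_before(lines, pattern):
    """Yield lines, omitting any run of blank lines directly before a match."""
    matcher = re.compile(pattern)
    pending = []  # blank lines seen but not yet emitted
    for line in lines:
        if not line.strip():
            pending.append(line)  # hold the blank until we see what follows
            continue
        if not matcher.match(line):
            yield from pending  # blanks were not before a match: keep them
        pending.clear()
        yield line
    yield from pending  # blanks at the very end of the input are kept
```

For instance, `drop_blanks_before(lines, r"Category|Posted by")` would keep the 'Category' and 'Posted by' lines but drop the blank lines immediately in front of them, while leaving blank lines elsewhere alone.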
Any/all advice would be very welcome.