On 07-Aug-2012 12:25:02 Mark Rogers wrote:
On 07/08/12 12:28, (Ted Harding) wrote:
Well, that certainly simplifies matters! However, on the general question (and assuming that you're satisfied with just picking up the first match, i.e. knowing that there's at least one match), then the only fool-proof general method I can think of is the following. It presumes that you have enough RAM for two copies of the target string, plus whatever is needed for the program etc.
Obviously the target has to be stored somewhere, but this needn't be in RAM (I'm assuming that there's "plenty" of ROM space from the way Chris described the problem).
If so, you start with an integer index into the target set to zero, and when you find a character in the stream that matches the character pointed to by the index you increment the index. When you get to the end of the target string you have you match.
The trick is how to replay things when you get a miss midway into the target, but that's not actually that hard; you loop through the data from the second character in the target to the current (failed) position replaying them as if they came from the stream, retesting them as you go.
From a quick play the hardest bit seems to be coming up with test cases that pick up the edge cases. I can't find any particularly "difficult" examples but also can't prove they aren't out there.
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER
The sort of "edge case" I was thinking of when deciding against the "obvious" method (based on your moving pointer algorithm) is:
Target: ABCDABCDEFH
Input stream: ... ABCDABCDABCDEFHXYZ... (so that the target is at ABCD|AABCDABCDEFH|XYZ...
It will happily get to the "^" point below:
ABCDABCD^ABCDEFHXYZ...
because the target is matched up to that point. But the next byte is a mismatch, so then it should move back to byte 5 of the above input stream and continue matching from that point (using the accumulated matching bytes from the previous input bytes) so that it knows it has matched
ABCDABCD
and only needs to watch out for EFHXYZ (which is about to arrive).
This is already beginning to get complicated, and one can think of examples where you may need a "stack" of pending partial matches. At this point, you are beginning to need more RAM, and a more complicated algorithm, which is why I thought that the FIFO + MATCH procedure that I suggested would be (a) fool-proof, (b) reasonably economical.
Ted.
------------------------------------------------- E-Mail: (Ted Harding) Ted.Harding@wlandres.net Date: 07-Aug-2012 Time: 13:51:00 This message was sent by XFMail -------------------------------------------------