Re: [ALUG] Wanted - algorithm for bufferless search for a pattern in a stream

7 Aug 2012


      On 07-Aug-2012 12:25:02 Mark Rogers wrote:
...
On 07/08/12 12:28, (Ted Harding) wrote:
...
Well, that certainly simplifies matters! However, on the general question 
(and assuming that you're satisfied with just picking up the first match, 
i.e. knowing that there's at least one match), then the only fool-proof 
general method I can think of is the following. It presumes that you have 
enough RAM for two copies of the target string, plus whatever is needed for 
the program etc.
Obviously the target has to be stored somewhere, but this needn't be in RAM 
(I'm assuming that there's "plenty" of ROM space from the way Chris described
the problem).
If so, you start with an integer index into the target set to zero, and when 
you find a character in the stream that matches the character pointed to by 
the index you increment the index. When you get to the end of the target 
string you have you match.
The trick is how to replay things when you get a miss midway into the target,
but that's not actually that hard; you loop through the data from the second 
character in the target to the current (failed) position replaying them as if
they came from the stream, retesting them as you go.
From a quick play the hardest bit seems to be coming up with test cases that
pick up the edge cases. I can't find any particularly "difficult" examples
but 
also can't prove they aren't out there.
-- 
Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450
Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER
The sort of "edge case" I was thinking of when deciding against the
"obvious" method (based on your moving pointer algorithm) is:
Target:   ABCDABCDEFH
Input stream: ... ABCDABCDABCDEFHXYZ...
(so that the target is at ABCD|AABCDABCDEFH|XYZ...
It will happily get to the "^" point below:
ABCDABCD^ABCDEFHXYZ...
because the target is matched up to that point. But the next
byte is a mismatch, so then it should move back to byte 5 of the
above input stream and continue matching from that point
(using the accumulated matching bytes from the previous input bytes)
so that it knows it has matched
ABCDABCD
and only needs to watch out for EFHXYZ (which is about to arrive).
This is already beginning to get complicated, and one can think
of examples where you may need a "stack" of pending partial matches.
At this point, you are beginning to need more RAM, and a more
complicated algorithm, which is why I thought that the FIFO + MATCH
procedure that I suggested would be (a) fool-proof, (b) reasonably
economical.
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@wlandres.net
Date: 07-Aug-2012  Time: 13:51:00
This message was sent by XFMail
-------------------------------------------------

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

Re: [ALUG] Wanted - algorithm for bufferless search for a pattern in a stream