Re: [ALUG] Wanted - algorithm for bufferless search for a pattern in a stream

7 Aug 2012


      On 06/08/12 19:25, Chris Green wrote:
...
So, obviously one reads a byte at a time from the stream to see if it
matches the first character of the constant string, if that matches one
moves on to the second, etc.  When the whole string has been matched
then it's found - so far so good.  However, when a character *doesn't*
match one can't go back up the input stream, *but* we know the previous
characters as they're the ones in the constant string that we have
matched so far.  Thus we can 'wind back' by using the constant string
for the number of characters that did match and then continue down the
stream.
That's one "obvious" method, but not the only one.
Another would be that you run multiple pattern matches (ie even though you're 
already up to N characters in a match, you're still checking for the start of 
a new match simultaneously, so that if/when the first pattern fails you don't 
rewind but just drop it and carry on with any other sequences you're still 
matching).
There may be others....
Bear in mind that if you're looking for "ABAB" in the stream "xxABABABxx" 
there are two matches but your algorithm will only pick up the first one, I 
don't know if that matters to you.
However, since grep can search within a stream surely that would be a good 
starting point?
cat /my/stream | grep 'really long pattern'
You can open a process and feed data to it byte-by-byte if you want to from 
Any Good Programming Language [TM] - eg for PHP[*]:
     http://www.php.net/manual/en/function.proc-open.php
[*] Not necessarily a Good Programming Language but it's what I'm used to and 
ISTR Chris uses it too.
-- 
Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450
Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

Re: [ALUG] Wanted - algorithm for bufferless search for a pattern in a stream