On 06/08/12 19:25, Chris Green wrote:
So, obviously one reads a byte at a time from the stream to see if it matches the first character of the constant string, if that matches one moves on to the second, etc. When the whole string has been matched then it's found - so far so good. However, when a character *doesn't* match one can't go back up the input stream, *but* we know the previous characters as they're the ones in the constant string that we have matched so far. Thus we can 'wind back' by using the constant string for the number of characters that did match and then continue down the stream.
That's one "obvious" method, but not the only one.
Another would be that you run multiple pattern matches (ie even though you're already up to N characters in a match, you're still checking for the start of a new match simultaneously, so that if/when the first pattern fails you don't rewind but just drop it and carry on with any other sequences you're still matching).
There may be others....
Bear in mind that if you're looking for "ABAB" in the stream "xxABABABxx" there are two matches but your algorithm will only pick up the first one, I don't know if that matters to you.
However, since grep can search within a stream surely that would be a good starting point?
cat /my/stream | grep 'really long pattern'
You can open a process and feed data to it byte-by-byte if you want to from Any Good Programming Language [TM] - eg for PHP[*]: http://www.php.net/manual/en/function.proc-open.php
[*] Not necessarily a Good Programming Language but it's what I'm used to and ISTR Chris uses it too.