Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
With the previous two modules in place we are now set up to use a DFA to match against a string. In my implementation I support either a greedy match or an short match. In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match.
To do the matching I have a general function which will create a list of all matches. Then the difference between short and greedy matching is which of the candidate solutions does it choose.
This is the method:
1: doMatch func machine st [] = doAccept machine st []
2: doMatch func machine st string = func $ map (\f -> doMatch' st f []) (tails string)
3: where
4: doMatch' state [] soFar = doAccept machine st soFar
5: doMatch' state (s:str) soFar =
6: case findTransition machine s state of
7: Nothing -> doAccept machine state soFar
8: Just (from, to, val) -> case doMatch' to str (soFar ++ [s]) of
9: (False,_) -> case canAccept machine to of
10: True -> (True, soFar ++ [s])
11: False -> doMatch' to str (soFar ++ [s])
12: (True,res) -> (True,res)
This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.
For short or long matches I pass in one of these two functions:
1: -- Get the shortest match
2: shortest matches = case filter (\s->fst s) (sort matches) of
3: [] -> (False,"")
4: ms -> head ms
5:
6: -- Get the longest match
7: longest matches = last.sort $ matches
I created aliases for the functions to make it more handy:
1: (=~) = greedyMatch
2: (=~?) = shortMatch
And then the final result:
1: *SimpleRegex> "hiphiphiphorray" =~? "hip(hip)*"
2: (True,"hip")
3:
4: *SimpleRegex> "hiphiphiphorray" =~ "hip(hip)*"
5: (True,"hiphiphip")
I attached a zip of all the files for this project.
Enjoy!
Comments
- Anonymous
June 21, 2008
The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have