Hi there,
I have another question about building a custom phone number analyzer.
For instance, +61 2 8364 5809 should be found when the user searches for:
- 61 2 8364 5809
- 61283645809
- 8364 5809
- 83645809
- 8364
- 836
- 5809
It should not be found when the user searches for:
- 809
I use a PatternCaptureTokenFilter (PreserveOriginal = true) followed by a PatternReplaceTokenFilter to clean up "+", "(", ")" and spaces:
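// Emit every run of characters other than "(", ")", "+" and whitespace, and keep the original token.
var phoneFilter = new PatternCaptureTokenFilter("phone_filter", new string[] { "([^()\\+\\s]+)" });
phoneFilter.PreserveOriginal = true;
tokenFilterList.Add(phoneFilter);
// Remove any remaining non-word characters (e.g. the spaces and "+" in the preserved original).
var phoneCleanupFilter = new PatternReplaceTokenFilter("phone_cleanup_filter", "\\W+", string.Empty);
tokenFilterList.Add(phoneCleanupFilter);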
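One way to see why #4 fails is to replay the two filters locally. The C# sketch below is a hypothetical simulation, not the Azure Search SDK. It assumes a keyword tokenizer that passes "+61 2 8364 5809" through as a single token, and it approximates the capture filter as emitting the original token plus every captured group (or only the original when nothing matches). The helper names PatternCapture and PatternReplace are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the pattern capture filter: for each token, emit the
// original (if preserveOriginal is set or nothing matched) plus group 1 of every match.
List<string> PatternCapture(IEnumerable<string> tokens, string pattern, bool preserveOriginal)
{
    var output = new List<string>();
    foreach (var token in tokens)
    {
        var captures = Regex.Matches(token, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
        if (preserveOriginal || captures.Count == 0)
            output.Add(token);
        output.AddRange(captures);
    }
    return output;
}

// Hypothetical stand-in for the pattern replace filter: rewrite every token.
List<string> PatternReplace(IEnumerable<string> tokens, string pattern, string replacement) =>
    tokens.Select(t => Regex.Replace(t, pattern, replacement)).ToList();

// Assumption: a keyword tokenizer hands the whole input to the filters as one token.
var tokens = new List<string> { "+61 2 8364 5809" };
tokens = PatternCapture(tokens, "([^()\\+\\s]+)", preserveOriginal: true);
tokens = PatternReplace(tokens, "\\W+", string.Empty);

Console.WriteLine(string.Join(", ", tokens)); // prints: 61283645809, 61, 2, 8364, 5809
Console.WriteLine(tokens.Contains("83645809")); // prints: False
```

Under those assumptions the only long token is the full 11-digit string, so a search for 83645809 has no exact token to match.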
This analyzer meets every requirement except #4 (83645809). However, as soon as I add an EdgeNGramTokenFilter after phoneFilter and phoneCleanupFilter to extract the rightmost 8 to 10 digits, every token shorter than 8 characters produced by the earlier filters is removed:
// Build grams of 8 to 10 characters from the end of each token (the rightmost digits).
var eightEdgeGramsFilter = new EdgeNGramTokenFilter("8_10_edgegrams");
eightEdgeGramsFilter.MinGram = 8;
eightEdgeGramsFilter.MaxGram = 10;
eightEdgeGramsFilter.Side = EdgeNGramTokenFilterSide.Back;
tokenFilterList.Add(eightEdgeGramsFilter);
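The same kind of local simulation shows what the edge n-gram step does to that token list. BackEdgeNGrams is a hypothetical helper, not an SDK type; it assumes the filter emits, for each token, only the suffixes whose length lies between MinGram and MaxGram and nothing else, which matches the behavior described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a back-side edge n-gram filter without an
// option to keep the original token.
List<string> BackEdgeNGrams(IEnumerable<string> tokens, int minGram, int maxGram)
{
    var output = new List<string>();
    foreach (var token in tokens)
    {
        // A token shorter than minGram never enters the loop, so it disappears.
        for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
            output.Add(token.Substring(token.Length - len)); // suffix of length len
    }
    return output;
}

// Token list produced by phoneFilter + phoneCleanupFilter for "+61 2 8364 5809".
var afterCleanup = new List<string> { "61283645809", "61", "2", "8364", "5809" };
var grams = BackEdgeNGrams(afterCleanup, minGram: 8, maxGram: 10);

Console.WriteLine(string.Join(", ", grams)); // prints: 83645809, 283645809, 1283645809
```

In this simulation 61, 2, 8364 and 5809 yield no grams at all, and the full 11-digit token 61283645809 is gone too, because it is longer than MaxGram = 10.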
Is there a way to preserve the original token (like PreserveOriginal) in EdgeNGramTokenFilter? Or is there a better way to get the rightmost 8 to 10 digits?
Regards,
Jay