Azure Search: Filtering terms with spaces, dashes and multiple terms

Bernard Brown 75 Reputation points
2024-02-19T21:20:08.93+00:00

I'm using the SDK for Azure Search and I have a requirement to filter terms that can contain spaces, dashes and multiple terms. For example, I have to filter based on last name. So, I need to filter names like Van Noy, Jackson-Davis, Jackson - Davis, and Jackson, Davis, Johnson, etc. Of course it should be able to handle any combination of these. I started using search.in() and that handled the multiple terms (Jackson, Davis, Johnson) just fine as long as there were no spaces in the term. I tried search.ismatch() and that handled terms with spaces, but not my need to handle the multiple terms. And neither one handled dashes. Is there a way to handle this via the SDK?

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Accepted answer
  1. Grmacjon-MSFT 19,466 Reputation points Moderator
    2024-02-20T05:18:57.0233333+00:00

    Hi @Bernard Brown
    The short answer is yes, it can be handled using the SDK. This can be done by using the search.ismatch function in a filter expression and by properly formatting your query string to handle spaces, dashes, and multiple terms.

    Here's how you can approach this:

    1. Prepare the Query String: Before passing the query string to search.ismatch, escape the spaces and dashes inside each name by putting a backslash in front of them (\  and \-), so that they are treated as part of the term rather than as separators or operators. Escape each name on its own; if you escape the whole comma-separated string, the spaces between the names are escaped too and the list collapses into one long term.
    names = ["Van Noy", "Jackson-Davis", "Jackson - Davis", "Jackson", "Davis", "Johnson"]
    # Escape spaces and dashes inside each name, then separate the names with plain spaces
    formatted_query = " ".join(n.replace(" ", "\\ ").replace("-", "\\-") for n in names)
    
    2. Use search.ismatch: Pass the formatted query string as the first argument and the field name as the second. search.ismatch evaluates a full-text search query inside the filter (it is not a regular expression match), which lets you handle multiple terms in a single query. The field must be searchable.
    from azure.search.documents import SearchClient
    search_client = SearchClient(...)  # supply your endpoint, index name, and credential
    results = search_client.search(
        search_text="",
        # Argument order is search.ismatch(search, searchFields)
        filter="search.ismatch('{0}', 'lastName')".format(formatted_query)
    )
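As a variation on the two steps above, here is a minimal sketch that builds the whole filter string from a list of names. It assumes the field is called lastName, as in the question, and that the default simple query syntax is in use, where double quotes mark a phrase and | means OR. The helper names odata_quote and build_ismatch_filter are hypothetical and not part of the SDK. Single quotes are doubled because the query is embedded in an OData string literal.

```python
from typing import List


def odata_quote(value: str) -> str:
    """Wrap a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_ismatch_filter(field: str, names: List[str]) -> str:
    """Build a search.ismatch filter that ORs together one quoted phrase per name."""
    # Backslash-escape backslashes and double quotes so each phrase stays well formed.
    phrases = ['"' + n.replace("\\", "\\\\").replace('"', '\\"') + '"' for n in names]
    query = " | ".join(phrases)
    # Argument order: search.ismatch(search, searchFields)
    return "search.ismatch({0}, {1})".format(odata_quote(query), odata_quote(field))


names = ["Van Noy", "Jackson-Davis", "O'Brien"]
filter_expr = build_ismatch_filter("lastName", names)
print(filter_expr)
```

The resulting string can be passed as the filter argument of search_client.search. How a phrase such as "Jackson-Davis" is matched still depends on the analyzer configured for the field, so it is worth testing against your own index.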
    

    With this approach, search.ismatch matches any document whose lastName field matches at least one of the terms in the formatted query string (the default search mode is any), including terms with spaces and dashes.

    Note that search.ismatch runs a full-text query inside the filter, so the result depends on the analyzer configured for the field, and it can be less efficient than search.in for large sets of terms. If you have a large number of terms to match, consider splitting the query into several smaller queries or using an alternative approach, such as a custom analyzer that handles these kinds of terms.

    If you have further questions, please let us know.
    -Grace
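If exact matches on a filterable field are enough, search.in may also fit: it takes an optional third argument listing the delimiter characters. The default delimiters are space and comma, which is why names containing spaces get split. The sketch below assumes a pipe character never appears in a last name. The helper build_search_in_filter is hypothetical, not an SDK function.

```python
from typing import List


def build_search_in_filter(field: str, values: List[str], delimiter: str = "|") -> str:
    """Build a search.in filter that uses a custom delimiter instead of space/comma."""
    for v in values:
        if delimiter in v:
            raise ValueError("value contains the delimiter: {0!r}".format(v))
    # Double single quotes because the list is embedded in an OData string literal.
    joined = delimiter.join(values).replace("'", "''")
    return "search.in({0}, '{1}', '{2}')".format(field, joined, delimiter)


filter_expr = build_search_in_filter(
    "lastName", ["Van Noy", "Jackson-Davis", "Jackson - Davis", "Johnson"]
)
print(filter_expr)
```

Because search.in compares whole values exactly and case-sensitively, variants such as Jackson-Davis and Jackson - Davis each have to be listed.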

