In Azure Cognitive Search, searching for a term with a hyphen (-) does not yield correct search results

Anantha Subramanian 46 Reputation points
2022-02-14T15:31:27.313+00:00

I am using the .NET SDK.

When I search for the term "Kerin-A", I get 0 search results even though the repository contains a document with the value "Siani Kerin-Ann".

Code as below:

query - Kerin-A*

var options = new SearchOptions
{
    Filter = $"search.ismatch('{pgid}', 'pgbid')",
    IncludeTotalCount = true,
    Size = 10,
    Skip = skip,
    Select = { "Id", "SearchId", "Metadata", "SearchContent", "IsPGP" },
    Facets = { "IsPGP" },
};
var searchResults = await client.SearchAsync<SearchResultDocument>(query, options);

What should the query be to retrieve the results correctly?

Azure AI Search

Accepted answer
  1. Grmacjon-MSFT 19,381 Reputation points Moderator
    2022-02-18T05:42:23.453+00:00

    Hi @Anantha Subramanian ,

    Thanks for bringing this to our attention. You should be able to search for terms containing a hyphen in Azure Cognitive Search.

    Azure Cognitive Search scans for whole tokenized terms in the index and won't find a match on a partial term unless you include wildcard placeholder operators (* and ?) or format the query as a regular expression. Your query already includes a wildcard:

    query - Kerin-A*

    Are you using any specific analyzer for this search?

    Based on this Azure doc, a possible solution is to "invoke an analyzer during indexing that preserves a complete string, including spaces and special characters if necessary, so that you can include the spaces and characters in your query string. Likewise, having a complete string that is not tokenized into smaller parts enables pattern matching for "starts with" or "ends with" queries, where the pattern you provide can be evaluated against a term that is not transformed by lexical analysis.

    Creating an additional field for an intact string, plus using a content-preserving analyzer that emits whole-term tokens, is the solution for both pattern matching and for matching on query strings that include special characters."
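
    The quoted approach could be sketched with the .NET SDK roughly as follows. This is only an illustration under assumptions: the analyzer name "hyphen_preserving" and the field name "SearchContentExact" are hypothetical and do not come from the question. It also swaps in a whitespace tokenizer plus a lowercase filter rather than a true whole-string (keyword) analyzer, so that "Siani Kerin-Ann" is indexed as the tokens "siani" and "kerin-ann" and a prefix such as kerin-a* has a token to match. The new field has to be populated when documents are indexed, and depending on the query syntax in use the hyphen may need to be escaped.

    ```csharp
    using Azure.Search.Documents;
    using Azure.Search.Documents.Indexes.Models;

    public static class HyphenSearchSketch
    {
        // Hypothetical names chosen for this sketch; they are not part of the original index.
        public const string ExactAnalyzer = "hyphen_preserving";
        public const string ExactField = "SearchContentExact";

        // Adds a custom analyzer and an extra searchable field to an index definition.
        public static void AddExactMatchField(SearchIndex index)
        {
            // Split on whitespace only, so the hyphen stays inside the token,
            // then lowercase so that matching is case-insensitive.
            var analyzer = new CustomAnalyzer(ExactAnalyzer, LexicalTokenizerName.Whitespace);
            analyzer.TokenFilters.Add(TokenFilterName.Lowercase);
            index.Analyzers.Add(analyzer);

            // Extra field that is analyzed with the hyphen-preserving analyzer.
            index.Fields.Add(new SearchableField(ExactField)
            {
                AnalyzerName = new LexicalAnalyzerName(ExactAnalyzer)
            });
        }

        // Builds query options that restrict the search to the extra field.
        public static SearchOptions BuildOptions()
        {
            var options = new SearchOptions { IncludeTotalCount = true };
            options.SearchFields.Add(ExactField);
            return options;
        }
    }
    ```

    With something like this in place, the existing call could pass HyphenSearchSketch.BuildOptions() (plus the original Filter, Size, and Skip settings) so that the wildcard query is evaluated against the hyphen-preserving field instead of the default-analyzed one.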

    Hope that helps

    -Grace

