Hi @Thitiwut Harnphatcharapharnukorn ,
Thank you for reaching out about how to implement contains-like search logic and achieve proper alphabetical ordering of your results in Azure Cognitive Search.
Your approach with prefix searches is effective, and implementing true infix or contains matching (such as searching for "pli" to find "application," "supplier," or "multiplier") along with case-insensitive alphabetical sorting is a common requirement. Below is the recommended method for achieving both functionalities using Azure AI Search (formerly known as Azure Cognitive Search).
For infix/contains-like search:
The recommended approach for performant substring matching anywhere in the term is a custom analyzer that uses the NGramTokenFilterV2. This pre-generates overlapping substrings (n-grams) at index time, so a simple search=pli query matches naturally — no wildcards or regex needed at query time.
This is faster than wildcard (*pli*) or regex (/.*pli.*/) queries, especially on larger indexes, at the cost of a somewhat larger index. Tune minGram/maxGram to balance recall and storage; a minGram of 3 with a maxGram of 8 to 10 works well for most cases.
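To make the index-time behavior concrete, here is a minimal Python sketch of what an n-gram filter conceptually does to a single keyword token. It is only a local model for illustration, not the service's implementation; the function names `ngrams` and `search` and the sample names are made up for this example.

```python
def ngrams(token: str, min_gram: int = 3, max_gram: int = 8) -> set[str]:
    """Return every substring of `token` with a length in [min_gram, max_gram]."""
    token = token.lower()  # models the "lowercase" token filter
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


# Hypothetical sample data: each name is indexed as its set of n-grams.
index = {
    name: ngrams(name)
    for name in ["Application", "Supplier", "Multiplier", "Apple"]
}


def search(term: str) -> list[str]:
    """Exact token lookup: a 3-character term is itself one of the stored n-grams."""
    term = term.lower()
    return sorted(name for name, grams in index.items() if term in grams)


print(search("pli"))  # names containing "pli" anywhere
print(search("pl"))   # shorter than min_gram, so nothing is stored for it
```

In this model a term shorter than minGram finds nothing, which is one reason to pick minGram based on the shortest fragment you expect users to type.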
Example index field and analyzer configuration (add this when creating or updating your index). The keyword_v2 tokenizer keeps the full string as one token, and the lowercase normalizer gives the case-insensitive sort described below:
{
  "fields": [
    {
      "name": "name",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "ngram_analyzer",
      "sortable": true,
      "normalizer": "lowercase"
    }
  ],
  "analyzers": [
    {
      "name": "ngram_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "keyword_v2",
      "tokenFilters": [ "lowercase", "ngram_filter" ]
    }
  ],
  "tokenFilters": [
    {
      "name": "ngram_filter",
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
      "minGram": 3,
      "maxGram": 8
    }
  ]
}
If you prefer simplicity over index size and can accept slower queries, particularly with leading wildcards, you can instead use a field with the keyword analyzer together with the full Lucene query syntax:
- search=*pli*&queryType=full (wildcard)
- search=/.*pli.*/&queryType=full (regex)
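If you send these as GET requests, the special characters need URL encoding. The following is a small sketch using only the Python standard library; `build_query` is a hypothetical helper name, and it assumes the intended patterns are `*pli*` (wildcard) and `/.*pli.*/` (regex).

```python
from urllib.parse import urlencode


def build_query(search: str, full_lucene: bool = False) -> str:
    """Build the URL-encoded query string for a search request."""
    params = {"search": search}
    if full_lucene:
        # Wildcard and regex patterns need the full Lucene parser.
        params["queryType"] = "full"
    return urlencode(params)


print(build_query("pli"))                          # n-gram field: plain term
print(build_query("*pli*", full_lucene=True))      # wildcard form
print(build_query("/.*pli.*/", full_lucene=True))  # regex form
```

With a POST request the patterns go into the JSON body as-is, so this encoding step is not needed there.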
For case-insensitive alphabetical sorting:
By default, string sorting is case-sensitive and follows ASCII order, placing uppercase letters before lowercase ones. To address this, please include "normalizer": "lowercase" in your field configuration, along with "sortable": true. This ensures values are preprocessed for sorting, filtering, and faceting, while search tokenization remains unaffected.
The $orderby query then works naturally:
GET ... ?search=...&$orderby=name asc
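To illustrate the difference the normalizer makes, here is a local Python comparison of ordinal (case-sensitive) ordering and ordering on lowercased values. This only models the effect with made-up sample names; it does not call the service.

```python
# Hypothetical sample values for the "name" field.
names = ["banana", "Apple", "cherry", "Date", "apricot"]

# Ordinal comparison: every uppercase letter sorts before any lowercase letter.
case_sensitive = sorted(names)

# Comparing lowercased values, which is conceptually what a lowercase
# normalizer provides for $orderby.
case_insensitive = sorted(names, key=str.lower)

print(case_sensitive)
print(case_insensitive)
```

In the first list "Date" lands ahead of "apricot"; in the second the names appear in the order a user would expect.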
Implementation Best Practices and Considerations:
- When modifying analyzers or normalizers, it is necessary to drop and recreate the index, or add new fields. Please plan accordingly for reindexing.
- Use the Analyze API to test tokenization and verify expected behavior.
- Implementing n-gram can increase index size; begin with conservative gram ranges and monitor storage and performance impacts.
- You can apply both features to the same field: the analyzer controls search and the normalizer controls sorting, and the two function independently.
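For the Analyze API check mentioned above, a request could be built roughly as follows. This is a sketch under assumptions: the service name, index name, key, and the api-version value are placeholders you would replace with your own, and `build_analyze_request` is a hypothetical helper. It only constructs the request; sending it is left commented out.

```python
import json
from urllib.request import Request


def build_analyze_request(service: str, index: str, api_key: str, text: str,
                          analyzer: str = "ngram_analyzer",
                          api_version: str = "2024-07-01") -> Request:
    """Build a POST request asking the service to tokenize `text`."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/search.analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values (assumptions): substitute your own service, index, and key.
req = build_analyze_request("my-service", "my-index", "<admin-key>", "Application")
print(req.get_method(), req.full_url)

# To actually send it:
# from urllib.request import urlopen
# print(json.load(urlopen(req)))
```

If the analyzer is configured as intended, the returned token list for "Application" should include "pli" among the generated n-grams.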
Reference:
https://learn.microsoft.com/en-us/azure/search/search-query-partial-matching
https://learn.microsoft.com/en-us/azure/search/index-add-custom-analyzers
https://learn.microsoft.com/en-us/azure/search/search-normalizers
Kindly let us know if the above helps or you need further assistance on this issue.
Please "accept" if the information helped you. This will help us and others in the community as well.