Sort indexes with lowercase, UPPERCASE, and Ñordic (Diacritics) letters

Yuriy Konradi (external) 0 Reputation points
2024-04-08T15:33:10.7866667+00:00

I need words with letters "Águas", "ARIMA", "aalen" to be sorted all together by letter A in indexes created in Azure by migration from another Data Base

indexes, lowercase, uppercase, diacritics

Azure AI Search
Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
1,354 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Grmacjon-MSFT 19,301 Reputation points Moderator
    2024-04-09T20:45:33.3333333+00:00

    Hi @Yuriy Konradi (external)

    There are two main approaches you can take to sort words like "Águas", "ARIMA", "aalen" together under the letter "A" in your Azure AI Search indexes, even though they have accented characters and mixed casing:

    1. Using Custom Tokenizers and Analyzers: This approach involves creating a custom tokenizer and analyzer in Azure Cognitive Services Text Analytics, which integrates with Azure AI Search. Here's the process:

    • Develop a custom tokenizer using Text Analytics that recognizes and treats these specific words ("Águas", "ARIMA", "aalen") as single tokens, regardless of accents or case. This tokenizer can handle any future words with similar patterns.
    • Create a custom analyzer in Text Analytics that incorporates your custom tokenizer and a language-specific normalization filter (e.g., standard_asciifolding for English). This filter removes accents and converts all characters to lowercase for consistent sorting.
    • When creating or updating your Azure AI Search index, specify the custom analyzer name in the analyzer_name property of the relevant field definition. This ensures the custom logic is applied during text processing and indexing.
    1. Utilizing Keyword Filters: This approach uses keyword filters within Azure AI Search to manipulate search results at query time. Here's how:
    • Keyword Filters: Define keyword filters in your Azure AI Search index for each of the specific words ("Águas", "ARIMA", "aalen"). These filters essentially act as synonyms, boosting the score of documents containing these words when a user searches for "A".
    • Sorting: During search, you can leverage the sort parameter in your search queries to specify sorting by the field containing these keywords. This will effectively group these terms under "A" in the search results.

    By implementing one of these approaches, you can ensure that words like "Águas", "ARIMA", and "aalen" are grouped together under the letter "A" in your Azure AI Search results, regardless of their original casing or accents.

    Hope that helps.

    -Grace

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.