Azure AI Search: Best way to implement several languages in one search index.

Vilhelm Heiberg 11 Reputation points
2024-05-07T12:37:43.55+00:00

I have items with a title and a description (and some other fields).
Each item has its title and description in 5 languages.
The user will search in one of those languages.
Should I make one index for each language, or one index with all 5 languages?
I was thinking of using complex fields:
one field for "title" with 5 subfields, one for each language, and an analyzer on each subfield.
The same for descriptions.
I will also have synonym maps for each language.
What is the best way to do this in Azure AI Search?
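A minimal sketch of the complex-field layout described above, written as Python that builds the JSON body for the Azure AI Search REST "Create Index" request. The five language codes (only Norwegian and English come from the question), the `*.microsoft` analyzer choices, and the `synonyms-<lang>` map names are assumptions for illustration; `build_index_definition` and `language_subfields` are hypothetical helpers, not part of any SDK.

```python
import json

# Assumed language set: the question only names Norwegian and English;
# the other three codes are placeholders.
LANGUAGE_ANALYZERS = {
    "nb": "nb.microsoft",  # Norwegian (Bokmål)
    "en": "en.microsoft",  # English
    "sv": "sv.microsoft",  # Swedish (placeholder)
    "da": "da.microsoft",  # Danish (placeholder)
    "de": "de.microsoft",  # German (placeholder)
}


def language_subfields():
    """Return one searchable string subfield per language.

    Each subfield gets its own language analyzer and its own synonym map
    (the map names are hypothetical and must match maps created separately).
    """
    return [
        {
            "name": lang,
            "type": "Edm.String",
            "searchable": True,
            "analyzer": analyzer,
            "synonymMaps": [f"synonyms-{lang}"],
        }
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    ]


def build_index_definition(index_name):
    """Build a JSON-serializable index definition with complex title/description fields."""
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
            # Complex fields: the parent carries no analyzer; only its subfields do.
            {"name": "title", "type": "Edm.ComplexType", "fields": language_subfields()},
            {"name": "description", "type": "Edm.ComplexType", "fields": language_subfields()},
        ],
    }


if __name__ == "__main__":
    # Print the request body; the index name "items" is an assumption.
    print(json.dumps(build_index_definition("items"), indent=2))
```

Keeping one analyzer per subfield means Norwegian text is stemmed as Norwegian and English text as English. A searchable field can reference only one synonym map, which is why each language subfield gets its own.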

An extra caveat is that users may type words in a language other than the expected one.
For instance, users often use English words even though they are operating in Norwegian.
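One way to tolerate English words in a Norwegian session, assuming the complex-field layout described in the question, is to search the user's language subfields together with the English ones via the `searchFields` parameter of the Search Documents REST request. Each listed field applies its own analyzer to the query terms. `build_search_body` and the English fallback are hypothetical; this is a sketch, not a tuned relevance setup.

```python
import json


def build_search_body(query, user_lang, fallback_langs=("en",)):
    """Build a Search Documents request body that searches several language subfields.

    Subfields of a complex field are addressed with a slash path, e.g. "title/nb".
    """
    langs = [user_lang] + [lang for lang in fallback_langs if lang != user_lang]
    fields = [f"{field}/{lang}" for lang in langs for field in ("title", "description")]
    return {
        "search": query,
        "searchFields": ",".join(fields),
        "searchMode": "any",  # match documents containing any of the terms
        "top": 10,
    }


if __name__ == "__main__":
    # A Norwegian user typing an English word ("helmet") next to a Norwegian one.
    print(json.dumps(build_search_body("sykkel helmet", "nb"), indent=2))
```

Searching more fields broadens recall, but it can also surface matches from the "wrong" language, so the fallback list is worth keeping short.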

Azure AI Search

1 answer

  1. Grmacjon-MSFT 18,456 Reputation points
    2024-05-07T22:58:38.9866667+00:00

    Hello @Vilhelm Heiberg, there are several approaches to handling your multilingual search scenario in Azure AI Search. Here's a breakdown of three options (including the two you're weighing), along with their pros, cons, and which might be the best fit for your situation:

    1. One Index per Language:

    Pros:

    1. Simplest schema: each index holds one language and is easy to understand.
    2. Queries only touch one language's (smaller) index, which may make them somewhat faster.
    3. Easier to tune ranking per language, for example with a separate scoring profile in each index.

    Cons:

    1. Requires managing and maintaining multiple indexes.
    2. Language-independent fields are duplicated across indexes, which increases storage requirements.
    3. Routing user queries to the correct language index can add complexity.

    2. One Index with Flat Fields (All Languages):

    Pros:

    1. Simpler index management compared to multiple language indexes.
    2. Lower storage requirements as data isn't duplicated.

    Cons:

    1. Relevance may be worse, because a field shared by all languages can have only one analyzer, so you lose language-specific stemming and stop-word handling.
    2. Requires custom logic or filtering to handle searches specific to a particular language.
    3. Might be challenging to implement language-specific ranking.

    3. One Index with Complex Fields (Subfields per Language):

    Pros:

    1. Good balance between manageability and performance.
    2. Single index for all data simplifies management.
    3. Leverage language-specific analyzers for each subfield for optimal search.
    4. Supports filtering and faceting based on language-specific fields.

    Cons:

    1. More complex index structure compared to flat fields.
    2. Requires careful configuration of analyzers and synonym maps for each language.
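    As a sketch of the per-language synonym configuration mentioned above, the Python below builds one Solr-format synonym map body per language for the REST "Create Synonym Map" request. The `synonyms-<lang>` names are a hypothetical convention and the rules are made-up examples; putting English loanwords into the Norwegian map is one possible way to handle users mixing languages, not a documented recommendation. Synonym maps generally need to exist before an index field can reference them.

```python
import json

# Hypothetical rules; comma-separated terms are treated as equivalent (Solr format).
SYNONYM_RULES = {
    "nb": [
        "sykkel, bike",  # English loanword often typed by Norwegian users
        "mobil, mobiltelefon, cellphone",
    ],
    "en": [
        "bike, bicycle",
        "cellphone, mobile phone",
    ],
}


def build_synonym_map(lang, rules):
    """Return a synonym map body for one language, one rule per line."""
    return {
        "name": f"synonyms-{lang}",
        "format": "solr",
        "synonyms": "\n".join(rules),
    }


synonym_maps = [build_synonym_map(lang, rules) for lang, rules in SYNONYM_RULES.items()]

if __name__ == "__main__":
    for body in synonym_maps:
        print(json.dumps(body, indent=2))
```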

    Overall, if you prioritize both search performance and manageability, I would suggest option 3 (one index with complex fields, subfields per language). This approach offers a good balance and allows language-specific optimizations.

    Also, if you have a small number of languages (2-3), managing separate indexes might be feasible. However, as the number of languages grows, so does the complexity of managing multiple indexes, which makes option 3 (complex fields) the better choice.
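    For comparison, the query routing that option 1 requires can be sketched as a lookup from the user's language to an index name. The `items-<lang>` index names, the English default, and the `2023-11-01` API version are assumptions; each additional language means another index to create, populate, and add to this table.

```python
# Hypothetical per-language index names for the "one index per language" layout.
INDEX_BY_LANGUAGE = {
    "nb": "items-nb",
    "en": "items-en",
    "sv": "items-sv",
    "da": "items-da",
    "de": "items-de",
}


def route_index(user_lang, default_lang="en"):
    """Pick the index for the user's language, falling back to a default language."""
    return INDEX_BY_LANGUAGE.get(user_lang, INDEX_BY_LANGUAGE[default_lang])


def search_url(service_name, user_lang, api_version="2023-11-01"):
    """Build the Search Documents (POST) URL for the routed index."""
    index_name = route_index(user_lang)
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={api_version}"
    )


if __name__ == "__main__":
    print(search_url("my-service", "nb"))  # "my-service" is a placeholder name
```

    With this layout, a Norwegian query routed to `items-nb` would only be analyzed as Norwegian, so the mixed-language caveat from the question is harder to handle than with per-language subfields in one index.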

    Hope that helps.

    -Grace

