Azure AI Search with Custom Tokenizer from Nori

Lyncheese 105 Reputation points
2024-02-15T00:53:55.59+00:00

Is it possible to integrate a third-party tokenizer into an Azure AI Search custom analyzer?

I found a third-party tokenizer that might be useful for my solution, because the tokenizer provided by Azure is not working as expected for Korean.

I plan to use Nori: https://esbook.kimjmin.net/06-text-analysis/6.7-stemming/6.7.2-nori

It is supported in Elasticsearch.

Is there a way to integrate it and use it as the tokenizer in Azure AI Search?

Azure AI Search

Accepted answer
  1. brtrach-MSFT 17,731 Reputation points Microsoft Employee Moderator
    2024-02-16T00:46:26.8766667+00:00

    @Lyncheese Yes, it is possible to integrate a custom tokenizer into an Azure AI Search custom analyzer, but it requires some development work. To use the Nori tokenizer, create a custom tokenizer class that implements the Microsoft.Azure.Search.Models.ITokenizer interface, then reference that tokenizer from your custom analyzer. Here is an example of a custom tokenizer class for Nori:

    using Microsoft.Azure.Search.Models;
    using Newtonsoft.Json.Linq;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    
    public class NoriTokenizer : ITokenizer
    {
        public string Name => "nori_tokenizer";
    
        public TokenizerV2Type Type => TokenizerV2Type.NoriTokenizer;
    
        public IDictionary<string, object> Tokenize(string text)
        {
            var tokens = new List<string>();
    
            // Call Nori tokenizer here and add the tokens to the list
    
            return new Dictionary<string, object>
            {
                { "tokens", tokens }
            };
        }
    }

    Once you have created the custom tokenizer class, you can use it in your custom analyzer like this:

    var customAnalyzer = new CustomAnalyzer
    {
        Name = "my_custom_analyzer",
        Tokenizer = new NoriTokenizer(),
        TokenFilters = new List<TokenFilterName> { TokenFilterName.Lowercase }
    };
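
    For reference, in the REST index definition a custom analyzer names its tokenizer and token filters as strings. The following is a minimal sketch of that JSON shape, built with System.Text.Json so it does not depend on a particular SDK version. The index name `korean-docs`, the field names, and the choice of the `standard_v2` tokenizer with the `lowercase` and `cjk_bigram` token filters are assumptions made for illustration. It uses predefined components rather than Nori, so check the names against the current service documentation before relying on them.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.Json;

    public static class IndexDefinitionSketch
    {
        // Builds the JSON body of an index definition whose "content" field
        // uses a custom analyzer assembled from predefined components.
        // All names below (index, fields, analyzer) are hypothetical.
        public static string BuildIndexJson()
        {
            var index = new Dictionary<string, object>
            {
                ["name"] = "korean-docs",
                ["fields"] = new object[]
                {
                    new Dictionary<string, object>
                    {
                        ["name"] = "id",
                        ["type"] = "Edm.String",
                        ["key"] = true
                    },
                    new Dictionary<string, object>
                    {
                        ["name"] = "content",
                        ["type"] = "Edm.String",
                        ["searchable"] = true,
                        // Refers to the custom analyzer declared below by name.
                        ["analyzer"] = "my_custom_analyzer"
                    }
                },
                ["analyzers"] = new object[]
                {
                    new Dictionary<string, object>
                    {
                        ["@odata.type"] = "#Microsoft.Azure.Search.CustomAnalyzer",
                        ["name"] = "my_custom_analyzer",
                        // Assumed predefined tokenizer and filters, not Nori.
                        ["tokenizer"] = "standard_v2",
                        ["tokenFilters"] = new[] { "lowercase", "cjk_bigram" }
                    }
                }
            };

            return JsonSerializer.Serialize(
                index, new JsonSerializerOptions { WriteIndented = true });
        }

        public static void Main()
        {
            // Print the payload so it can be inspected before sending it
            // to the service's index creation endpoint.
            Console.WriteLine(BuildIndexJson());
        }
    }
    ```

    Printing the payload first makes it easy to compare the analyzer definition against what the service accepts, and to swap in a different predefined tokenizer or filter when testing Korean text.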
