Analyze Text (Azure AI Search REST API)

The Analyze API shows how an analyzer breaks text into tokens. It's intended for interactive testing so that you can see how a given analyzer will tokenize a string input.

POST https://[service name].search.windows.net/indexes/[index name]/analyze?api-version=[api-version]
    Content-Type: application/json
    api-key: [admin key]

To specify an analyzer used during indexing and query execution, set the analyzer property on string fields in the index.

URI Parameters

Parameter Description
service name Required. Set this to the unique, user-defined name of your search service.
index name Required. The request URI specifies the name of the index that contains the field you want to analyze.
api-version Required. The current stable version is api-version=2020-06-30. See API versions for more versions.

Request Headers

The following table describes the required and optional request headers.

Fields Description
Content-Type Required. Set this to application/json
api-key Optional if you're using Azure roles and a bearer token is provided on the request, otherwise a key is required. An api-key is a unique, system-generated string that authenticates the request to your search service. Analyzer requests must include an api-key header set to your admin key (as opposed to a query key). See Connect to Azure AI Search using key authentication for details.

Request Body

{
  "text": "Text to analyze",
  "analyzer": "analyzer_name"
}

or

{
  "text": "Text to analyze",
  "tokenizer": "tokenizer_name",
  "tokenFilters": (optional) [ "token_filter_name" ],
  "charFilters": (optional) [ "char_filter_name" ]
}

The analyzer_name, tokenizer_name, token_filter_name and char_filter_name need to be valid names of predefined or custom analyzers, tokenizers, token filters, and char filters for the index. To learn more about the process of lexical analysis, see Analysis in Azure AI Search.

Response

Status Code: 200 OK is returned for a successful response.

The response body is in the following format:

    {
      "tokens": [
        {
          "token": string (token),
          "startOffset": number (index of the first character of the token),
          "endOffset": number (index of the last character of the token),
          "position": number (position of the token in the input text)
        },
        ...
      ]
    }

Examples

Request body includes the string and analyzer you want to use.

     {
       "text": "The quick brown fox",
       "analyzer": "standard"
     }

The response shows the tokens emitted by the analyzer for the string you provide.

{
    "tokens": [
        {
            "token": "the",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        },
        {
            "token": "quick",
            "startOffset": 4,
            "endOffset": 9,
            "position": 1
        },
        {
            "token": "brown",
            "startOffset": 10,
            "endOffset": 15,
            "position": 2
        },
        {
            "token": "fox",
            "startOffset": 16,
            "endOffset": 19,
            "position": 3
        }
    ]
}

See also