Analyze Text (Preview REST API)

06/27/2024

Applies to: 2023-07-01-Preview. This version is no longer supported. Upgrade immediately to a newer version.

Important

2023-07-01-Preview (no changes).

2021-04-30-Preview adds "normalizer", used for testing case-insensitivity and text processing on filters and sorts.

The Analyze Text API shows how an analyzer breaks text into tokens, and how a normalizer preprocesses text. It's intended for interactive testing so that you can evaluate a given analyzer or normalizer for debugging purposes.

POST https://[service name].search.windows.net/indexes/[index name]/analyze?api-version=[api-version]
    Content-Type: application/json
    api-key: [admin key]

Testing an analyzer or normalizer is a standalone task. To use an analyzer or normalizer during indexing or query execution, specify it on individual fields in Create or Update Index.

URI parameters

Parameter	Description
service name	Required. The name of your search service.
index name	Required. The name of the index containing the field you want to analyze.
api-version	Required. See API versions for the full list.

Request headers

The following table describes the required and optional request headers.

Fields	Description
Content-Type	Required. Set this to `application/json`.
api-key	Optional if you're using Azure roles and a bearer token is provided on the request; otherwise, a key is required. An api-key is a unique, system-generated string that authenticates the request to your search service. When you authenticate with a key, analyzer requests must set the `api-key` header to your admin key (as opposed to a query key). See Connect to Azure AI Search using key authentication for details.
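
As a minimal sketch of how the URI parameters and request headers fit together, the following Python builds (but does not send) an Analyze Text request using only the standard library. The helper name `build_analyze_request` and the placeholder service name, index name, and key are assumptions for illustration. The bearer-token branch assumes the usual `Authorization: Bearer` header for role-based access, which this article doesn't spell out.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode


def build_analyze_request(service_name: str, index_name: str, api_version: str,
                          body: dict, api_key: Optional[str] = None,
                          bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request for the Analyze Text API."""
    if not api_key and not bearer_token:
        raise ValueError("Provide an admin api-key or a bearer token")

    # URI template from this article:
    # https://[service name].search.windows.net/indexes/[index name]/analyze?api-version=[api-version]
    url = (f"https://{service_name}.search.windows.net/indexes/"
           f"{quote(index_name, safe='')}/analyze?"
           + urlencode({"api-version": api_version}))

    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Assumption: role-based access passes the token as a standard bearer header.
        headers["Authorization"] = f"Bearer {bearer_token}"
    else:
        # Key authentication: must be an admin key, not a query key.
        headers["api-key"] = api_key

    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder values; replace them with your own service, index, and admin key.
request = build_analyze_request(
    "my-service", "my-index", "2023-07-01-Preview",
    {"text": "The quick brown fox", "analyzer": "standard"},
    api_key="<admin key>",
)
print(request.get_method(), request.full_url)
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and read the JSON response body.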

Request body

{
  "text": "Text to analyze",
  "analyzer": "analyzer_name"
}

{
  "text": "Text to analyze",
  "tokenizer": "tokenizer_name",
  "tokenFilters": (optional) [ "token_filter_name" ],
  "charFilters": (optional) [ "char_filter_name" ]
}

{
  "text": "Text to normalize",
  "normalizer": "normalizer_name"
}

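The request body takes one of the three forms shown above: a named analyzer, a tokenizer with optional token filters and character filters, or a named normalizer. The `(optional)` markers are annotations, not part of the JSON. The request contains the following properties: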

Property	Description
text	Required. The text to be analyzed or normalized.
analyzer	The analyzer used to break the text into tokens. This property is the name of a built-in analyzer, the name of a language analyzer, or the name of a custom analyzer in the index definition. To learn more about the process of lexical analysis, see Analysis in Azure AI Search.
tokenizer	The tokenizer used to break the text into tokens. This property is the name of a predefined tokenizer or the name of a custom tokenizer in the index definition.
tokenFilters	A collection of token filters used to process the text. The values of the collection need to be the names of predefined token filters or the names of custom token filters in the index definition. For testing analyzers, this property must be used alongside the tokenizer property. For testing normalizers, this property can be used independently.
charFilters	A collection of character filters used to process the text. The values of the collection need to be the names of predefined character filters or the names of custom character filters in the index definition. For testing analyzers, this property must be used alongside the tokenizer property. For testing normalizers, this property can be used independently.
normalizer	The normalizer used to process the text. This property is the name of a predefined normalizer or the name of a custom normalizer in the index definition. To learn more about normalizers, see Text normalization for filtering, faceting, and sorting.
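
The three request body forms can be sketched as small Python helpers that return a dictionary ready for JSON serialization. The function names are hypothetical, and the component names in the usage lines (`standard`, `whitespace`, `lowercase`) are illustrative values; substitute names that exist in your index definition or among the predefined components.

```python
import json
from typing import Optional, Sequence


def analyzer_body(text: str, analyzer: str) -> dict:
    """Form 1: test a named analyzer."""
    return {"text": text, "analyzer": analyzer}


def tokenizer_body(text: str, tokenizer: str,
                   token_filters: Optional[Sequence[str]] = None,
                   char_filters: Optional[Sequence[str]] = None) -> dict:
    """Form 2: test a tokenizer with optional token and character filters."""
    body = {"text": text, "tokenizer": tokenizer}
    # Optional collections are omitted entirely when not supplied.
    if token_filters:
        body["tokenFilters"] = list(token_filters)
    if char_filters:
        body["charFilters"] = list(char_filters)
    return body


def normalizer_body(text: str, normalizer: str) -> dict:
    """Form 3: test a named normalizer."""
    return {"text": text, "normalizer": normalizer}


# Illustrative component names; adjust to your own index definition.
bodies = [
    analyzer_body("The quick brown fox", "standard"),
    tokenizer_body("The quick brown fox", "whitespace", token_filters=["lowercase"]),
    normalizer_body("The Quick Brown Fox", "lowercase"),
]
for body in bodies:
    print(json.dumps(body))
```

Keeping one helper per form makes it harder to mix properties from different forms in a single request by accident.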

Response

Status Code: 200 OK is returned for a successful response.

The response body is in the following format:

    {
      "tokens": [
        {
          "token": string (token),
          "startOffset": number (index of the first character of the token),
          "endOffset": number (index just past the last character of the token),
          "position": number (ordinal position of the token relative to other tokens in the input text)
        },
        ...
      ]
    }

Examples

The request body includes the text and the analyzer or normalizer you want to test.

     {
       "text": "The quick brown fox",
       "analyzer": "standard"
     }

The response shows the tokens emitted by the analyzer for the string you provide.

{
    "tokens": [
        {
            "token": "the",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        },
        {
            "token": "quick",
            "startOffset": 4,
            "endOffset": 9,
            "position": 1
        },
        {
            "token": "brown",
            "startOffset": 10,
            "endOffset": 15,
            "position": 2
        },
        {
            "token": "fox",
            "startOffset": 16,
            "endOffset": 19,
            "position": 3
        }
    ]
}
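
To illustrate how the offsets in this sample relate to the input, the following sketch parses the response above and slices the original text with each token's `startOffset` and `endOffset`. The helper name `token_spans` is hypothetical. In this sample, `endOffset` works as an exclusive end index for slicing, and the sketch assumes plain ASCII input, where offsets and Python string indices coincide.

```python
import json

text = "The quick brown fox"

# Response body from the example above, as returned for the "standard" analyzer.
response_body = """
{
    "tokens": [
        {"token": "the",   "startOffset": 0,  "endOffset": 3,  "position": 0},
        {"token": "quick", "startOffset": 4,  "endOffset": 9,  "position": 1},
        {"token": "brown", "startOffset": 10, "endOffset": 15, "position": 2},
        {"token": "fox",   "startOffset": 16, "endOffset": 19, "position": 3}
    ]
}
"""


def token_spans(source_text: str, response: dict) -> list:
    """Return (position, token, source substring) for each token in the response."""
    spans = []
    for item in response["tokens"]:
        # Slice the input with the reported offsets to recover the original substring.
        original = source_text[item["startOffset"]:item["endOffset"]]
        spans.append((item["position"], item["token"], original))
    return spans


spans = token_spans(text, json.loads(response_body))
for position, token, original in spans:
    print(position, token, repr(original))
```

Comparing each token with its source substring shows what the analyzer changed; here, the first token is lowercased from `The` to `the`.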
