Create a custom analyzer via REST APIs

Content Understanding analyzers define how to process and extract insights from your content. They apply uniform processing and a consistent output structure across all your content, so results are reliable and predictable. We offer prebuilt analyzers for common use cases. This guide shows how to customize these analyzers to better fit your needs.

In this guide, we use the cURL command-line tool. If cURL isn't installed, download the appropriate version for your development environment.

Define an analyzer schema

To create a custom analyzer, define a field schema that describes the structured data you want to extract. In the following example, we create an analyzer based on the prebuilt document analyzer (prebuilt-documentAnalyzer) to process a receipt.

Create a JSON file named request_body.json with the following content:

{
  "description": "Sample receipt analyzer",
  "baseAnalyzerId": "prebuilt-documentAnalyzer",
  "config": {
    "returnDetails": true,
    "enableFormula": false,
    "disableContentFiltering": false,
    "estimateFieldSourceAndConfidence": true,
    "tableFormat": "html"
  },
  "fieldSchema": {
    "fields": {
      "VendorName": {
        "type": "string",
        "method": "extract",
        "description": "Vendor issuing the receipt"
      },
      "Items": {
        "type": "array",
        "method": "extract",
        "items": {
          "type": "object",
          "properties": {
            "Description": {
              "type": "string",
              "method": "extract",
              "description": "Description of the item"
            },
            "Amount": {
              "type": "number",
              "method": "extract",
              "description": "Amount of the item"
            }
          }
        }
      }
    }
  }
}

Build analyzer

PUT request

curl -i -X PUT "{endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-05-01-preview" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/json" \
  -d @request_body.json

PUT response

The 201 Created response includes an Operation-Location header containing a URL that you can use to track the status of this asynchronous analyzer creation operation.

201 Created
Operation-Location: {endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview
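If you script this step, you need the URL from the Operation-Location header. The following bash sketch is one way to get it, assuming you keep the endpoint, key, and analyzer name in shell variables named ENDPOINT, KEY, and ANALYZER_ID (hypothetical names, not part of the API). It sends the same PUT request as above and prints only the header value.

```shell
#!/usr/bin/env bash
# Minimal sketch: create the analyzer and print the Operation-Location URL.
# ENDPOINT, KEY, and ANALYZER_ID are hypothetical variables that you set yourself.

API_VERSION="2025-05-01-preview"

# Read raw HTTP response headers on stdin and print the Operation-Location value.
# Carriage returns from the CRLF line endings are removed first, and the header
# name is matched case-insensitively because HTTP header names are not case-sensitive.
extract_operation_location() {
  tr -d '\r' | grep -i '^operation-location:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Send the PUT request from this guide. -D - writes the response headers to
# stdout and -o /dev/null discards the response body.
create_analyzer() {
  curl -sS -D - -o /dev/null -X PUT \
    "${ENDPOINT}/contentunderstanding/analyzers/${ANALYZER_ID}?api-version=${API_VERSION}" \
    -H "Ocp-Apim-Subscription-Key: ${KEY}" \
    -H "Content-Type: application/json" \
    -d @request_body.json | extract_operation_location
}
```

For example, `OPERATION_URL="$(create_analyzer)"` saves the URL for the status check that follows.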

To check progress, send an HTTP GET request to the Operation-Location URL. When the operation completes, the response includes "status": "succeeded".

curl -i -X GET "{endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview" \
  -H "Ocp-Apim-Subscription-Key: {key}"
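In a script, you can read the status field from this response instead of inspecting it by eye. A minimal bash sketch, assuming hypothetical shell variables OPERATION_URL (the Operation-Location URL) and KEY: it sends the same GET request without -i, so curl prints only the body, and extracts the first "status" value with grep and sed. This is a text match rather than a JSON parser, so it relies on the response containing a single top-level status string.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the analyzer creation operation once.
# OPERATION_URL and KEY are hypothetical variables that you set yourself.

# Print the value of the first "status" field in a JSON body read from stdin.
# This is a text match, not a JSON parser.
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# GET the operation URL and print its status, for example "succeeded".
check_operation() {
  curl -sS "${OPERATION_URL}" -H "Ocp-Apim-Subscription-Key: ${KEY}" | extract_status
}
```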

Analyze file

Send file

You can now use the custom analyzer you created to process files and extract the fields you defined in the schema.

Before running the cURL command, make the following changes to the HTTP request:

  1. Replace {endpoint} and {key} with the endpoint and key values from your Azure AI Foundry instance in the Azure portal.
  2. Replace {analyzerId} with the name of the custom analyzer created earlier.
  3. Replace {fileUrl} with a publicly accessible URL of the file to analyze, such as a path to an Azure Storage Blob with a shared access signature (SAS) or the sample URL https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png.
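Instead of editing each placeholder by hand, you can keep the values in shell variables and build the request URLs and body from them. This is a sketch under assumptions: the variable names ENDPOINT, ANALYZER_ID, and FILE_URL and the helper function names are hypothetical; the paths and api-version come from the requests in this guide.

```shell
#!/usr/bin/env bash
# Minimal sketch: build the analyze URLs and request body from shell variables.
# ENDPOINT, ANALYZER_ID, and FILE_URL are hypothetical variables that you set yourself.

API_VERSION="2025-05-01-preview"

# URL for the POST request that starts an analysis.
# ${ENDPOINT%/} drops one trailing slash so the path isn't doubled.
analyze_url() {
  printf '%s/contentunderstanding/analyzers/%s:analyze?api-version=%s\n' \
    "${ENDPOINT%/}" "$ANALYZER_ID" "$API_VERSION"
}

# URL for the GET request that returns an analysis result; $1 is the result ID.
result_url() {
  printf '%s/contentunderstanding/analyzerResults/%s?api-version=%s\n' \
    "${ENDPOINT%/}" "$1" "$API_VERSION"
}

# JSON body {"url":"..."}. A well-formed URL contains no unescaped double
# quotes or backslashes, so this sketch does no JSON escaping.
analyze_body() {
  printf '{"url":"%s"}\n' "$FILE_URL"
}
```

With these helpers, the POST request can be sent as `curl -i -X POST "$(analyze_url)" -H "Ocp-Apim-Subscription-Key: $KEY" -H "Content-Type: application/json" -d "$(analyze_body)"`.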

POST request

curl -i -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-05-01-preview" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/json" \
  -d "{\"url\":\"{fileUrl}\"}"

POST response

The 202 Accepted response body includes an id field containing the {resultId}, which you can use to track the status of this asynchronous operation.

{
  "id": "{resultId}",
  "status": "Running",
  "result": {
    "analyzerId": "{analyzerId}",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
    "warnings": [],
    "contents": []
  }
}
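When you script the analysis, you need the id value from this response to request the result. A minimal bash sketch, assuming hypothetical shell variables ENDPOINT, KEY, ANALYZER_ID, and FILE_URL: it sends the POST request shown earlier without -i, so only the body is returned, and prints the first "id" value. It matches text rather than parsing JSON, which works for the response shape above but isn't a general JSON reader.

```shell
#!/usr/bin/env bash
# Minimal sketch: start an analysis and print the result ID.
# ENDPOINT, KEY, ANALYZER_ID, and FILE_URL are hypothetical variables that you set yourself.

API_VERSION="2025-05-01-preview"

# Print the first "id" string value in a JSON body read from stdin.
# Text match only; it assumes the top-level "id" appears before any nested "id" key.
# Keys such as "analyzerId" don't match because the pattern requires the exact key "id".
extract_result_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Send the POST request from this guide and print the result ID.
start_analysis() {
  curl -sS -X POST \
    "${ENDPOINT}/contentunderstanding/analyzers/${ANALYZER_ID}:analyze?api-version=${API_VERSION}" \
    -H "Ocp-Apim-Subscription-Key: ${KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"url\":\"${FILE_URL}\"}" | extract_result_id
}
```

For example, `RESULT_ID="$(start_analysis)"` stores the value to substitute for {resultId} in the GET request.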

Get analyze result

  1. Replace {endpoint} and {key} with the endpoint and key values from your Azure AI Foundry instance in the Azure portal.
  2. Replace {resultId} with the id value returned in the POST response.

GET request

curl -i -X GET "{endpoint}/contentunderstanding/analyzerResults/{resultId}?api-version=2025-05-01-preview" \
  -H "Ocp-Apim-Subscription-Key: {key}"

GET response

A 200 OK response includes a status field that shows the operation's progress.

  • status is Succeeded if the operation completed successfully.
  • If status is Running or NotStarted, call the API again, manually or with a script, and wait at least one second between requests.
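The retry guidance above can be scripted as a loop that stops on a terminal status and waits between requests. This is a sketch under assumptions: the GET request is passed in as a command (fetch_result below wraps it, using hypothetical RESULT_URL and KEY variables), the helper names and the 60-attempt limit are illustrative, and treating Failed as a terminal status is an assumption because this guide only shows Running, NotStarted, and Succeeded.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll an analyze result until it reaches a terminal status.
# Helper names, the attempt limit, and the "Failed" status are assumptions.

# Print the first "status" value in a JSON body read from stdin (text match only).
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage: poll_result FETCH_CMD [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# FETCH_CMD is a command that prints the GET response body.
# Prints the final status; returns 0 on Succeeded, 1 on Failed, 2 on timeout.
poll_result() {
  local fetch_cmd="$1" max_attempts="${2:-60}" interval="${3:-1}"
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status="$("$fetch_cmd" | extract_status)"
    case "$status" in
      [Ss]ucceeded) echo "$status"; return 0 ;;
      [Ff]ailed) echo "$status"; return 1 ;;
    esac
    # Running, NotStarted, or anything else: wait before the next request.
    # The default interval of 1 second matches the minimum wait in the guidance.
    sleep "$interval"
  done
  echo "timed out after ${max_attempts} attempts" >&2
  return 2
}

# Example fetch command for the GET request above.
# RESULT_URL and KEY are hypothetical variables that you set yourself.
fetch_result() {
  curl -sS "${RESULT_URL}" -H "Ocp-Apim-Subscription-Key: ${KEY}"
}
```

For example, after setting RESULT_URL to the GET URL above, `poll_result fetch_result` prints Succeeded once the analysis finishes. Passing the fetch step as a command also lets you exercise the loop without calling the service.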

Sample response
{
  "id": "{resultId}",
  "status": "Succeeded",
  "result": {
    "analyzerId": "{analyzerId}",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
    "warnings": [],
    "contents": [
      {
        "markdown": "Contoso\n\n123 Main Street\nRedmond, WA 98052\n\n987-654-3210\n\n6/10/2019 13:59\nSales Associate: Paul\n\n\n<table>\n<tr>\n<td>2 Surface Pro 6</td>\n<td>$1,998.00</td>\n</tr>\n<tr>\n<td>3 Surface Pen</td>\n<td>$299.97</td>\n</tr>\n</table> ...",
        "fields": {
          "VendorName": {
            "type": "string",
            "valueString": "Contoso",
            "spans": [{"offset": 0,"length": 7}],
            "confidence": 0.996,
            "source": "D(1,774.0000,72.0000,974.0000,70.0000,974.0000,111.0000,774.0000,113.0000)"
          },
          "Items": {
            "type": "array",
            "valueArray": [
              {
                "type": "object",
                "valueObject": {
                  "Description": {
                    "type": "string",
                    "valueString": "2 Surface Pro 6",
                    "spans": [ { "offset": 115, "length": 15}],
                    "confidence": 0.423,
                    "source": "D(1,704.0000,482.0000,875.0000,482.0000,875.0000,508.0000,704.0000,508.0000)"
                  },
                  "Amount": {
                    "type": "number",
                    "valueNumber": 1998,
                    "spans": [{"offset": 140, "length": 9}],
                    "confidence": 0.957,
                    "source": "D(1,952.0000,482.0000,1048.0000,482.0000,1048.0000,508.0000,952.0000,509.0000)"
                  }
                }
              }, ...
            ]
          }
        },
        "kind": "document",
        "startPageNumber": 1,
        "endPageNumber": 1,
        "unit": "pixel",
        "pages": [
          {
            "pageNumber": 1,
            "angle": -0.0848,
            "width": 1743,
            "height": 878,
            "spans": [
              {
                "offset": 0,
                "length": 375
              }
            ],
            "words": [
              {
                "content": "Contoso",
                "span": {"offset": 0,"length": 7 },
                "confidence": 0.995,
                "source": "D(1,774,72,974,70,974,111,774,113)"
              }, ...
            ],
            "lines": [
              {
                "content": "Contoso",
                "source": "D(1,774,71,973,70,974,111,774,113)",
                "span": {"offset": 0,"length": 7}
              }, ...
            ]
          }
        ],
        "paragraphs": [
          {
            "content": "Contoso",
            "source": "D(1,774,71,973,70,974,111,774,113)",
            "span": {"offset": 0,"length": 7}
          }, ...
        ],
        "sections": [
          {
            "span": {"offset": 0,"length": 374 },
            "elements": ["/paragraphs/0","/paragraphs/1", ...]
          }
        ],
        "tables": [
          {
            "rowCount": 2,
            "columnCount": 2,
            "cells": [
              {
                "kind": "content",
                "rowIndex": 0,
                "columnIndex": 0,
                "rowSpan": 1,
                "columnSpan": 1,
                "content": "2 Surface Pro 6",
                "source": "D(1,691,471,911,470,911,514,691,515)",
                "span": {"offset": 115,"length": 15},
                "elements": ["/paragraphs/4"]
              }, ...
            ],
            "source": "D(1,759,593,1056,592,1057,741,760,742)",
            "span": {"offset": 223,"length": 151}
          }
        ]
      }
    ]
  }
}