Content Understanding analyzers define how to process and extract insights from your content. They ensure uniform processing and output structure across all your content to deliver reliable and predictable results. We offer prebuilt analyzers for common use cases. This guide shows how to customize these analyzers to better fit your needs.
In this guide, we use the cURL command line tool. If it isn't installed, you can download the appropriate version for your dev environment.
Define an analyzer schema
To create a custom analyzer, define a field schema that describes the structured data you want to extract. The following example creates an analyzer based on the prebuilt document analyzer for processing a receipt.
Create a JSON file named request_body.json with the following content:
{
  "description": "Sample receipt analyzer",
  "baseAnalyzerId": "prebuilt-documentAnalyzer",
  "config": {
    "returnDetails": true,
    "enableFormula": false,
    "disableContentFiltering": false,
    "estimateFieldSourceAndConfidence": true,
    "tableFormat": "html"
  },
  "fieldSchema": {
    "fields": {
      "VendorName": {
        "type": "string",
        "method": "extract",
        "description": "Vendor issuing the receipt"
      },
      "Items": {
        "type": "array",
        "method": "extract",
        "items": {
          "type": "object",
          "properties": {
            "Description": {
              "type": "string",
              "method": "extract",
              "description": "Description of the item"
            },
            "Amount": {
              "type": "number",
              "method": "extract",
              "description": "Amount of the item"
            }
          }
        }
      }
    }
  }
}
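If you would rather generate the request body from code than edit the JSON by hand, a minimal Python sketch along these lines could work. The helper names `extract_field`, `build_request_body`, and `write_request_body` are hypothetical and not part of any SDK; the property names and values are copied from the JSON above.

```python
import json


def extract_field(field_type, description=None, **extra):
    """Return one field definition that uses the "extract" method (hypothetical helper)."""
    field = {"type": field_type, "method": "extract"}
    if description is not None:
        field["description"] = description
    field.update(extra)
    return field


def build_request_body():
    """Assemble the same receipt analyzer definition shown above."""
    item = {
        "type": "object",
        "properties": {
            "Description": extract_field("string", "Description of the item"),
            "Amount": extract_field("number", "Amount of the item"),
        },
    }
    return {
        "description": "Sample receipt analyzer",
        "baseAnalyzerId": "prebuilt-documentAnalyzer",
        "config": {
            "returnDetails": True,
            "enableFormula": False,
            "disableContentFiltering": False,
            "estimateFieldSourceAndConfidence": True,
            "tableFormat": "html",
        },
        "fieldSchema": {
            "fields": {
                "VendorName": extract_field("string", "Vendor issuing the receipt"),
                "Items": extract_field("array", items=item),
            }
        },
    }


def write_request_body(path="request_body.json"):
    """Serialize the definition to the file that the cURL command reads."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_request_body(), handle, indent=2)
```

Calling `write_request_body()` produces a request_body.json equivalent to the hand-written file, so the cURL command in the next step can be used unchanged.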
Build analyzer
PUT request
curl -i -X PUT "{endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d @request_body.json
PUT response
The 201 Created response includes an Operation-Location header containing a URL that you can use to track the status of this asynchronous analyzer creation operation.
201 Created
Operation-Location: {endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview
Upon completion, performing an HTTP GET on the operation location URL returns "status": "succeeded".
curl -i -X GET "{endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}"
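The create step can also be sketched in Python using only the standard library. This is an illustration rather than an official client: the function names are hypothetical, and the URL shape, header names, and API version are copied from the cURL commands above.

```python
import json
import urllib.request

API_VERSION = "2025-05-01-preview"  # taken from the cURL examples in this guide


def build_create_request(endpoint, analyzer_id, key, body):
    """Build (but do not send) the PUT request that creates the analyzer."""
    url = (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzers/"
        f"{analyzer_id}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def create_analyzer(endpoint, analyzer_id, key, body):
    """Send the PUT request and return the Operation-Location URL to poll."""
    request = build_create_request(endpoint, analyzer_id, key, body)
    with urllib.request.urlopen(request) as response:
        # Header lookup on the response object is case-insensitive.
        return response.headers["Operation-Location"]
```

Here `body` is the parsed content of request_body.json. The URL returned by `create_analyzer` is the one to pass to the GET request shown above.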
Analyze file
Send file
You can now use the custom analyzer you created to process files and extract the fields you defined in the schema.
Before running the cURL command, make the following changes to the HTTP request:
- Replace {endpoint} and {key} with the endpoint and key values from your Azure AI Foundry instance in the Azure portal.
- Replace {analyzerId} with the name of the custom analyzer created earlier.
- Replace {fileUrl} with a publicly accessible URL of the file to analyze, such as a path to an Azure Storage Blob with a shared access signature (SAS) or the sample URL https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png.
POST Request
curl -i -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d "{\"url\":\"{fileUrl}\"}"
POST Response
The 202 Accepted response includes the {resultId}, which you can use to track the status of this asynchronous operation.
{
  "id": {resultId},
  "status": "Running",
  "result": {
    "analyzerId": {analyzerId},
    "apiVersion": "2025-05-01-preview",
    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
    "warnings": [],
    "contents": []
  }
}
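Escaping JSON inside a shell string, as in the `-d` argument of the POST request, is easy to get wrong when the file URL carries query parameters such as a SAS token. As a sketch, the body can be built with `json.dumps` and the result URL derived from the 202 response. The helper names are hypothetical, and the code assumes the `id` in a real response is a JSON string (the sample above shows only a placeholder).

```python
import json

API_VERSION = "2025-05-01-preview"  # taken from the cURL examples in this guide


def build_analyze_body(file_url):
    """Return the JSON request body for the :analyze call as bytes."""
    return json.dumps({"url": file_url}).encode("utf-8")


def result_url(endpoint, analyze_response_text):
    """Build the analyzerResults URL from the body of the 202 Accepted response."""
    # Assumption: "id" is a plain string in the actual response.
    result_id = json.loads(analyze_response_text)["id"]
    return (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzerResults/"
        f"{result_id}?api-version={API_VERSION}"
    )
```

The bytes from `build_analyze_body` can be sent as the POST payload, and `result_url` yields the address used by the GET request in the next step.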
Get Analyze Result
Before running the cURL command, make the following changes to the HTTP request:
- Replace {endpoint} and {key} with the endpoint and key values from your Azure AI Foundry instance in the Azure portal.
- Replace {resultId} with the resultId from the POST response.
GET Request
curl -i -X GET "{endpoint}/contentunderstanding/analyzerResults/{resultId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}"
GET Response
A 200 OK response includes a status field that shows the operation's progress.
- status is Succeeded if the operation completed successfully.
- If it's Running or NotStarted, call the API again, either manually or with a script. Wait at least one second between requests.
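The polling advice above can be sketched as a small Python loop. The names `fetch_json` and `wait_for_result` are hypothetical, and the timeout value is arbitrary. The sketch compares status values case-insensitively and treats any status other than Succeeded, Running, or NotStarted as a failure, which is an assumption rather than documented behavior.

```python
import json
import time
import urllib.request


def fetch_json(url, key):
    """GET a URL with the subscription key header and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval=1.0, timeout=120.0, sleep=time.sleep):
    """Call fetch() until the status is terminal, pausing between requests.

    fetch is any zero-argument callable that returns the decoded response.
    """
    interval = max(interval, 1.0)  # never poll faster than once per second
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch()
        status = str(payload.get("status", "")).lower()
        if status == "succeeded":
            return payload
        if status not in ("running", "notstarted"):
            # Assumption: every other status means the operation failed.
            raise RuntimeError(f"Analysis ended with status {payload.get('status')!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for the analysis result")
        sleep(interval)
```

A typical call would be `wait_for_result(lambda: fetch_json(url, key))`, where `url` is the analyzerResults URL from the GET request. Passing the fetch step as a callable keeps the loop testable without network access.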
Sample Response
{
"id": {resultId},
"status": "Succeeded",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-05-01-preview",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": [
{
"markdown": "Contoso\n\n123 Main Street\nRedmond, WA 98052\n\n987-654-3210\n\n6/10/2019 13:59\nSales Associate: Paul\n\n\n<table>\n<tr>\n<td>2 Surface Pro 6</td>\n<td>$1,998.00</td>\n</tr>\n<tr>\n<td>3 Surface Pen</td>\n<td>$299.97</td>\n</tr>\n</table> ...",
"fields": {
"VendorName": {
"type": "string",
"valueString": "Contoso",
"spans": [{"offset": 0,"length": 7}],
"confidence": 0.996,
"source": "D(1,774.0000,72.0000,974.0000,70.0000,974.0000,111.0000,774.0000,113.0000)"
},
"Items": {
"type": "array",
"valueArray": [
{
"type": "object",
"valueObject": {
"Description": {
"type": "string",
"valueString": "2 Surface Pro 6",
"spans": [ { "offset": 115, "length": 15}],
"confidence": 0.423,
"source": "D(1,704.0000,482.0000,875.0000,482.0000,875.0000,508.0000,704.0000,508.0000)"
},
"Amount": {
"type": "number",
"valueNumber": 1998,
"spans": [{"offset": 140, "length": 9}],
"confidence": 0.957,
"source": "D(1,952.0000,482.0000,1048.0000,482.0000,1048.0000,508.0000,952.0000,509.0000)"
}
}
}, ...
]
}
},
"kind": "document",
"startPageNumber": 1,
"endPageNumber": 1,
"unit": "pixel",
"pages": [
{
"pageNumber": 1,
"angle": -0.0848,
"width": 1743,
"height": 878,
"spans": [
{
"offset": 0,
"length": 375
}
],
"words": [
{
"content": "Contoso",
"span": {"offset": 0,"length": 7 },
"confidence": 0.995,
"source": "D(1,774,72,974,70,974,111,774,113)"
}, ...
],
"lines": [
{
"content": "Contoso",
"source": "D(1,774,71,973,70,974,111,774,113)",
"span": {"offset": 0,"length": 7}
}, ...
]
}
],
"paragraphs": [
{
"content": "Contoso",
"source": "D(1,774,71,973,70,974,111,774,113)",
"span": {"offset": 0,"length": 7}
}, ...
],
"sections": [
{
"span": {"offset": 0,"length": 374 },
"elements": ["/paragraphs/0","/paragraphs/1", ...]
}
],
"tables": [
{
"rowCount": 2,
"columnCount": 2,
"cells": [
{
"kind": "content",
"rowIndex": 0,
"columnIndex": 0,
"rowSpan": 1,
"columnSpan": 1,
"content": "2 Surface Pro 6",
"source": "D(1,691,471,911,470,911,514,691,515)",
"span": {"offset": 115,"length": 15},
"elements": ["/paragraphs/4"]
}, ...
],
"source": "D(1,759,593,1056,592,1057,741,760,742)",
"span": {"offset": 223,"length": 151}
}
]
}
]
}
}
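To turn the extracted fields into plain values, a recursive walk over the `fields` object is usually enough. The following Python sketch assumes the response shape shown in the sample above; the function names are hypothetical. Only the `string`, `number`, `array`, and `object` types appear in the sample, so the handling of any other scalar type is an assumption that it follows the same `value<Type>` naming pattern.

```python
def field_value(field):
    """Convert one field from the response into a plain Python value."""
    kind = field.get("type")
    if kind == "array":
        return [field_value(item) for item in field.get("valueArray", [])]
    if kind == "object":
        return {
            name: field_value(child)
            for name, child in field.get("valueObject", {}).items()
        }
    if not kind:
        return None
    # Scalars in the sample use "valueString" and "valueNumber"; other
    # types are assumed to follow the same "value" + Type pattern.
    return field.get("value" + kind.capitalize())


def extract_fields(payload):
    """Return {field name: value} for the first entry in result.contents."""
    contents = payload.get("result", {}).get("contents", [])
    if not contents:
        return {}
    return {
        name: field_value(field)
        for name, field in contents[0].get("fields", {}).items()
    }
```

Applied to the sample response, `extract_fields` would return the vendor name as a string and the items as a list of dictionaries with `Description` and `Amount` keys. The `confidence` and `source` values are dropped in this sketch and would need separate handling if you want to review low-confidence fields.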