Content Understanding analyzers determine how your content is processed and what information is extracted. They ensure consistent processing and a uniform output structure across all content, yielding reliable and predictable results. For common use cases, you can use prebuilt analyzers. This guide explains how to customize these analyzers to better fit your needs.
This guide shows how to use the Content Understanding REST API to create a custom analyzer that extracts structured data from your content.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- The portal lists this resource under Foundry > Foundry.
- Default model deployments set for your Content Understanding resource. Setting the defaults creates the connection to the Microsoft Foundry models used for Content Understanding requests. Choose one of the following methods:
- cURL installed in your development environment.
Define the analyzer schema
To create a custom analyzer, define a field schema that describes the structured data you want to extract. In the following example, you create an analyzer based on the prebuilt document analyzer to process receipts.
Create a JSON file named receipt.json with the following content:
{
"description": "Sample receipt analyzer",
"baseAnalyzerId": "prebuilt-document",
"models": {
"completion": "gpt-4.1",
"embedding": "text-embedding-3-large"
},
"config": {
"returnDetails": true,
"enableFormula": false,
"estimateFieldSourceAndConfidence": true,
"tableFormat": "html"
},
"fieldSchema": {
"fields": {
"VendorName": {
"type": "string",
"method": "extract",
"description": "Vendor issuing the receipt"
},
"Items": {
"type": "array",
"method": "extract",
"items": {
"type": "object",
"properties": {
"Description": {
"type": "string",
"method": "extract",
"description": "Description of the item"
},
"Amount": {
"type": "number",
"method": "extract",
"description": "Amount of the item"
}
}
}
}
}
}
}
If you need to handle a variety of document types but only want to classify and analyze receipts, first create an analyzer that classifies the documents and then routes them to the analyzer you created earlier, using the following schema.
Create a JSON file named categorize.json with the following content:
{
"baseAnalyzerId": "prebuilt-document",
// Use the base analyzer to invoke the document specific capabilities.
// Specify the models the analyzer should use. This is one of the supported completion models and, optionally, one of the supported embedding models. The specific deployment used during analysis is set on the resource or provided in the analyze request.
"models": {
"completion": "gpt-4.1"
},
"config": {
// Enable splitting of the input into segments. Set this property to false if you only expect a single document within the input file. When specified and enableSegment=false, the whole content will be classified into one of the categories.
"enableSegment": false,
"contentCategories": {
// Category name.
"receipt": {
// Description to help with classification and splitting.
"description": "Any images or documents of receipts",
// Define the analyzer that any content classified as a receipt should be routed to
"analyzerId": "receipt"
},
"invoice": {
"description": "Any images or documents of invoice",
"analyzerId": "prebuilt-invoice"
},
"policeReport": {
"description": "A police or law enforcement report detailing the events that lead to the loss."
// Don't perform analysis for this category.
}
},
// Omit original content object and only return content objects from additional analysis.
"omitContent": true
}
//You can use fieldSchema here to define fields that are needed from the entire input content.
}
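The category-to-analyzer routing that categorize.json declares can be sketched in plain Python. This is an illustration only; the `routes` map and `route` helper are hypothetical names mirroring the categories above, not part of the service API.

```python
# Hypothetical map mirroring categorize.json: each category routes to an
# analyzer ID; a category without one (policeReport) is classified only.
routes = {
    "receipt": "receipt",
    "invoice": "prebuilt-invoice",
    "policeReport": None,
}

def route(category: str) -> str:
    """Describe what happens to content classified under `category`."""
    analyzer = routes.get(category)
    if analyzer is None:
        return f"{category}: classified only, no further analysis"
    return f"{category}: analyze with '{analyzer}'"

print(route("receipt"))       # routed to the custom receipt analyzer
print(route("policeReport"))  # classification only
```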
Create the analyzer
PUT request
Create the receipt analyzer first, then the classification analyzer.
curl -i -X PUT "{endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d @receipt.json
PUT response
The 201 Created response includes an Operation-Location header containing a URL that you can use to track the status of this asynchronous analyzer creation operation.
201 Created
Operation-Location: {endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-11-01
When the operation completes, an HTTP GET to the operation-location URL returns "status": "succeeded".
curl -i -X GET "{endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}"
Analyze a file
Submit the file
You can now use the custom analyzer you created to process files and extract the fields defined in your schema.
Before running the cURL command, make the following changes to the HTTP request:
- Replace {endpoint} and {key} with the endpoint and key values from your Foundry instance in the Azure portal.
- Replace {analyzerId} with the name of the custom analyzer you created with the categorize.json file.
- Replace {fileUrl} with a publicly accessible URL of the file to analyze, such as a path to an Azure Storage blob with a shared access signature (SAS), or the sample URL https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png.
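The substitutions above amount to building the request URL. Here's a small Python sketch with hypothetical placeholder values (the endpoint and analyzer name below are made up):

```python
# Hypothetical values standing in for the {endpoint} and {analyzerId}
# placeholders in the cURL command.
endpoint = "https://my-foundry-resource.services.ai.azure.com"
analyzer_id = "my-categorize-analyzer"
api_version = "2025-11-01"

# Assemble the analyze request URL from its parts.
url = (
    f"{endpoint}/contentunderstanding/analyzers/"
    f"{analyzer_id}:analyze?api-version={api_version}"
)
print(url)
```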
POST request
This example analyzes a receipt using the custom analyzer you created with the categorize.json file.
curl -i -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d '{
"inputs":[
{
"url": "https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png"
}
]
}'
POST response
The 202 Accepted response includes a {resultId} that you can use to track the status of this asynchronous operation.
{
"id": {resultId},
"status": "Running",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-11-01",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": []
}
}
Get the analysis result
Use the Operation-Location value returned in the POST response to retrieve the analysis result.
GET request
curl -i -X GET "{endpoint}/contentunderstanding/analyzerResults/{resultId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}"
GET response
The 200 OK response includes a status field that indicates the progress of the operation.
- If the operation completed successfully, status is Succeeded.
- If the status is running or notStarted, call the API again, manually or through a script. Wait at least one second between requests.
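The retry guidance above can be sketched as a small polling helper. This is an illustration, not part of the API; `get_status` is a hypothetical stand-in for the HTTP GET on the result URL.

```python
import time

def poll_until_done(get_status, interval=1.0, timeout=60.0):
    """Call get_status() until it reports a terminal state, waiting
    `interval` seconds between requests (at least one second in practice)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.lower() in ("succeeded", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("operation did not complete in time")

# Stub standing in for the GET request: reaches Succeeded on the third call.
statuses = iter(["notStarted", "running", "Succeeded"])
print(poll_until_done(lambda: next(statuses), interval=0.01))  # → Succeeded
```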
Sample response
{
"id": {resultId},
"status": "Succeeded",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-11-01",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": [
{
"path": "input1/segment1",
"category": "receipt",
"markdown": "Contoso\n\n123 Main Street\nRedmond, WA 98052\n\n987-654-3210\n\n6/10/2019 13:59\nSales Associate: Paul\n\n\n<table>\n<tr>\n<td>2 Surface Pro 6</td>\n<td>$1,998.00</td>\n</tr>\n<tr>\n<td>3 Surface Pen</td>\n<td>$299.97</td>\n</tr>\n</table> ...",
"fields": {
"VendorName": {
"type": "string",
"valueString": "Contoso",
"spans": [{"offset": 0,"length": 7}],
"confidence": 0.996,
"source": "D(1,774.0000,72.0000,974.0000,70.0000,974.0000,111.0000,774.0000,113.0000)"
},
"Items": {
"type": "array",
"valueArray": [
{
"type": "object",
"valueObject": {
"Description": {
"type": "string",
"valueString": "2 Surface Pro 6",
"spans": [ { "offset": 115, "length": 15}],
"confidence": 0.423,
"source": "D(1,704.0000,482.0000,875.0000,482.0000,875.0000,508.0000,704.0000,508.0000)"
},
"Amount": {
"type": "number",
"valueNumber": 1998,
"spans": [{ "offset": 140,"length": 9}
],
"confidence": 0.957,
"source": "D(1,952.0000,482.0000,1048.0000,482.0000,1048.0000,508.0000,952.0000,509.0000)"
}
}
}, ...
]
}
},
"kind": "document",
"startPageNumber": 1,
"endPageNumber": 1,
"unit": "pixel",
"pages": [
{
"pageNumber": 1,
"angle": -0.0944,
"width": 1743,
"height": 878
}
],
"analyzerId": "{analyzerId}",
"mimeType": "image/png"
}
]
},
"usage": {
"documentPages": 1,
"tokens": {
"contextualization": 1000
}
}
}
This guide shows how to use the Content Understanding Python SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- The resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. For setup instructions, see Models and deployments or this one-time configuration script.
- Python 3.9 or later.
Setup
Install the Content Understanding client library for Python with pip:
pip install azure-ai-contentunderstanding
(Optional) Install the Azure Identity library for Microsoft Entra authentication:
pip install azure-identity
Set environment variables
To authenticate with the Content Understanding service, set the following environment variables with your own values before running the samples:
- CONTENTUNDERSTANDING_ENDPOINT - The endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - Your Content Understanding API key (optional if you use Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux/macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create a client
Import the required libraries and models, then create a client with your resource endpoint and credential.
import os
import time
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.core.credentials import AzureKeyCredential
endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
key = os.environ["CONTENTUNDERSTANDING_KEY"]
client = ContentUnderstandingClient(
endpoint=endpoint,
credential=AzureKeyCredential(key),
)
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document base analyzer. It defines fields that use the three generation methods: extract for verbatim text, generate for AI-generated fields such as summaries or explanations, and classify for classification against a set of categories.
from azure.ai.contentunderstanding.models import (
ContentAnalyzer,
ContentAnalyzerConfig,
ContentFieldSchema,
ContentFieldDefinition,
ContentFieldType,
GenerationMethod,
)
# Generate a unique analyzer ID
analyzer_id = f"my_document_analyzer_{int(time.time())}"
# Define field schema with custom fields
field_schema = ContentFieldSchema(
name="company_schema",
description="Schema for extracting company information",
fields={
"company_name": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.EXTRACT,
description="Name of the company",
estimate_source_and_confidence=True,
),
"total_amount": ContentFieldDefinition(
type=ContentFieldType.NUMBER,
method=GenerationMethod.EXTRACT,
description="Total amount on the document",
estimate_source_and_confidence=True,
),
"document_summary": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.GENERATE,
description=(
"A brief summary of the document content"
),
),
"document_type": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.CLASSIFY,
description="Type of document",
enum=[
"invoice", "receipt", "contract",
"report", "other",
],
),
},
)
# Create analyzer configuration
config = ContentAnalyzerConfig(
enable_formula=True,
enable_layout=True,
enable_ocr=True,
estimate_field_source_and_confidence=True,
return_details=True,
)
# Create the analyzer with field schema
analyzer = ContentAnalyzer(
base_analyzer_id="prebuilt-document",
description=(
"Custom analyzer for extracting company information"
),
config=config,
field_schema=field_schema,
models={
"completion": "gpt-4.1",
"embedding": "text-embedding-3-large",
}, # Required when using field_schema and prebuilt-document base analyzer
)
# Create the analyzer
poller = client.begin_create_analyzer(
analyzer_id=analyzer_id,
resource=analyzer,
)
result = poller.result() # Wait for creation to complete
# Get the full analyzer details after creation
result = client.get_analyzer(analyzer_id=analyzer_id)
print(f"Analyzer '{analyzer_id}' created successfully!")
if result.description:
print(f" Description: {result.description}")
if result.field_schema and result.field_schema.fields:
print(f" Fields ({len(result.field_schema.fields)}):")
for field_name, field_def in result.field_schema.fields.items():
method = field_def.method if field_def.method else "auto"
field_type = field_def.type if field_def.type else "unknown"
print(f" - {field_name}: {field_type} ({method})")
The output looks similar to the following:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: ContentFieldType.STRING (GenerationMethod.EXTRACT)
- total_amount: ContentFieldType.NUMBER (GenerationMethod.EXTRACT)
- document_summary: ContentFieldType.STRING (GenerationMethod.GENERATE)
- document_type: ContentFieldType.STRING (GenerationMethod.CLASSIFY)
Tip
This code is based on the create analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer that categorizes documents and uses the results to route them to the prebuilt or custom analyzers you created. Here's an example of creating a custom analyzer for a classification workflow.
import time
from azure.ai.contentunderstanding.models import (
ContentAnalyzer,
ContentAnalyzerConfig,
ContentCategoryDefinition,
)
# Generate a unique analyzer ID
analyzer_id = f"my_classifier_{int(time.time())}"
print(f"Creating classifier '{analyzer_id}'...")
# Define content categories for classification
categories = {
"Loan_Application": ContentCategoryDefinition(
description="Documents submitted by individuals or businesses to request funding, "
"typically including personal or business details, financial history, "
"loan amount, purpose, and supporting documentation."
),
"Invoice": ContentCategoryDefinition(
description="Billing documents issued by sellers or service providers to request "
"payment for goods or services, detailing items, prices, taxes, totals, "
"and payment terms."
),
"Bank_Statement": ContentCategoryDefinition(
description="Official statements issued by banks that summarize account activity "
"over a period, including deposits, withdrawals, fees, and balances."
),
}
# Create analyzer configuration
config = ContentAnalyzerConfig(
return_details=True,
enable_segment=True, # Enable automatic segmentation by category
content_categories=categories,
)
# Create the classifier analyzer
classifier = ContentAnalyzer(
base_analyzer_id="prebuilt-document",
description="Custom classifier for financial document categorization",
config=config,
models={"completion": "gpt-4.1"},
)
# Create the classifier
poller = client.begin_create_analyzer(
analyzer_id=analyzer_id,
resource=classifier,
)
result = poller.result() # Wait for creation to complete
# Get the full analyzer details after creation
result = client.get_analyzer(analyzer_id=analyzer_id)
print(f"Classifier '{analyzer_id}' created successfully!")
if result.description:
print(f" Description: {result.description}")
Tip
This code is based on the create classifier sample in the SDK repository.
Use the custom analyzer
After you create the analyzer, use it to analyze documents and extract your custom fields. Delete the analyzer when you no longer need it.
# --- Use the custom document analyzer ---
from azure.ai.contentunderstanding.models import AnalysisInput
print("\nAnalyzing document...")
document_url = (
"https://raw.githubusercontent.com/"
"Azure-Samples/"
"azure-ai-content-understanding-assets/"
"main/document/invoice.pdf"
)
poller = client.begin_analyze(
analyzer_id=analyzer_id,
inputs=[AnalysisInput(url=document_url)],
)
result = poller.result()
if result.contents and len(result.contents) > 0:
content = result.contents[0]
if content.fields:
company = content.fields.get("company_name")
if company:
print(f"Company Name: {company.value}")
if company.confidence:
print(
f" Confidence:"
f" {company.confidence:.2f}"
)
total = content.fields.get("total_amount")
if total:
print(f"Total Amount: {total.value}")
summary = content.fields.get(
"document_summary"
)
if summary:
print(f"Summary: {summary.value}")
doc_type = content.fields.get("document_type")
if doc_type:
print(f"Document Type: {doc_type.value}")
else:
print("No content returned from analysis.")
# --- Clean up ---
print(f"\nCleaning up: deleting analyzer '{analyzer_id}'...")
client.delete_analyzer(analyzer_id=analyzer_id)
print(f"Analyzer '{analyzer_id}' deleted successfully.")
The output looks similar to the following:
Analyzing document...
Company Name: CONTOSO LTD.
Confidence: 0.81
Total Amount: 610.0
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting, document, and printing services provided during the service period. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
See more examples of running analyzers in the SDK samples.
This guide shows how to use the Content Understanding .NET SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- The resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. For setup instructions, see Models and deployments or this one-time configuration script.
- A current version of .NET.
Setup
Create a new .NET console application:
dotnet new console -n CustomAnalyzerTutorial
cd CustomAnalyzerTutorial
Install the Content Understanding client library for .NET:
dotnet add package Azure.AI.ContentUnderstanding
(Optional) Install the Azure Identity library for Microsoft Entra authentication:
dotnet add package Azure.Identity
Set environment variables
To authenticate with the Content Understanding service, set the following environment variables with your own values before running the samples:
- CONTENTUNDERSTANDING_ENDPOINT - The endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - Your Content Understanding API key (optional if you use Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux/macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create a client
using Azure;
using Azure.AI.ContentUnderstanding;
string endpoint = Environment.GetEnvironmentVariable(
"CONTENTUNDERSTANDING_ENDPOINT");
string key = Environment.GetEnvironmentVariable(
"CONTENTUNDERSTANDING_KEY");
var client = new ContentUnderstandingClient(
new Uri(endpoint),
new AzureKeyCredential(key)
);
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields that use the three generation methods: extract for verbatim text, generate for AI-generated summaries, and classify for classification.
string analyzerId =
$"my_document_analyzer_{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
var fieldSchema = new ContentFieldSchema(
new Dictionary<string, ContentFieldDefinition>
{
["company_name"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Extract,
Description = "Name of the company"
},
["total_amount"] = new ContentFieldDefinition
{
Type = ContentFieldType.Number,
Method = GenerationMethod.Extract,
Description =
"Total amount on the document"
},
["document_summary"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Generate,
Description =
"A brief summary of the document content"
},
["document_type"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Classify,
Description = "Type of document"
}
})
{
Name = "company_schema",
Description =
"Schema for extracting company information"
};
fieldSchema.Fields["document_type"].Enum.Add("invoice");
fieldSchema.Fields["document_type"].Enum.Add("receipt");
fieldSchema.Fields["document_type"].Enum.Add("contract");
fieldSchema.Fields["document_type"].Enum.Add("report");
fieldSchema.Fields["document_type"].Enum.Add("other");
var config = new ContentAnalyzerConfig
{
EnableFormula = true,
EnableLayout = true,
EnableOcr = true,
EstimateFieldSourceAndConfidence = true,
ShouldReturnDetails = true
};
var customAnalyzer = new ContentAnalyzer
{
BaseAnalyzerId = "prebuilt-document",
Description =
"Custom analyzer for extracting"
+ " company information",
Config = config,
FieldSchema = fieldSchema
};
customAnalyzer.Models["completion"] = "gpt-4.1";
customAnalyzer.Models["embedding"] =
"text-embedding-3-large"; // Required when using field_schema and prebuilt-document base analyzer
var operation = await client.CreateAnalyzerAsync(
WaitUntil.Completed,
analyzerId,
customAnalyzer);
ContentAnalyzer result = operation.Value;
Console.WriteLine(
$"Analyzer '{analyzerId}'"
+ " created successfully!");
// Get the full analyzer details after creation
var analyzerDetails =
await client.GetAnalyzerAsync(analyzerId);
result = analyzerDetails.Value;
if (result.Description != null)
{
Console.WriteLine(
$" Description: {result.Description}");
}
if (result.FieldSchema?.Fields != null)
{
Console.WriteLine(
$" Fields"
+ $" ({result.FieldSchema.Fields.Count}):");
foreach (var kvp
in result.FieldSchema.Fields)
{
var method =
kvp.Value.Method?.ToString()
?? "auto";
var fieldType =
kvp.Value.Type?.ToString()
?? "unknown";
Console.WriteLine(
$" - {kvp.Key}:"
+ $" {fieldType} ({method})");
}
}
The output looks similar to the following:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the create analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer that categorizes documents and uses the results to route them to the prebuilt or custom analyzers you created. Here's an example of creating a custom analyzer for a classification workflow.
// Generate a unique analyzer ID
string classifierId =
$"my_classifier_{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
Console.WriteLine(
$"Creating classifier '{classifierId}'...");
// Define content categories for classification
var classifierConfig = new ContentAnalyzerConfig
{
ShouldReturnDetails = true,
EnableSegment = true
};
classifierConfig.ContentCategories
.Add("Loan_Application",
new ContentCategoryDefinition
{
Description =
"Documents submitted by individuals"
+ " or businesses to request"
+ " funding, typically including"
+ " personal or business details,"
+ " financial history, loan amount,"
+ " purpose, and supporting"
+ " documentation."
});
classifierConfig.ContentCategories
.Add("Invoice",
new ContentCategoryDefinition
{
Description =
"Billing documents issued by"
+ " sellers or service providers"
+ " to request payment for goods"
+ " or services, detailing items,"
+ " prices, taxes, totals, and"
+ " payment terms."
});
classifierConfig.ContentCategories
.Add("Bank_Statement",
new ContentCategoryDefinition
{
Description =
"Official statements issued by"
+ " banks that summarize account"
+ " activity over a period,"
+ " including deposits,"
+ " withdrawals, fees,"
+ " and balances."
});
// Create the classifier analyzer
var classifierAnalyzer = new ContentAnalyzer
{
BaseAnalyzerId = "prebuilt-document",
Description =
"Custom classifier for financial"
+ " document categorization",
Config = classifierConfig
};
classifierAnalyzer.Models["completion"] =
"gpt-4.1";
var classifierOp =
await client.CreateAnalyzerAsync(
WaitUntil.Completed,
classifierId,
classifierAnalyzer);
// Get the full classifier details
var classifierDetails =
await client.GetAnalyzerAsync(classifierId);
var classifierResult =
classifierDetails.Value;
Console.WriteLine(
$"Classifier '{classifierId}'"
+ " created successfully!");
if (classifierResult.Description != null)
{
Console.WriteLine(
$" Description:"
+ $" {classifierResult.Description}");
}
Tip
This code is based on the create classifier sample for the classification workflow.
Use the custom analyzer
After you create the analyzer, use it to analyze documents and extract your custom fields. Delete the analyzer when you no longer need it.
var documentUrl = new Uri(
"https://raw.githubusercontent.com/"
+ "Azure-Samples/"
+ "azure-ai-content-understanding-assets/"
+ "main/document/invoice.pdf"
);
var analyzeOperation = await client.AnalyzeAsync(
WaitUntil.Completed,
analyzerId,
inputs: new[] {
new AnalysisInput { Uri = documentUrl }
});
var analyzeResult = analyzeOperation.Value;
if (analyzeResult.Contents?.FirstOrDefault()
is DocumentContent content)
{
if (content.Fields.TryGetValue(
"company_name", out var companyField))
{
var name =
companyField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Company Name: "
+ $"{name ?? "(not found)"}");
Console.WriteLine(
" Confidence: "
+ (companyField.Confidence?
.ToString("F2") ?? "N/A"));
}
if (content.Fields.TryGetValue(
"total_amount", out var totalField))
{
var total =
totalField is ContentNumberField nf
? nf.Value : null;
Console.WriteLine(
$"Total Amount: {total}");
}
if (content.Fields.TryGetValue(
"document_summary", out var summaryField))
{
var summary =
summaryField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Summary: "
+ $"{summary ?? "(not found)"}");
}
if (content.Fields.TryGetValue(
"document_type", out var typeField))
{
var docType =
typeField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Document Type: "
+ $"{docType ?? "(not found)"}");
}
}
// --- Clean up ---
Console.WriteLine(
$"\nCleaning up: deleting analyzer"
+ $" '{analyzerId}'...");
await client.DeleteAnalyzerAsync(analyzerId);
Console.WriteLine(
$"Analyzer '{analyzerId}'"
+ " deleted successfully.");
The output looks similar to the following:
Company Name: CONTOSO LTD.
Confidence: 0.88
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to MICROSOFT CORPORATION for consulting services, document fees, and printing fees, detailing service periods, billing and shipping addresses, itemized charges, and the total amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
See the .NET SDK samples for more examples of running analyzers.
This guide shows how to use the Content Understanding Java SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- The resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. For setup instructions, see Models and deployments or this one-time configuration script.
- Java Development Kit (JDK) version 8 or later.
- Apache Maven.
Setup
Create a new Maven project:
mvn archetype:generate -DgroupId=com.example \
    -DartifactId=custom-analyzer-tutorial \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DinteractiveMode=false
cd custom-analyzer-tutorial
Add the Content Understanding dependency to the <dependencies> section of your pom.xml file:
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-contentunderstanding</artifactId>
    <version>1.0.0</version>
</dependency>
(Optional) Add the Azure Identity library for Microsoft Entra authentication:
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.14.2</version>
</dependency>
Set environment variables
To authenticate with the Content Understanding service, set the following environment variables with your own values before running the samples:
- CONTENTUNDERSTANDING_ENDPOINT - The endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - Your Content Understanding API key (optional if you use Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux/macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create a client
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import com.azure.ai.contentunderstanding
.ContentUnderstandingClient;
import com.azure.ai.contentunderstanding
.ContentUnderstandingClientBuilder;
import com.azure.ai.contentunderstanding.models.*;
String endpoint =
System.getenv("CONTENTUNDERSTANDING_ENDPOINT");
String key =
System.getenv("CONTENTUNDERSTANDING_KEY");
ContentUnderstandingClient client =
new ContentUnderstandingClientBuilder()
.endpoint(endpoint)
.credential(new AzureKeyCredential(key))
.buildClient();
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields that use the three generation methods: extract for verbatim text, generate for AI-generated summaries, and classify for classification.
String analyzerId =
"my_document_analyzer_"
+ System.currentTimeMillis();
Map<String, ContentFieldDefinition> fields =
new HashMap<>();
ContentFieldDefinition companyNameDef =
new ContentFieldDefinition();
companyNameDef.setType(ContentFieldType.STRING);
companyNameDef.setMethod(
GenerationMethod.EXTRACT);
companyNameDef.setDescription(
"Name of the company");
fields.put("company_name", companyNameDef);
ContentFieldDefinition totalAmountDef =
new ContentFieldDefinition();
totalAmountDef.setType(ContentFieldType.NUMBER);
totalAmountDef.setMethod(
GenerationMethod.EXTRACT);
totalAmountDef.setDescription(
"Total amount on the document");
fields.put("total_amount", totalAmountDef);
ContentFieldDefinition summaryDef =
new ContentFieldDefinition();
summaryDef.setType(ContentFieldType.STRING);
summaryDef.setMethod(
GenerationMethod.GENERATE);
summaryDef.setDescription(
"A brief summary of the document content");
fields.put("document_summary", summaryDef);
ContentFieldDefinition documentTypeDef =
new ContentFieldDefinition();
documentTypeDef.setType(ContentFieldType.STRING);
documentTypeDef.setMethod(
GenerationMethod.CLASSIFY);
documentTypeDef.setDescription(
"Type of document");
documentTypeDef.setEnumProperty(
Arrays.asList(
"invoice", "receipt", "contract",
"report", "other"
));
fields.put("document_type", documentTypeDef);
ContentFieldSchema fieldSchema =
new ContentFieldSchema();
fieldSchema.setName("company_schema");
fieldSchema.setDescription(
"Schema for extracting company information");
fieldSchema.setFields(fields);
Map<String, String> models = new HashMap<>();
models.put("completion", "gpt-4.1");
models.put("embedding", "text-embedding-3-large"); // Required when using field_schema and prebuilt-document base analyzer
ContentAnalyzer customAnalyzer =
new ContentAnalyzer()
.setBaseAnalyzerId("prebuilt-document")
.setDescription(
"Custom analyzer for extracting"
+ " company information")
.setConfig(new ContentAnalyzerConfig()
.setOcrEnabled(true)
.setLayoutEnabled(true)
.setFormulaEnabled(true)
.setEstimateFieldSourceAndConfidence(
true)
.setReturnDetails(true))
.setFieldSchema(fieldSchema)
.setModels(models);
SyncPoller<ContentAnalyzerOperationStatus,
ContentAnalyzer> operation =
client.beginCreateAnalyzer(
analyzerId, customAnalyzer, true);
ContentAnalyzer result =
operation.getFinalResult();
System.out.println(
"Analyzer '" + analyzerId
+ "' created successfully!");
if (result.getDescription() != null) {
System.out.println(
" Description: "
+ result.getDescription());
}
if (result.getFieldSchema() != null
&& result.getFieldSchema()
.getFields() != null) {
System.out.println(
" Fields ("
+ result.getFieldSchema()
.getFields().size() + "):");
result.getFieldSchema().getFields()
.forEach((fieldName, fieldDef) -> {
String method =
fieldDef.getMethod() != null
? fieldDef.getMethod()
.toString()
: "auto";
String type =
fieldDef.getType() != null
? fieldDef.getType()
.toString()
: "unknown";
System.out.println(
" - " + fieldName
+ ": " + type
+ " (" + method + ")");
});
}
The output looks similar to the following:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- total_amount: number (extract)
- company_name: string (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the create analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer that categorizes documents and uses the results to route them to the prebuilt or custom analyzers you created. Here's an example of creating a custom analyzer for a classification workflow.
// Generate a unique analyzer ID
String classifierId =
"my_classifier_" + System.currentTimeMillis();
System.out.println(
"Creating classifier '"
+ classifierId + "'...");
// Define content categories for classification
Map<String, ContentCategoryDefinition>
categories = new HashMap<>();
categories.put("Loan_Application",
new ContentCategoryDefinition()
.setDescription(
"Documents submitted by individuals"
+ " or businesses to request funding,"
+ " typically including personal or"
+ " business details, financial"
+ " history, loan amount, purpose,"
+ " and supporting documentation."));
categories.put("Invoice",
new ContentCategoryDefinition()
.setDescription(
"Billing documents issued by sellers"
+ " or service providers to request"
+ " payment for goods or services,"
+ " detailing items, prices, taxes,"
+ " totals, and payment terms."));
categories.put("Bank_Statement",
new ContentCategoryDefinition()
.setDescription(
"Official statements issued by banks"
+ " that summarize account activity"
+ " over a period, including deposits,"
+ " withdrawals, fees,"
+ " and balances."));
// Create the classifier
Map<String, String> classifierModels =
new HashMap<>();
classifierModels.put("completion", "gpt-4.1");
ContentAnalyzer classifier =
new ContentAnalyzer()
.setBaseAnalyzerId("prebuilt-document")
.setDescription(
"Custom classifier for financial"
+ " document categorization")
.setConfig(new ContentAnalyzerConfig()
.setReturnDetails(true)
.setSegmentEnabled(true)
.setContentCategories(categories))
.setModels(classifierModels);
SyncPoller<ContentAnalyzerOperationStatus,
ContentAnalyzer> classifierOp =
client.beginCreateAnalyzer(
classifierId, classifier, true);
classifierOp.getFinalResult();
// Get the full classifier details
ContentAnalyzer classifierResult =
client.getAnalyzer(classifierId);
System.out.println(
"Classifier '" + classifierId
+ "' created successfully!");
if (classifierResult.getDescription() != null) {
System.out.println(
" Description: "
+ classifierResult.getDescription());
}
Tip
This code is based on the create classifier sample for the classification workflow.
Use the custom analyzer
After you create the analyzer, use it to analyze documents and extract your custom fields. Delete the analyzer when you no longer need it.
String documentUrl =
"https://raw.githubusercontent.com/"
+ "Azure-Samples/"
+ "azure-ai-content-understanding-assets/"
+ "main/document/invoice.pdf";
AnalysisInput input = new AnalysisInput();
input.setUrl(documentUrl);
SyncPoller<ContentAnalyzerAnalyzeOperationStatus,
AnalysisResult> analyzeOperation =
client.beginAnalyze(
analyzerId, Arrays.asList(input));
AnalysisResult analyzeResult =
analyzeOperation.getFinalResult();
if (analyzeResult.getContents() != null
&& !analyzeResult.getContents().isEmpty()
&& analyzeResult.getContents().get(0)
instanceof DocumentContent) {
DocumentContent content =
(DocumentContent) analyzeResult
.getContents().get(0);
ContentField companyField =
content.getFields() != null
? content.getFields()
.get("company_name") : null;
if (companyField
instanceof ContentStringField) {
ContentStringField sf =
(ContentStringField) companyField;
System.out.println(
"Company Name: " + sf.getValue());
System.out.println(
" Confidence: "
+ companyField.getConfidence());
}
ContentField totalField =
content.getFields() != null
? content.getFields()
.get("total_amount") : null;
if (totalField != null) {
System.out.println(
"Total Amount: "
+ totalField.getValue());
}
ContentField summaryField =
content.getFields() != null
? content.getFields()
.get("document_summary") : null;
if (summaryField
instanceof ContentStringField) {
ContentStringField sf =
(ContentStringField) summaryField;
System.out.println(
"Summary: " + sf.getValue());
}
ContentField typeField =
content.getFields() != null
? content.getFields()
.get("document_type") : null;
if (typeField
instanceof ContentStringField) {
ContentStringField sf =
(ContentStringField) typeField;
System.out.println(
"Document Type: " + sf.getValue());
}
}
// --- Clean up ---
System.out.println(
"\nCleaning up: deleting analyzer '"
+ analyzerId + "'...");
client.deleteAnalyzer(analyzerId);
System.out.println(
"Analyzer '" + analyzerId
+ "' deleted successfully.");
The sample output looks like this:
Company Name: CONTOSO LTD.
Confidence: 0.781
Total Amount: 610.0
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting services, document fees, and printing fees, detailing service dates, itemized charges, taxes, and the total amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
See the run analyzer Java SDK sample for more details.
This guide shows how to use the Content Understanding JavaScript SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. For setup instructions, see Models and deployments or this one-time configuration script.
- The Node.js LTS version.
Set up
Create a new Node.js project:
mkdir custom-analyzer-tutorial
cd custom-analyzer-tutorial
npm init -y
Install the Content Understanding client library:
npm install @azure/ai-content-understanding
(Optional) Install the Azure Identity library for Microsoft Entra authentication:
npm install @azure/identity
Set environment variables
To authenticate with the Content Understanding service, set the following environment variables with your own values before running the samples:
- CONTENTUNDERSTANDING_ENDPOINT - The endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - Your Content Understanding API key (optional if you use Microsoft Entra ID with DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux/macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create a client
const { AzureKeyCredential } =
require("@azure/core-auth");
const {
ContentUnderstandingClient,
} = require("@azure/ai-content-understanding");
const endpoint =
process.env["CONTENTUNDERSTANDING_ENDPOINT"];
const key =
process.env["CONTENTUNDERSTANDING_KEY"];
const client = new ContentUnderstandingClient(
endpoint,
new AzureKeyCredential(key)
);
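The prerequisites mention the optional @azure/identity package for Microsoft Entra authentication. As a hedged sketch of that path — assuming ContentUnderstandingClient accepts a TokenCredential in place of an AzureKeyCredential, which is the common Azure SDK constructor pattern — the API key can be omitted entirely:

```javascript
// Alternative: Microsoft Entra ID authentication via DefaultAzureCredential.
// Assumption: ContentUnderstandingClient accepts a TokenCredential, as most
// Azure SDK clients do. No API key is needed with this approach.
const { DefaultAzureCredential } = require("@azure/identity");
const {
  ContentUnderstandingClient,
} = require("@azure/ai-content-understanding");

const entraClient = new ContentUnderstandingClient(
  process.env["CONTENTUNDERSTANDING_ENDPOINT"],
  new DefaultAzureCredential()
);
```

DefaultAzureCredential tries a chain of credential sources (environment variables, managed identity, Azure CLI login), which makes the same code work locally and when deployed.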
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields that use three methods: extract for verbatim text, generate for AI-generated summaries, and classify for classification.
const analyzerId =
`my_document_analyzer_${Math.floor(
Date.now() / 1000
)}`;
const analyzer = {
baseAnalyzerId: "prebuilt-document",
description:
"Custom analyzer for extracting"
+ " company information",
config: {
enableFormula: true,
enableLayout: true,
enableOcr: true,
estimateFieldSourceAndConfidence: true,
returnDetails: true,
},
fieldSchema: {
name: "company_schema",
description:
"Schema for extracting company"
+ " information",
fields: {
company_name: {
type: "string",
method: "extract",
description:
"Name of the company",
},
total_amount: {
type: "number",
method: "extract",
description:
"Total amount on the"
+ " document",
},
document_summary: {
type: "string",
method: "generate",
description:
"A brief summary of the"
+ " document content",
},
document_type: {
type: "string",
method: "classify",
description: "Type of document",
enum: [
"invoice", "receipt",
"contract", "report", "other",
],
},
},
},
models: {
completion: "gpt-4.1",
embedding: "text-embedding-3-large", // Required when using field_schema and prebuilt-document base analyzer
},
};
const poller = client.createAnalyzer(
analyzerId, analyzer
);
await poller.pollUntilDone();
const result = await client.getAnalyzer(
analyzerId
);
console.log(
`Analyzer '${analyzerId}' created`
+ ` successfully!`
);
if (result.description) {
console.log(
` Description: ${result.description}`
);
}
if (result.fieldSchema?.fields) {
const fields = result.fieldSchema.fields;
console.log(
` Fields`
+ ` (${Object.keys(fields).length}):`
);
for (const [name, fieldDef]
of Object.entries(fields)) {
const method =
fieldDef.method ?? "auto";
const fieldType =
fieldDef.type ?? "unknown";
console.log(
` - ${name}: `
+ `${fieldType} (${method})`
);
}
}
The sample output looks like this:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the create analyzer sample in the SDK repository.
(Optional) You can create a classifier analyzer that categorizes documents and use its results to route documents to the prebuilt or custom analyzers you created. The following example creates a custom analyzer for a classification workflow.
const classifierId =
`my_classifier_${Math.floor(
Date.now() / 1000
)}`;
console.log(
`Creating classifier '${classifierId}'...`
);
const classifierAnalyzer = {
baseAnalyzerId: "prebuilt-document",
description:
"Custom classifier for financial"
+ " document categorization",
config: {
returnDetails: true,
enableSegment: true,
contentCategories: {
Loan_Application: {
description:
"Documents submitted by"
+ " individuals or"
+ " businesses to request"
+ " funding, typically"
+ " including personal or"
+ " business details,"
+ " financial history,"
+ " loan amount, purpose,"
+ " and supporting"
+ " documentation.",
},
Invoice: {
description:
"Billing documents issued"
+ " by sellers or service"
+ " providers to request"
+ " payment for goods or"
+ " services, detailing"
+ " items, prices, taxes,"
+ " totals, and payment"
+ " terms.",
},
Bank_Statement: {
description:
"Official statements"
+ " issued by banks that"
+ " summarize account"
+ " activity over a"
+ " period, including"
+ " deposits, withdrawals,"
+ " fees, and balances.",
},
},
},
models: {
completion: "gpt-4.1",
},
};
const classifierPoller =
client.createAnalyzer(
classifierId, classifierAnalyzer
);
await classifierPoller.pollUntilDone();
const classifierResult =
await client.getAnalyzer(classifierId);
console.log(
`Classifier '${classifierId}' created`
+ ` successfully!`
);
if (classifierResult.description) {
console.log(
` Description: `
+ `${classifierResult.description}`
);
}
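The classifier's categories can then drive routing. The API for reading the predicted category out of an analyze result isn't shown in this sample, so the sketch below assumes you already have the category string in hand; the route table and analyzer IDs are hypothetical placeholders for analyzers you created separately.

```javascript
// Hypothetical routing table: classifier category -> analyzer ID.
// Category names match the contentCategories defined above; the analyzer
// IDs are placeholders, not values the service provides.
const routes = {
  Invoice: "my_document_analyzer",
  Loan_Application: "my_loan_analyzer",
  Bank_Statement: "my_bank_statement_analyzer",
};

// Pick the analyzer for a predicted category, falling back to a
// general-purpose analyzer for anything unrecognized.
function routeByCategory(category, routeTable, fallbackId) {
  return routeTable[category] ?? fallbackId;
}

console.log(
  routeByCategory("Invoice", routes, "prebuilt-document")
);
// After routing, you'd call client.analyze(selectedAnalyzerId, [{ url }]).
```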
Tip
This code is based on the create classifier sample for classification workflows.
Use a custom analyzer
After you create an analyzer, use it to analyze documents and extract custom fields. Delete the analyzer when you no longer need it.
const documentUrl =
"https://raw.githubusercontent.com/"
+ "Azure-Samples/"
+ "azure-ai-content-understanding-assets/"
+ "main/document/invoice.pdf";
const analyzePoller = client.analyze(
analyzerId, [{ url: documentUrl }]
);
const analyzeResult =
await analyzePoller.pollUntilDone();
if (analyzeResult.contents
&& analyzeResult.contents.length > 0) {
const content = analyzeResult.contents[0];
if (content.fields) {
const company =
content.fields["company_name"];
if (company) {
console.log(
`Company Name: `
+ `${company.value}`
);
console.log(
` Confidence: `
+ `${company.confidence}`
);
}
const total =
content.fields["total_amount"];
if (total) {
console.log(
`Total Amount: `
+ `${total.value}`
);
}
const summary =
content.fields["document_summary"];
if (summary) {
console.log(
`Summary: ${summary.value}`
);
}
const docType =
content.fields["document_type"];
if (docType) {
console.log(
`Document Type: `
+ `${docType.value}`
);
}
}
}
// --- Clean up ---
console.log(
`\nCleaning up: deleting analyzer`
+ ` '${analyzerId}'...`
);
await client.deleteAnalyzer(analyzerId);
console.log(
`Analyzer '${analyzerId}' deleted`
+ ` successfully.`
);
The sample output looks like this:
Company Name: CONTOSO LTD.
Confidence: 0.739
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting, document, and printing services provided during the service period. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
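The confidence score in the output (0.739 above) is useful for gating automation. The helper below is a hypothetical sketch, not part of the SDK: the threshold and the split into accepted versus review buckets are assumptions, while the { value, confidence } field shape mirrors the results printed above.

```javascript
// Hypothetical helper: keep only fields whose confidence meets a threshold,
// so low-confidence extractions can be sent to human review instead.
function filterByConfidence(fields, threshold = 0.8) {
  const accepted = {};
  const review = {};
  for (const [name, field] of Object.entries(fields)) {
    // Fields without a confidence score (for example, generated
    // summaries) pass through as accepted.
    if (field.confidence === undefined
        || field.confidence >= threshold) {
      accepted[name] = field;
    } else {
      review[name] = field;
    }
  }
  return { accepted, review };
}

const { accepted, review } = filterByConfidence({
  company_name: { value: "CONTOSO LTD.", confidence: 0.739 },
  document_type: { value: "invoice", confidence: 0.97 },
});
console.log(Object.keys(review)); // company_name falls below 0.8
```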
Tip
See the run analyzer JavaScript SDK sample for more details.
This guide shows how to use the Content Understanding TypeScript SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, you can create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. For setup instructions, see Models and deployments or this one-time configuration script.
- The Node.js LTS version.
- TypeScript 5.x or later.
Set up
Create a new Node.js project:
mkdir custom-analyzer-tutorial
cd custom-analyzer-tutorial
npm init -y
Install TypeScript and the Content Understanding client library:
npm install typescript ts-node @azure/ai-content-understanding
(Optional) Install the Azure Identity library for Microsoft Entra authentication:
npm install @azure/identity
Set environment variables
To authenticate with the Content Understanding service, set the following environment variables with your own values before running the samples:
- CONTENTUNDERSTANDING_ENDPOINT - The endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - Your Content Understanding API key (optional if you use Microsoft Entra ID with DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux/macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create a client
import { AzureKeyCredential } from "@azure/core-auth";
import {
ContentUnderstandingClient,
} from "@azure/ai-content-understanding";
import type {
ContentAnalyzer,
ContentAnalyzerConfig,
ContentFieldSchema,
} from "@azure/ai-content-understanding";
const endpoint =
process.env["CONTENTUNDERSTANDING_ENDPOINT"]!;
const key =
process.env["CONTENTUNDERSTANDING_KEY"]!;
const client = new ContentUnderstandingClient(
endpoint,
new AzureKeyCredential(key)
);
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields that use three methods: extract for verbatim text, generate for AI-generated summaries, and classify for classification.
const analyzerId =
`my_document_analyzer_${Math.floor(
Date.now() / 1000
)}`;
const fieldSchema: ContentFieldSchema = {
name: "company_schema",
description:
"Schema for extracting company"
+ " information",
fields: {
company_name: {
type: "string",
method: "extract",
description:
"Name of the company",
},
total_amount: {
type: "number",
method: "extract",
description:
"Total amount on the document",
},
document_summary: {
type: "string",
method: "generate",
description:
"A brief summary of the"
+ " document content",
},
document_type: {
type: "string",
method: "classify",
description: "Type of document",
enum: [
"invoice", "receipt",
"contract", "report", "other",
],
},
},
};
const config: ContentAnalyzerConfig = {
enableFormula: true,
enableLayout: true,
enableOcr: true,
estimateFieldSourceAndConfidence: true,
returnDetails: true,
};
const analyzer: ContentAnalyzer = {
baseAnalyzerId: "prebuilt-document",
description:
"Custom analyzer for extracting"
+ " company information",
config,
fieldSchema,
models: {
completion: "gpt-4.1",
embedding: "text-embedding-3-large", // Required when using field_schema and prebuilt-document base analyzer
},
} as unknown as ContentAnalyzer;
const poller = client.createAnalyzer(
analyzerId, analyzer
);
await poller.pollUntilDone();
const result = await client.getAnalyzer(
analyzerId
);
console.log(
`Analyzer '${analyzerId}' created`
+ ` successfully!`
);
if (result.description) {
console.log(
` Description: ${result.description}`
);
}
if (result.fieldSchema?.fields) {
const fields = result.fieldSchema.fields;
console.log(
` Fields`
+ ` (${Object.keys(fields).length}):`
);
for (const [name, fieldDef]
of Object.entries(fields)) {
const method =
fieldDef.method ?? "auto";
const fieldType =
fieldDef.type ?? "unknown";
console.log(
` - ${name}: `
+ `${fieldType} (${method})`
);
}
}
The sample output looks like this:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the create analyzer sample in the SDK repository.
(Optional) You can create a classifier analyzer that categorizes documents and use its results to route documents to the prebuilt or custom analyzers you created. The following example creates a custom analyzer for a classification workflow.
const classifierId =
`my_classifier_${Math.floor(
Date.now() / 1000
)}`;
console.log(
`Creating classifier '${classifierId}'...`
);
const classifierAnalyzer: ContentAnalyzer = {
baseAnalyzerId: "prebuilt-document",
description:
"Custom classifier for financial"
+ " document categorization",
config: {
returnDetails: true,
enableSegment: true,
contentCategories: {
Loan_Application: {
description:
"Documents submitted by"
+ " individuals or"
+ " businesses to request"
+ " funding, typically"
+ " including personal or"
+ " business details,"
+ " financial history,"
+ " loan amount, purpose,"
+ " and supporting"
+ " documentation.",
},
Invoice: {
description:
"Billing documents issued"
+ " by sellers or service"
+ " providers to request"
+ " payment for goods or"
+ " services, detailing"
+ " items, prices, taxes,"
+ " totals, and payment"
+ " terms.",
},
Bank_Statement: {
description:
"Official statements"
+ " issued by banks that"
+ " summarize account"
+ " activity over a"
+ " period, including"
+ " deposits, withdrawals,"
+ " fees, and balances.",
},
},
} as unknown as ContentAnalyzerConfig,
models: {
completion: "gpt-4.1",
},
} as unknown as ContentAnalyzer;
const classifierPoller =
client.createAnalyzer(
classifierId, classifierAnalyzer
);
await classifierPoller.pollUntilDone();
const classifierResult =
await client.getAnalyzer(classifierId);
console.log(
`Classifier '${classifierId}' created`
+ ` successfully!`
);
if (classifierResult.description) {
console.log(
` Description: `
+ `${classifierResult.description}`
);
}
Tip
This code is based on the create classifier sample for classification workflows.
Use a custom analyzer
After you create an analyzer, use it to analyze documents and extract custom fields. Delete the analyzer when you no longer need it.
const documentUrl =
"https://raw.githubusercontent.com/"
+ "Azure-Samples/"
+ "azure-ai-content-understanding-assets/"
+ "main/document/invoice.pdf";
const analyzePoller = client.analyze(
analyzerId, [{ url: documentUrl }]
);
const analyzeResult =
await analyzePoller.pollUntilDone();
if (analyzeResult.contents
&& analyzeResult.contents.length > 0) {
const content = analyzeResult.contents[0];
if (content.fields) {
const company =
content.fields["company_name"];
if (company) {
console.log(
`Company Name: `
+ `${company.value}`
);
console.log(
` Confidence: `
+ `${company.confidence}`
);
}
const total =
content.fields["total_amount"];
if (total) {
console.log(
`Total Amount: `
+ `${total.value}`
);
}
const summary =
content.fields["document_summary"];
if (summary) {
console.log(
`Summary: ${summary.value}`
);
}
const docType =
content.fields["document_type"];
if (docType) {
console.log(
`Document Type: `
+ `${docType.value}`
);
}
}
}
// --- Clean up ---
console.log(
`\nCleaning up: deleting analyzer`
+ ` '${analyzerId}'...`
);
await client.deleteAnalyzer(analyzerId);
console.log(
`Analyzer '${analyzerId}' deleted`
+ ` successfully.`
);
The sample output looks like this:
Company Name: CONTOSO LTD.
Confidence: 0.818
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to MICROSOFT CORPORATION for consulting, document, and printing services provided during the service period 10/14/2019 - 11/14/2019. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
See the run analyzer TypeScript SDK sample for more details.
Related content
- Review the code sample: visual document search.
- Review the code sample: analyzer templates.
- Explore more Python SDK samples
- Explore more .NET SDK samples
- Explore more Java SDK samples
- Explore more JavaScript SDK samples
- Explore more TypeScript SDK samples
- Try the Foundry Content Understanding experience for processing document content.