Azure AI Document Intelligence client library for Python - version 1.0.0b1

Article
11/22/2023

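Azure AI Document Intelligence (previously known as Form Recognizer) is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:

Layout - Extract content and structure (for example, words, selection marks, tables) from documents.
Document - Analyze key-value pairs in addition to the general layout of documents.
Read - Read page information from documents.
Prebuilt - Extract common field values from select document types (for example, receipts, invoices, business cards, ID documents, U.S. W-2 tax documents, among others) using prebuilt models.
Custom - Build custom models from your own data to extract tailored field values in addition to the general layout of documents.
Classifiers - Build custom classification models that combine layout and language features to accurately detect and identify the documents you process within your application.
Add-on capabilities - Extract barcodes/QR codes, formulas, font/style, and so on, or enable high resolution mode for large documents with optional parameters.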

Getting started

Install the package

python -m pip install azure-ai-documentintelligence

Prerequisites

Python 3.7 or later is required to use this package.
You need an Azure subscription to use this package.
An existing Azure AI Document Intelligence instance.

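Create a Cognitive Services or Document Intelligence resource

Document Intelligence supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Document Intelligence access only, create a Document Intelligence resource. Note that you will need a single-service resource if you intend to use Azure Active Directory authentication.

You can create either resource using one of the following options:

Option 1: Azure Portal.
Option 2: Azure CLI.

Below is an example of how you can create a Document Intelligence resource using the CLI: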

# Create a new resource group to hold the Document Intelligence resource
# if using an existing resource group, skip this step
az group create --name <your-resource-group-name> --location <location>

# Create the Document Intelligence resource
az cognitiveservices account create \
    --name <your-resource-name> \
    --resource-group <your-resource-group-name> \
    --kind FormRecognizer \
    --sku <sku> \
    --location <location> \
    --yes

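For more information about creating the resource, or how to get the location and SKU information, see the Azure Cognitive Services documentation.

Authenticate the client

To interact with the Document Intelligence service, you need to create an instance of a client. An endpoint and a credential are required to instantiate the client object.

Get the endpoint

You can find the endpoint for your Document Intelligence resource using the Azure Portal or the Azure CLI: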

# Get the endpoint for the Document Intelligence resource
az cognitiveservices account show --name "<resource-name>" --resource-group "<resource-group-name>" --query "properties.endpoint"

Either a regional endpoint or a custom subdomain can be used for authentication. They are formatted as follows:

Regional endpoint: https://<region>.api.cognitive.microsoft.com/
Custom subdomain: https://<resource-name>.cognitiveservices.azure.com/

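A regional endpoint is the same for every resource in a region. For a complete list of supported regional endpoints, see the service documentation. Note that regional endpoints do not support AAD authentication.

A custom subdomain, on the other hand, is a name that is unique to the Document Intelligence resource. Custom subdomains can only be used by single-service resources.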
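
Because AAD authentication depends on which endpoint form you use, it can be useful to check an endpoint before constructing a client. The following is a minimal sketch based only on the two URL formats listed above. `classify_endpoint` and `supports_aad` are hypothetical helpers, not part of the SDK, and they assume the endpoint host follows exactly those two patterns.

```python
from urllib.parse import urlparse

# Host suffixes taken from the two endpoint formats listed above.
REGIONAL_SUFFIX = ".api.cognitive.microsoft.com"
CUSTOM_SUFFIX = ".cognitiveservices.azure.com"


def classify_endpoint(endpoint: str) -> str:
    """Return "regional", "custom-subdomain", or "unknown" for an endpoint URL.

    Hypothetical helper (not an SDK function); it only inspects the host name.
    """
    host = (urlparse(endpoint).hostname or "").lower()
    if host.endswith(REGIONAL_SUFFIX):
        return "regional"
    if host.endswith(CUSTOM_SUFFIX):
        return "custom-subdomain"
    return "unknown"


def supports_aad(endpoint: str) -> bool:
    """Regional endpoints do not support AAD authentication; custom subdomains do."""
    return classify_endpoint(endpoint) == "custom-subdomain"


print(classify_endpoint("https://westus.api.cognitive.microsoft.com/"))
print(supports_aad("https://contoso-docs.cognitiveservices.azure.com/"))
```

Endpoints that match neither pattern are reported as "unknown" rather than guessed at, so the check errs on the side of not promising AAD support.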

Get the API key

The API key can be found in the Azure Portal or by running the following Azure CLI command:

az cognitiveservices account keys list --name "<resource-name>" --resource-group "<resource-group-name>"

Create the client with AzureKeyCredential

To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_analysis_client = DocumentIntelligenceClient(endpoint, credential)

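Create the client with an Azure Active Directory credential

AzureKeyCredential authentication is used in the examples in this getting started guide, but you can also authenticate with Azure Active Directory using the azure-identity library. Note that regional endpoints do not support AAD authentication. Create a custom subdomain name for your resource in order to use this type of authentication.

To use the DefaultAzureCredential type shown below, or other credential types provided with the Azure SDK, install the azure-identity package: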

pip install azure-identity

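You will also need to register a new AAD application and grant access to Document Intelligence by assigning the "Cognitive Services User" role to your service principal.

Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.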

"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
credential = DefaultAzureCredential()

document_analysis_client = DocumentIntelligenceClient(endpoint, credential)
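
When you rely on the three service principal variables described above, a quick pre-flight check can surface a missing or empty variable before the first request fails. This is a minimal sketch using only the standard library. `missing_aad_settings` is a hypothetical helper, not part of azure-identity, and DefaultAzureCredential may still succeed through other credential sources when these variables are absent.

```python
import os

# The three variables named in the text above.
REQUIRED_AAD_VARIABLES = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_aad_settings(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper, not part of azure-identity. Pass a mapping to check
    something other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AAD_VARIABLES if not env.get(name)]


missing = missing_aad_settings()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All service principal variables are set.")
```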

Key concepts

DocumentIntelligenceClient

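DocumentIntelligenceClient provides operations for analyzing input documents using prebuilt and custom models through the begin_analyze_document API. Use the model_id parameter to select the type of model for analysis. See the service documentation for the full list of supported models. DocumentIntelligenceClient also provides operations for classifying documents through the begin_classify_document API. Custom classification models can classify each page in an input file to identify the document(s) within, and can also identify multiple documents or multiple instances of a single document within an input file.

The Examples section provides sample code snippets that illustrate using a DocumentIntelligenceClient. More information about analyzing documents, including supported features, locales, and document types, can be found in the service documentation.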

DocumentIntelligenceAdministrationClient

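DocumentIntelligenceAdministrationClient provides operations for:

Building custom models to analyze specific fields you specify by labeling your custom documents. A DocumentModelDetails is returned indicating the document type(s) the model can analyze, as well as the estimated confidence for each field. See the service documentation for a more detailed explanation.
Creating a composed model from a collection of existing models.
Managing models created in your account.
Listing operations or getting a specific model operation created within the last 24 hours.
Copying a custom model from one Document Intelligence resource to another.
Building and managing a custom classification model to classify the documents you process within your application.

Note that models can also be built using a graphical user interface such as Document Intelligence Studio.

The Examples section provides sample code snippets that illustrate using a DocumentIntelligenceAdministrationClient.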

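Long-running operations

Long-running operations consist of an initial request sent to the service to start an operation, followed by polling the service at intervals to determine whether the operation has completed or failed and, if it has succeeded, to get the result.

Methods that analyze documents, build models, or copy/compose models are modeled as long-running operations. The client exposes a begin_<method-name> method that returns an LROPoller or AsyncLROPoller. Callers should wait for the operation to complete by calling result() on the poller object returned from the begin_<method-name> method. Sample code snippets that illustrate using long-running operations are provided below.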

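Examples

The following section provides several code snippets covering some of the most common Document Intelligence tasks, including:

Extract Layout
Using the General Document Model
Using Prebuilt Models
Build a Custom Model
Analyze Documents Using a Custom Model
Manage Your Models
Add-on Capabilities

Extract Layout

Extract text, selection marks, text styles, and table structures, along with their bounding region coordinates, from documents.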

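import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
# path_to_sample_documents (used below) is the path to a local file and is assumed to be defined elsewhere.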

document_intelligence_client = DocumentIntelligenceClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
    )
result = poller.result()

for idx, style in enumerate(result.styles):
    print(
        "Document contains {} content".format(
            "handwritten" if style.is_handwritten else "no handwritten"
        )
    )

for page in result.pages:
    print("----Analyzing layout from page #{}----".format(page.page_number))
    print(
        "Page has width: {} and height: {}, measured with unit: {}".format(
            page.width, page.height, page.unit
        )
    )

    for line_idx, line in enumerate(page.lines):
        words = line.get_words()
        print(
            "...Line # {} has word count {} and text '{}' within bounding polygon '{}'".format(
                line_idx,
                len(words),
                line.content,
                line.polygon,
            )
        )

        for word in words:
            print(
                "......Word '{}' has a confidence of {}".format(
                    word.content, word.confidence
                )
            )

    for selection_mark in page.selection_marks:
        print(
            "...Selection mark is '{}' within bounding polygon '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.polygon,
                selection_mark.confidence,
            )
        )

for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for region in table.bounding_regions:
        print(
            "Table # {} location on page: {} is {}".format(
                table_idx,
                region.page_number,
                region.polygon,
            )
        )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content,
            )
        )
        for region in cell.bounding_regions:
            print(
                "...content on page {} is within bounding polygon '{}'".format(
                    region.page_number,
                    region.polygon,
                )
            )

print("----------------------------------------")
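
The sample above calls line.get_words(). If the line objects returned by your installed version do not offer that method, the words of a line can be recovered by comparing spans, since words and lines both carry offsets into the document content. The following is a minimal sketch under that assumption: each word is assumed to have a `span` with `offset` and `length`, and each line a list of `spans`. `words_in_line` is a hypothetical helper, and the SimpleNamespace objects only mimic those attributes for the demo.

```python
from types import SimpleNamespace


def words_in_line(page_words, line):
    """Return the words whose span lies entirely inside one of the line's spans."""
    matched = []
    for word in page_words:
        start = word.span.offset
        end = start + word.span.length
        for span in line.spans:
            if start >= span.offset and end <= span.offset + span.length:
                matched.append(word)
                break  # a word is counted once even if several spans contain it
    return matched


# Tiny stand-in objects (demo only) that mimic the attributes used above.
def _span(offset, length):
    return SimpleNamespace(offset=offset, length=length)


words = [
    SimpleNamespace(content="Hello", span=_span(0, 5)),
    SimpleNamespace(content="world", span=_span(6, 5)),
    SimpleNamespace(content="Total", span=_span(12, 5)),
]
line = SimpleNamespace(content="Hello world", spans=[_span(0, 11)])

print([w.content for w in words_in_line(words, line)])
```

In the layout loop, this would be used as `words = words_in_line(page.words, line)`.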

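Using Prebuilt Models

Extract fields from select document types such as receipts, invoices, business cards, identity documents, and U.S. W-2 tax documents using prebuilt models provided by the Document Intelligence service.

For example, to analyze fields from a sales receipt, use the prebuilt receipt model by passing model_id="prebuilt-receipt" into the begin_analyze_document method: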

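import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
# path_to_sample_documents (a local file path) and format_price (a helper that
# renders a currency value as text) are assumed to be defined elsewhere.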

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_analysis_client.begin_analyze_document(
        "prebuilt-receipt", analyze_request=f, locale="en-US", content_type="application/octet-stream"
    )
receipts = poller.result()

for idx, receipt in enumerate(receipts.documents):
    print(f"--------Analysis of receipt #{idx + 1}--------")
    print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print(f"Merchant Name: {merchant_name.get('valueString')} has confidence: " f"{merchant_name.confidence}")
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print(
            f"Transaction Date: {transaction_date.get('valueDate')} has confidence: "
            f"{transaction_date.confidence}"
        )
    if receipt.fields.get("Items"):
        print("Receipt items:")
        for idx, item in enumerate(receipt.fields.get("Items").get("valueArray")):
            print(f"...Item #{idx + 1}")
            item_description = item.get("valueObject").get("Description")
            if item_description:
                print(
                    f"......Item Description: {item_description.get('valueString')} has confidence: "
                    f"{item_description.confidence}"
                )
            item_quantity = item.get("valueObject").get("Quantity")
            if item_quantity:
                print(
                    f"......Item Quantity: {item_quantity.get('valueNumber')} has confidence: "
                    f"{item_quantity.confidence}"
                )
            item_total_price = item.get("valueObject").get("TotalPrice")
            if item_total_price:
                print(
                    f"......Total Item Price: {format_price(item_total_price.get('valueCurrency'))} has confidence: "
                    f"{item_total_price.confidence}"
                )
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print(f"Subtotal: {format_price(subtotal.get('valueCurrency'))} has confidence: {subtotal.confidence}")
    tax = receipt.fields.get("TotalTax")
    if tax:
        print(f"Total tax: {format_price(tax.get('valueCurrency'))} has confidence: {tax.confidence}")
    tip = receipt.fields.get("Tip")
    if tip:
        print(f"Tip: {format_price(tip.get('valueCurrency'))} has confidence: {tip.confidence}")
    total = receipt.fields.get("Total")
    if total:
        print(f"Total: {format_price(total.get('valueCurrency'))} has confidence: {total.confidence}")
    print("--------------------------------------")
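
The receipt sample calls `format_price`, which the snippet does not define. A minimal sketch of such a helper follows. It assumes the currency value behaves like a mapping with an `amount` key and optional `currencySymbol` and `currencyCode` keys; those key names and the output format are assumptions made for illustration, not something the SDK prescribes.

```python
def format_price(price):
    """Render a currency value such as {"currencySymbol": "$", "amount": 12.5}.

    Hypothetical helper for the receipt sample; the key names are assumptions
    about the shape of the service's currency field.
    """
    if not price:
        return "N/A"
    amount = price.get("amount")
    if amount is None:
        return "N/A"
    symbol = price.get("currencySymbol") or ""
    code = price.get("currencyCode")
    text = f"{symbol}{amount:.2f}"
    return f"{text} {code}" if code else text


print(format_price({"currencySymbol": "$", "amount": 12.5}))
print(format_price({"amount": 3, "currencyCode": "EUR"}))
print(format_price(None))
```

The helper has to be defined before the receipt loop runs. Returning "N/A" for a missing value keeps the print statements from failing on receipts that lack a price.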

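You are not limited to receipts! There are several prebuilt models to choose from, each with its own set of supported fields. See the service documentation for the other supported prebuilt models.

Build a Custom Model

Build a custom model on your own document type. The resulting model can be used to analyze values from the types of documents it was trained on. Provide a container SAS URL to the Azure Storage Blob container where you store the training documents.

More details on setting up a container and the required file structure can be found in the service documentation.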

import os
import uuid

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    AzureBlobContentSource,
    BuildDocumentModelRequest,
    DocumentBuildMode,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["CONTAINER_SAS_URL"]

document_model_admin_client = DocumentIntelligenceAdministrationClient(
    endpoint, AzureKeyCredential(key)
)
# Build options are passed as a single request object; the model ID here is
# a randomly generated placeholder.
poller = document_model_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
print("Doc types the model can recognize:")
for name, doc_type in model.doc_types.items():
    print(
        f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:"
    )
    for field_name, field in doc_type.field_schema.items():
        print(
            f"Field: '{field_name}' has type '{field['type']}' and confidence score "
            f"{doc_type.field_confidence[field_name]}"
        )

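Analyze Documents Using a Custom Model

Analyze document fields, tables, selection marks, and more. These models are trained with your own data, so they are tailored to your documents. For best results, only analyze documents of the same document type that the custom model was built with.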

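import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
# custom_model_id (the fallback) and path_to_sample_documents are assumed to be
# defined elsewhere, for example using the model ID returned by the build sample.
model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)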

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Make sure your document's type is included in the list of document types the custom model can analyze
with open(path_to_sample_documents, "rb") as f:
    poller = document_analysis_client.begin_analyze_document(
        model_id=model_id, analyze_request=f, content_type="application/octet-stream"
    )
result = poller.result()

for idx, document in enumerate(result.documents):
    print(f"--------Analyzing document #{idx + 1}--------")
    print(f"Document has type {document.doc_type}")
    print(f"Document has document type confidence {document.confidence}")
    print(f"Document was analyzed with model with ID {result.model_id}")
    for name, field in document.fields.items():
        field_value = field.get("valueString") if field.get("valueString") else field.content
        print(
            f"......found field of type '{field.type}' with value '{field_value}' and with confidence {field.confidence}"
        )

# iterate over tables, lines, and selection marks on each page
for page in result.pages:
    print(f"\nLines found on page {page.page_number}")
    for line in page.lines:
        print(f"...Line '{line.content}'")
    for word in page.words:
        print(f"...Word '{word.content}' has a confidence of {word.confidence}")
    if page.selection_marks:
        print(f"\nSelection marks found on page {page.page_number}")
        for selection_mark in page.selection_marks:
            print(
                f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
            )

for i, table in enumerate(result.tables):
    print(f"\nTable {i + 1} can be found on page:")
    for region in table.bounding_regions:
        print(f"...{region.page_number}")
    for cell in table.cells:
        print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
print("-----------------------------------")
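
The field loop above falls back from "valueString" to the raw content, which skips typed values such as dates or numbers. One way to generalize that lookup is sketched below. `field_value` is a hypothetical helper, and it assumes fields are mapping-like and store their typed value under a key that starts with "value" (for example "valueString" or "valueDate"). The sample dictionaries are invented for the demo.

```python
def field_value(field):
    """Return the typed value of a field, or its raw content as a fallback.

    Hypothetical helper: it assumes fields are mapping-like and keep their
    typed value under a key that starts with "value".
    """
    for key in field.keys():
        if key.startswith("value") and field[key] is not None:
            return field[key]
    return field.get("content")


# Invented sample data shaped like analyzed fields (demo only).
fields = {
    "VendorName": {"type": "string", "valueString": "Contoso", "content": "CONTOSO", "confidence": 0.98},
    "InvoiceDate": {"type": "date", "valueDate": "2023-11-22", "content": "11/22/2023", "confidence": 0.95},
    "Notes": {"type": "string", "content": "n/a", "confidence": 0.4},
}

summary = {name: field_value(field) for name, field in fields.items()}
print(summary)
```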

Alternatively, a document URL can be used to analyze documents with the begin_analyze_document method.

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/tests/sample_forms/receipt/contoso-receipt.png"
poller = document_analysis_client.begin_analyze_document("prebuilt-receipt", AnalyzeDocumentRequest(url_source=url))
receipts = poller.result()

Manage Your Models

Manage the custom models attached to your account.

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ResourceNotFoundError

endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")

document_model_admin_client = DocumentIntelligenceAdministrationClient(endpoint, credential)

account_details = document_model_admin_client.get_resource_info()
print("Our account has {} custom models, and we can have at most {} custom models".format(
    account_details.custom_document_models.count, account_details.custom_document_models.limit
))

# Here we get a paged list of all of our models
models = document_model_admin_client.list_models()
print("We have models with the following ids: {}".format(
    ", ".join([m.model_id for m in models])
))

# Replace with the custom model ID from the "Build a model" sample
model_id = "<model_id from the Build a Model sample>"

custom_model = document_model_admin_client.get_model(model_id=model_id)
print("Model ID: {}".format(custom_model.model_id))
print("Description: {}".format(custom_model.description))
print("Model created on: {}\n".format(custom_model.created_date_time))

# Finally, we will delete this model by ID
document_model_admin_client.delete_model(model_id=custom_model.model_id)

try:
    document_model_admin_client.get_model(model_id=custom_model.model_id)
except ResourceNotFoundError:
    print("Successfully deleted model with id {}".format(custom_model.model_id))

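Add-on Capabilities

Document Intelligence supports more sophisticated analysis capabilities. These optional features can be enabled and disabled depending on the document extraction scenario.

The following add-on capabilities are available for 2023-07-31 (GA) and later releases: barcode/QR code extraction, formula extraction, font/style extraction, and high resolution mode for large documents.

Note that some add-on capabilities incur additional charges. See pricing: https://azure.microsoft.com/pricing/details/ai-document-intelligence/.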

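Troubleshooting

General

The Document Intelligence client library raises exceptions defined in Azure Core. Error codes and messages raised by the Document Intelligence service can be found in the service documentation.

Logging

This library uses the standard logging library for logging.

Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.

Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per operation with the logging_enable keyword argument.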
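
As a rough illustration of the logging setup described above, the sketch below configures the standard library logger that the Azure SDK libraries write to. The logger name "azure" and the logging_enable keyword come from the Azure SDK conventions. The format string and the choice of stdout are arbitrary, and the client calls are shown only as comments so the block runs without the SDK installed.

```python
import logging
import sys

# "azure" is the parent logger for the Azure SDK client libraries.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

# Send log records to stdout; swap in a logging.FileHandler to log to a file.
handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# With the SDK installed, DEBUG-level body logging is then switched on for a
# whole client or for a single call, for example:
#   client = DocumentIntelligenceClient(endpoint, credential, logging_enable=True)
#   poller = client.begin_analyze_document(..., logging_enable=True)
```

Because DEBUG output can include request and response bodies and unredacted headers, it is best kept out of shared or long-lived log stores.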

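For full SDK logging documentation with examples, see the Azure SDK for Python logging documentation.

Optional Configuration

Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes the available configurations for retries, logging, transport protocols, and more.

Next steps

Additional documentation

For more extensive documentation on Azure AI Document Intelligence, see the Document Intelligence documentation on docs.microsoft.com.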

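Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (for example, with a label or comment). Follow the instructions provided by the bot. You only need to do this once across all repositories that use our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.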
