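DocumentAnalysisClient Class

DocumentAnalysisClient analyzes information from documents and images, and classifies documents. It is the interface to use for analyzing with prebuilt models (receipts, business cards, invoices, identity documents, among others), analyzing layout from documents, analyzing general document types, and analyzing custom documents with built models. For the full list of models supported by the service, see: https://aka.ms/azsdk/formrecognizer/models. It provides different methods based on inputs from a URL and inputs from a stream.

Note

DocumentAnalysisClient should be used with API versions 2022-08-31 and up. To use API versions <=v2.1, instantiate a FormRecognizerClient.

New in version 2022-08-31: The DocumentAnalysisClient and its client methods.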

Inheritance
azure.ai.formrecognizer._form_base_client.FormRecognizerClientBase
DocumentAnalysisClient

Constructor

DocumentAnalysisClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)

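Parameters

endpoint
str
Required

Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus2.api.cognitive.microsoft.com).

credential
AzureKeyCredential or TokenCredential
Required

Credentials needed for the client to connect to Azure. This is an instance of AzureKeyCredential if using an API key, or a token credential from azure.identity.

api_version
str or DocumentAnalysisApiVersion

The API version of the service to use for requests. It defaults to the latest service version. Setting it to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormRecognizerClient.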

Examples

Creating the DocumentAnalysisClient with an endpoint and API key.


   import os
   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

Creating the DocumentAnalysisClient with a token credential.


   """DefaultAzureCredential will use the values from these environment
   variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
   """
   import os
   from azure.ai.formrecognizer import DocumentAnalysisClient
   from azure.identity import DefaultAzureCredential

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   credential = DefaultAzureCredential()

   document_analysis_client = DocumentAnalysisClient(endpoint, credential)
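
The api_version keyword described in the constructor parameters can be passed alongside the endpoint and credential. The following is a minimal sketch, assuming the azure-ai-formrecognizer and azure-core packages are installed and the same environment variables as the examples above are set. The helper names `client_options` and `build_client` are hypothetical and not part of the SDK.

```python
import os


def client_options(api_version=None):
    """Build the optional keyword arguments for the client constructor."""
    options = {}
    if api_version is not None:
        # Pin the service version; leave it out to use the latest version.
        options["api_version"] = api_version
    return options


def build_client(api_version=None):
    """Create a DocumentAnalysisClient from environment variables."""
    # Imported here so this module still loads where the SDK is not installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    return DocumentAnalysisClient(
        endpoint, AzureKeyCredential(key), **client_options(api_version)
    )
```

For instance, `build_client("2022-08-31")` would pin the client to the earliest API version that the note above lists for this client.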

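Methods

begin_analyze_document

Analyze field text and semantic values from a given document.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

New in version 2023-07-31: The features keyword argument.

begin_classify_document

Classify a document using a document classifier. For more information on how to build a custom classifier model, see: https://aka.ms/azsdk/formrecognizer/buildclassifiermodel

New in version 2023-07-31: The begin_classify_document client method.

begin_classify_document_from_url

Classify a given document with a document classifier. The input must be the location (URL) of the document to be classified. For more information on how to build a custom classifier model, see: https://aka.ms/azsdk/formrecognizer/buildclassifiermodel

New in version 2023-07-31: The begin_classify_document_from_url client method.

close

Close the DocumentAnalysisClient session.

send_request

Runs a network request using the client's existing pipeline.

The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding it in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see: https://aka.ms/azsdk/dpcodegen/python/send_request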

begin_analyze_document

Analyze field text and semantic values from a given document.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document(model_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

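Parameters

model_id
str
Required

A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. The supported prebuilt model IDs are listed here: https://aka.ms/azsdk/formrecognizer/models

document
bytes or IO[bytes]
Required

File stream or bytes. For the file types supported by the service, see: https://aka.ms/azsdk/formrecognizer/supportedfiles

pages
str

Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3,5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See the supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales

features
list[str]

Document analysis features to enable.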
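
The pages parameter expects a comma-separated string of page numbers and hyphenated ranges. As an illustration only, the sketch below builds such a string from a list of page numbers. `format_pages` is a hypothetical helper, not part of the SDK.

```python
def format_pages(page_numbers):
    """Collapse page numbers into a string such as "1-3,5-6"."""
    pages = sorted(set(page_numbers))
    if not pages:
        return ""
    ranges = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:
            # Still inside a run of consecutive pages.
            prev = page
            continue
        ranges.append((start, prev))
        start = prev = page
    ranges.append((start, prev))
    # Single pages are written alone, runs are written as "first-last".
    return ",".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )
```

The result can then be passed as the keyword argument, for example `pages=format_pages([1, 2, 3, 5, 6])`.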

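Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Return type

LROPoller[AnalyzeResult]

Exceptions

Examples

Analyze an invoice. For more samples see the samples folder.


   import os
   from azure.core.credentials import AzureKeyCredential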
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           "prebuilt-invoice", document=f, locale="en-US"
       )
   invoices = poller.result()

   for idx, invoice in enumerate(invoices.documents):
       print(f"--------Analyzing invoice #{idx + 1}--------")
       vendor_name = invoice.fields.get("VendorName")
       if vendor_name:
           print(
               f"Vendor Name: {vendor_name.value} has confidence: {vendor_name.confidence}"
           )
       vendor_address = invoice.fields.get("VendorAddress")
       if vendor_address:
           print(
               f"Vendor Address: {vendor_address.value} has confidence: {vendor_address.confidence}"
           )
       vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
       if vendor_address_recipient:
           print(
               f"Vendor Address Recipient: {vendor_address_recipient.value} has confidence: {vendor_address_recipient.confidence}"
           )
       customer_name = invoice.fields.get("CustomerName")
       if customer_name:
           print(
               f"Customer Name: {customer_name.value} has confidence: {customer_name.confidence}"
           )
       customer_id = invoice.fields.get("CustomerId")
       if customer_id:
           print(
               f"Customer Id: {customer_id.value} has confidence: {customer_id.confidence}"
           )
       customer_address = invoice.fields.get("CustomerAddress")
       if customer_address:
           print(
               f"Customer Address: {customer_address.value} has confidence: {customer_address.confidence}"
           )
       customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
       if customer_address_recipient:
           print(
               f"Customer Address Recipient: {customer_address_recipient.value} has confidence: {customer_address_recipient.confidence}"
           )
       invoice_id = invoice.fields.get("InvoiceId")
       if invoice_id:
           print(
               f"Invoice Id: {invoice_id.value} has confidence: {invoice_id.confidence}"
           )
       invoice_date = invoice.fields.get("InvoiceDate")
       if invoice_date:
           print(
               f"Invoice Date: {invoice_date.value} has confidence: {invoice_date.confidence}"
           )
       invoice_total = invoice.fields.get("InvoiceTotal")
       if invoice_total:
           print(
               f"Invoice Total: {invoice_total.value} has confidence: {invoice_total.confidence}"
           )
       due_date = invoice.fields.get("DueDate")
       if due_date:
           print(f"Due Date: {due_date.value} has confidence: {due_date.confidence}")
       purchase_order = invoice.fields.get("PurchaseOrder")
       if purchase_order:
           print(
               f"Purchase Order: {purchase_order.value} has confidence: {purchase_order.confidence}"
           )
       billing_address = invoice.fields.get("BillingAddress")
       if billing_address:
           print(
               f"Billing Address: {billing_address.value} has confidence: {billing_address.confidence}"
           )
       billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
       if billing_address_recipient:
           print(
               f"Billing Address Recipient: {billing_address_recipient.value} has confidence: {billing_address_recipient.confidence}"
           )
       shipping_address = invoice.fields.get("ShippingAddress")
       if shipping_address:
           print(
               f"Shipping Address: {shipping_address.value} has confidence: {shipping_address.confidence}"
           )
       shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
       if shipping_address_recipient:
           print(
               f"Shipping Address Recipient: {shipping_address_recipient.value} has confidence: {shipping_address_recipient.confidence}"
           )
       print("Invoice items:")
       # Guard against invoices that have no "Items" field.
       items = invoice.fields.get("Items")
       for idx, item in enumerate(items.value if items else []):
           print(f"...Item #{idx + 1}")
           item_description = item.value.get("Description")
           if item_description:
               print(
                   f"......Description: {item_description.value} has confidence: {item_description.confidence}"
               )
           item_quantity = item.value.get("Quantity")
           if item_quantity:
               print(
                   f"......Quantity: {item_quantity.value} has confidence: {item_quantity.confidence}"
               )
           unit = item.value.get("Unit")
           if unit:
               print(f"......Unit: {unit.value} has confidence: {unit.confidence}")
           unit_price = item.value.get("UnitPrice")
           if unit_price:
               unit_price_code = unit_price.value.code if unit_price.value.code else ""
               print(
                   f"......Unit Price: {unit_price.value}{unit_price_code} has confidence: {unit_price.confidence}"
               )
           product_code = item.value.get("ProductCode")
           if product_code:
               print(
                   f"......Product Code: {product_code.value} has confidence: {product_code.confidence}"
               )
           item_date = item.value.get("Date")
           if item_date:
               print(
                   f"......Date: {item_date.value} has confidence: {item_date.confidence}"
               )
           tax = item.value.get("Tax")
           if tax:
               print(f"......Tax: {tax.value} has confidence: {tax.confidence}")
           amount = item.value.get("Amount")
           if amount:
               print(
                   f"......Amount: {amount.value} has confidence: {amount.confidence}"
               )
       subtotal = invoice.fields.get("SubTotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       total_tax = invoice.fields.get("TotalTax")
       if total_tax:
           print(
               f"Total Tax: {total_tax.value} has confidence: {total_tax.confidence}"
           )
       previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
       if previous_unpaid_balance:
           print(
               f"Previous Unpaid Balance: {previous_unpaid_balance.value} has confidence: {previous_unpaid_balance.confidence}"
           )
       amount_due = invoice.fields.get("AmountDue")
       if amount_due:
           print(
               f"Amount Due: {amount_due.value} has confidence: {amount_due.confidence}"
           )
       service_start_date = invoice.fields.get("ServiceStartDate")
       if service_start_date:
           print(
               f"Service Start Date: {service_start_date.value} has confidence: {service_start_date.confidence}"
           )
       service_end_date = invoice.fields.get("ServiceEndDate")
       if service_end_date:
           print(
               f"Service End Date: {service_end_date.value} has confidence: {service_end_date.confidence}"
           )
       service_address = invoice.fields.get("ServiceAddress")
       if service_address:
           print(
               f"Service Address: {service_address.value} has confidence: {service_address.confidence}"
           )
       service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
       if service_address_recipient:
           print(
               f"Service Address Recipient: {service_address_recipient.value} has confidence: {service_address_recipient.confidence}"
           )
       remittance_address = invoice.fields.get("RemittanceAddress")
       if remittance_address:
           print(
               f"Remittance Address: {remittance_address.value} has confidence: {remittance_address.confidence}"
           )
       remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
       if remittance_address_recipient:
           print(
               f"Remittance Address Recipient: {remittance_address_recipient.value} has confidence: {remittance_address_recipient.confidence}"
           )

Analyze a custom document. For more samples see the samples folder.


   import os
   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   # Make sure your document's type is included in the list of document types the custom model can analyze
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           model_id=model_id, document=f
       )
   result = poller.result()

   for idx, document in enumerate(result.documents):
       print(f"--------Analyzing document #{idx + 1}--------")
       print(f"Document has type {document.doc_type}")
       print(f"Document has document type confidence {document.confidence}")
       print(f"Document was analyzed with model with ID {result.model_id}")
       for name, field in document.fields.items():
           field_value = field.value if field.value else field.content
           print(
               f"......found field of type '{field.value_type}' with value '{field_value}' and with confidence {field.confidence}"
           )

   # iterate over tables, lines, and selection marks on each page
   for page in result.pages:
       print(f"\nLines found on page {page.page_number}")
       for line in page.lines:
           print(f"...Line '{line.content}'")
       for word in page.words:
           print(f"...Word '{word.content}' has a confidence of {word.confidence}")
       if page.selection_marks:
           print(f"\nSelection marks found on page {page.page_number}")
           for selection_mark in page.selection_marks:
               print(
                   f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
               )

   for i, table in enumerate(result.tables):
       print(f"\nTable {i + 1} can be found on page:")
       for region in table.bounding_regions:
           print(f"...{region.page_number}")
       for cell in table.cells:
           print(
               f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
           )
   print("-----------------------------------")
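
The examples above print every field together with its confidence. Where only reliable values are wanted, the fields can be filtered first. This is a minimal sketch that relies only on the `fields`, `value`, `content`, and `confidence` attributes used in the examples above. `confident_fields` and the 0.8 threshold are illustrative assumptions.

```python
def confident_fields(document, threshold=0.8):
    """Return {name: value} for fields whose confidence meets the threshold."""
    selected = {}
    for name, field in document.fields.items():
        # Treat a missing confidence score as "not confident enough".
        if field.confidence is not None and field.confidence >= threshold:
            # Fall back to the raw text when no typed value is available.
            selected[name] = field.value if field.value else field.content
    return selected
```

It would be called once per entry in `result.documents`, for example `confident_fields(document, threshold=0.9)`.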

begin_analyze_document_from_url

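Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

New in version 2023-07-31: The features keyword argument.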

begin_analyze_document_from_url(model_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

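Parameters

model_id
str
Required

A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. The supported prebuilt model IDs are listed here: https://aka.ms/azsdk/formrecognizer/models

document_url
str
Required

The URL of the document to analyze. The input must be a valid, properly encoded (that is, with special characters such as spaces encoded), and publicly accessible URL. For the file types supported by the service, see: https://aka.ms/azsdk/formrecognizer/supportedfiles

pages
str

Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3,5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See the supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales

features
list[str]

Document analysis features to enable.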
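
The features keyword takes a list of strings. The sketch below shows one way it might be passed. `analyze_with_features` is a hypothetical wrapper, `client` is assumed to be a DocumentAnalysisClient created with API version 2023-07-31 or later, and the feature name in the usage note is an assumption that should be checked against the service documentation.

```python
def analyze_with_features(client, document_url, features, model_id="prebuilt-layout"):
    """Start an analysis with extra features enabled and wait for the result."""
    poller = client.begin_analyze_document_from_url(
        model_id,
        document_url=document_url,
        features=features,  # list[str], as described in the parameters above
    )
    # result() blocks until the long-running operation completes.
    return poller.result()
```

A call might look like `analyze_with_features(client, url, ["formulas"])`, where "formulas" is an assumed feature name.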

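Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Return type

LROPoller[AnalyzeResult]

Exceptions

Examples

Analyze a receipt. For more samples see the samples folder.


   import os
   from azure.core.credentials import AzureKeyCredential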
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
   poller = document_analysis_client.begin_analyze_document_from_url(
       "prebuilt-receipt", document_url=url
   )
   receipts = poller.result()

   for idx, receipt in enumerate(receipts.documents):
       print(f"--------Analysis of receipt #{idx + 1}--------")
       print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
       merchant_name = receipt.fields.get("MerchantName")
       if merchant_name:
           print(
               f"Merchant Name: {merchant_name.value} has confidence: "
               f"{merchant_name.confidence}"
           )
       transaction_date = receipt.fields.get("TransactionDate")
       if transaction_date:
           print(
               f"Transaction Date: {transaction_date.value} has confidence: "
               f"{transaction_date.confidence}"
           )
       if receipt.fields.get("Items"):
           print("Receipt items:")
           for idx, item in enumerate(receipt.fields.get("Items").value):
               print(f"...Item #{idx + 1}")
               item_description = item.value.get("Description")
               if item_description:
                   print(
                       f"......Item Description: {item_description.value} has confidence: "
                       f"{item_description.confidence}"
                   )
               item_quantity = item.value.get("Quantity")
               if item_quantity:
                   print(
                       f"......Item Quantity: {item_quantity.value} has confidence: "
                       f"{item_quantity.confidence}"
                   )
               item_price = item.value.get("Price")
               if item_price:
                   print(
                       f"......Individual Item Price: {item_price.value} has confidence: "
                       f"{item_price.confidence}"
                   )
               item_total_price = item.value.get("TotalPrice")
               if item_total_price:
                   print(
                       f"......Total Item Price: {item_total_price.value} has confidence: "
                       f"{item_total_price.confidence}"
                   )
       subtotal = receipt.fields.get("Subtotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       tax = receipt.fields.get("TotalTax")
       if tax:
           print(f"Total tax: {tax.value} has confidence: {tax.confidence}")
       tip = receipt.fields.get("Tip")
       if tip:
           print(f"Tip: {tip.value} has confidence: {tip.confidence}")
       total = receipt.fields.get("Total")
       if total:
           print(f"Total: {total.value} has confidence: {total.confidence}")
       print("--------------------------------------")

begin_classify_document

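Classify a document using a document classifier. For more information on how to build a custom classifier model, see: https://aka.ms/azsdk/formrecognizer/buildclassifiermodel

New in version 2023-07-31: The begin_classify_document client method.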

begin_classify_document(classifier_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

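Parameters

classifier_id
str
Required

A unique document classifier identifier can be passed in as a string.

document
bytes or IO[bytes]
Required

File stream or bytes. For the file types supported by the service, see: https://aka.ms/azsdk/formrecognizer/supportedfiles

Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Return type

LROPoller[AnalyzeResult]

Exceptions

Examples

Classify a document. For more samples see the samples folder.


   import os
   from azure.core.credentials import AzureKeyCredential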
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_classify_document(
           classifier_id, document=f
       )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )
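
The classification result above lists one entry per detected document. When a file contains several documents, it can help to group the pages by document type. This is a minimal sketch using only the `documents`, `doc_type`, `bounding_regions`, and `page_number` attributes shown in the example. `pages_by_doc_type` is a hypothetical helper.

```python
from collections import defaultdict


def pages_by_doc_type(result):
    """Map each document type to the sorted page numbers it was found on."""
    grouped = defaultdict(set)
    for doc in result.documents or []:
        doc_type = doc.doc_type or "N/A"
        for region in doc.bounding_regions or []:
            grouped[doc_type].add(region.page_number)
    return {doc_type: sorted(pages) for doc_type, pages in grouped.items()}
```

It would be called with the value returned by `poller.result()`.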

begin_classify_document_from_url

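Classify a given document with a document classifier. The input must be the location (URL) of the document to be classified. For more information on how to build a custom classifier model, see: https://aka.ms/azsdk/formrecognizer/buildclassifiermodel

New in version 2023-07-31: The begin_classify_document_from_url client method.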

begin_classify_document_from_url(classifier_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

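Parameters

classifier_id
str
Required

A unique document classifier identifier can be passed in as a string.

document_url
str
Required

The URL of the document to classify. The input must be a valid, properly encoded (that is, with special characters such as spaces encoded), and publicly accessible URL of one of the supported formats: https://aka.ms/azsdk/formrecognizer/supportedfiles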
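
The requirement that document_url be properly encoded can be met with the standard library. The sketch below percent-encodes the path of a URL and leaves the query string untouched. `encode_document_url` is a hypothetical helper, and it assumes that any `%` already in the path belongs to an existing escape sequence.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_document_url(url):
    """Percent-encode special characters (such as spaces) in the URL path."""
    parts = urlsplit(url)
    # Keep "/" and existing "%XX" escapes; encode everything else unsafe.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

For example, a URL ending in `/my docs/file 1.pdf` would be rewritten to end in `/my%20docs/file%201.pdf` before being passed as `document_url`.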

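Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Return type

LROPoller[AnalyzeResult]

Exceptions

Examples

Classify a document. For more samples see the samples folder.


   import os
   from azure.core.credentials import AzureKeyCredential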
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/IRS-1040.pdf"

   poller = document_analysis_client.begin_classify_document_from_url(
       classifier_id, document_url=url
   )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )

close

Close the DocumentAnalysisClient session.

close() -> None
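
To make sure the session is closed even when an operation fails, close() can be called from a finally block. This is a minimal sketch. `classify_and_close` is a hypothetical helper, and `client` is assumed to be a DocumentAnalysisClient that is not needed after the call.

```python
def classify_and_close(client, classifier_id, document_bytes):
    """Classify one document, then always close the client session."""
    try:
        poller = client.begin_classify_document(
            classifier_id, document=document_bytes
        )
        return poller.result()
    finally:
        # Runs on success and on error, so the session is never left open.
        client.close()
```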

Exceptions

send_request

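Runs a network request using the client's existing pipeline.

The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding it in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see: https://aka.ms/azsdk/dpcodegen/python/send_request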

send_request(request: HttpRequest, *, stream: bool = False, **kwargs) -> HttpResponse

Parameters

request
HttpRequest
Required

The network request you want to make.

stream
bool

Whether the response payload will be streamed. Defaults to False.
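
Because send_request does not raise on error responses, callers usually check the response themselves. The sketch below wraps that pattern. `send_and_check` is a hypothetical helper, `client` is assumed to be a DocumentAnalysisClient, and `request` an HttpRequest built by the caller.

```python
def send_and_check(client, request):
    """Send a custom request and raise if the service returned an error."""
    response = client.send_request(request)
    # send_request does not raise on error status codes, so do it explicitly.
    response.raise_for_status()
    # Parse the response body as JSON.
    return response.json()
```

With the SDK installed, the request might be built as `HttpRequest("GET", "info")` using `HttpRequest` from `azure.core.rest`. The relative `info` path is an assumption about the service's REST surface, so check it against the service documentation.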

Returns

The response of your network call. Does not do error handling on your response.

Return type

HttpResponse

Exceptions