DocumentAnalysisClient Class

DocumentAnalysisClient analyzes information from documents and images, and classifies documents. Use it to analyze documents with prebuilt models (receipts, business cards, invoices, identity documents, among others), to analyze document layout, to analyze general document types, and to analyze custom documents with models you have built. For the full list of models supported by the service, see https://aka.ms/azsdk/formrecognizer/models. Each operation comes in two forms: one that takes a stream (or bytes) and one that takes a URL.

Note

DocumentAnalysisClient should be used with API versions 2022-08-31 and up. To use API versions <=v2.1, instantiate a FormRecognizerClient.

New in version 2022-08-31: The DocumentAnalysisClient and its client methods.

Constructor

DocumentAnalysisClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)

Parameters

Name	Description
endpoint Required	str Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus2.api.cognitive.microsoft.com).
credential Required	AzureKeyCredential or TokenCredential Credentials needed for the client to connect to Azure. This is an instance of AzureKeyCredential if using an API key or a token credential from identity.

Keyword-Only Parameters

Name	Description
api_version	str or DocumentAnalysisApiVersion The API version of the service to use for requests. It defaults to the latest service version. Setting to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormRecognizerClient.
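The version rule above can be sketched as a small helper. This is only an illustration: `client_name_for` is a hypothetical function, not part of the SDK, and it assumes that dated versions use the `YYYY-MM-DD` form and legacy versions use the `2.0`/`2.1` form.

```python
import re


def client_name_for(api_version: str) -> str:
    """Return the name of the client class suggested for an API version.

    Hypothetical helper: dated versions from 2022-08-31 onward map to
    DocumentAnalysisClient, versions up to v2.1 map to FormRecognizerClient.
    """
    version = api_version.strip()
    if version[:1] in ("v", "V"):
        version = version[1:]

    # Dated versions (YYYY-MM-DD) sort correctly as plain strings.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        if version >= "2022-08-31":
            return "DocumentAnalysisClient"
        raise ValueError(f"Unsupported dated API version: {api_version!r}")

    # Legacy versions look like "2.0" or "2.1".
    if re.fullmatch(r"\d+\.\d+", version):
        major, minor = (int(part) for part in version.split("."))
        if (major, minor) <= (2, 1):
            return "FormRecognizerClient"

    raise ValueError(f"Unrecognized API version: {api_version!r}")


print(client_name_for("2023-07-31"))
print(client_name_for("v2.1"))
```

To pin a client to a specific version, pass the keyword to the constructor, for example DocumentAnalysisClient(endpoint, credential, api_version="2022-08-31").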

Examples

Creating the DocumentAnalysisClient with an endpoint and API key.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

Creating the DocumentAnalysisClient with a token credential.


   """DefaultAzureCredential will use the values from these environment
   variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
   """
   import os

   from azure.ai.formrecognizer import DocumentAnalysisClient
   from azure.identity import DefaultAzureCredential

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   credential = DefaultAzureCredential()

   document_analysis_client = DocumentAnalysisClient(endpoint, credential)

Methods

begin_analyze_document	Analyze field text and semantic values from a given document. New in version 2023-07-31: The features keyword argument.
begin_analyze_document_from_url	Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed. New in version 2023-07-31: The features keyword argument.
begin_classify_document	Classify a document using a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. New in version 2023-07-31: The begin_classify_document client method.
begin_classify_document_from_url	Classify a given document with a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. The input must be the location (URL) of the document to be classified. New in version 2023-07-31: The begin_classify_document_from_url client method.
close	Close the DocumentAnalysisClient session.
send_request	Runs a network request using the client's existing pipeline. The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding it in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.

begin_analyze_document

Analyze field text and semantic values from a given document.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document(model_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name	Description
model_id Required	str A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. Prebuilt model IDs supported can be found here: https://aka.ms/azsdk/formrecognizer/models
document Required	bytes or IO[bytes] File stream or bytes. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Keyword-Only Parameters

Name	Description
pages	str Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.
locale	str Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.
features	list[str] Document analysis features to enable.
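As an illustration of the pages format described above, the following sketch collapses a list of page numbers into the comma-separated range string. `format_pages` is a hypothetical helper, not an SDK function, and it assumes page numbers start at 1.

```python
from typing import Iterable


def format_pages(page_numbers: Iterable[int]) -> str:
    """Build a `pages` string such as "1-3, 5-6" from page numbers.

    Hypothetical helper; assumes 1-based page numbers.
    """
    pages = sorted(set(page_numbers))
    if any(page < 1 for page in pages):
        raise ValueError("Page numbers are expected to start at 1")

    # Merge consecutive pages into [start, end] ranges.
    ranges = []
    for page in pages:
        if ranges and page == ranges[-1][1] + 1:
            ranges[-1][1] = page
        else:
            ranges.append([page, page])

    return ", ".join(
        str(start) if start == end else f"{start}-{end}" for start, end in ranges
    )


print(format_pages([1, 2, 3, 5, 6]))  # prints: 1-3, 5-6
```

The resulting string can then be passed as the pages keyword argument, for example begin_analyze_document(model_id, document, pages=format_pages([1, 2, 3, 5, 6])).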

Returns

Type	Description
LROPoller[AnalyzeResult]	An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
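Because analysis is a long-running operation, callers may want to bound how long they wait before calling result(). The sketch below is one way to do that, assuming only the wait(), done(), status(), and result() methods of the poller. `wait_for_result` is a hypothetical helper, and a stand-in poller is used so the sketch runs without a service call.

```python
from types import SimpleNamespace


def wait_for_result(poller, timeout=None):
    """Wait up to `timeout` seconds, then return the operation result.

    Hypothetical helper: raises TimeoutError if the operation has not
    finished when the wait returns.
    """
    poller.wait(timeout)
    if not poller.done():
        raise TimeoutError(
            f"Analysis still in progress (status: {poller.status()})"
        )
    return poller.result()


# Stand-in for the poller returned by begin_analyze_document (assumption:
# it only mimics the four methods used above).
fake_poller = SimpleNamespace(
    wait=lambda timeout=None: None,
    done=lambda: True,
    status=lambda: "succeeded",
    result=lambda: "analysis-result",
)

print(wait_for_result(fake_poller, timeout=30))  # prints: analysis-result
```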

Exceptions

Type	Description
HttpResponseError
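A minimal sketch of how a caller might report this exception follows. `describe_service_error` is a hypothetical helper; it assumes the exception exposes status_code, message, and an optional error object with a code attribute, and it reads them defensively so missing attributes do not raise. A stand-in error object is used so the sketch runs without the SDK installed.

```python
from types import SimpleNamespace


def describe_service_error(error) -> str:
    """Format a service error as a single log-friendly line.

    Hypothetical helper; attribute names are assumptions read via getattr.
    """
    status = getattr(error, "status_code", None)
    message = getattr(error, "message", None) or str(error)
    code = getattr(getattr(error, "error", None), "code", None)

    parts = [f"HTTP {status}" if status is not None else "HTTP error"]
    if code:
        parts.append(f"[{code}]")
    parts.append(message)
    return " ".join(parts)


# Stand-in for an HttpResponseError raised by the client (assumption).
fake_error = SimpleNamespace(
    status_code=404,
    message="Model not found",
    error=SimpleNamespace(code="ModelNotFound"),
)
print(describe_service_error(fake_error))

# Intended use with the real client (not executed here):
# try:
#     result = client.begin_analyze_document(model_id, document).result()
# except HttpResponseError as exc:  # from azure.core.exceptions
#     print(describe_service_error(exc))
```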

Examples

Analyze an invoice. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   # path_to_sample_documents is assumed to hold the path to a local invoice file
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           "prebuilt-invoice", document=f, locale="en-US"
       )
   invoices = poller.result()

   for idx, invoice in enumerate(invoices.documents):
       print(f"--------Analyzing invoice #{idx + 1}--------")
       vendor_name = invoice.fields.get("VendorName")
       if vendor_name:
           print(
               f"Vendor Name: {vendor_name.value} has confidence: {vendor_name.confidence}"
           )
       vendor_address = invoice.fields.get("VendorAddress")
       if vendor_address:
           print(
               f"Vendor Address: {vendor_address.value} has confidence: {vendor_address.confidence}"
           )
       vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
       if vendor_address_recipient:
           print(
               f"Vendor Address Recipient: {vendor_address_recipient.value} has confidence: {vendor_address_recipient.confidence}"
           )
       customer_name = invoice.fields.get("CustomerName")
       if customer_name:
           print(
               f"Customer Name: {customer_name.value} has confidence: {customer_name.confidence}"
           )
       customer_id = invoice.fields.get("CustomerId")
       if customer_id:
           print(
               f"Customer Id: {customer_id.value} has confidence: {customer_id.confidence}"
           )
       customer_address = invoice.fields.get("CustomerAddress")
       if customer_address:
           print(
               f"Customer Address: {customer_address.value} has confidence: {customer_address.confidence}"
           )
       customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
       if customer_address_recipient:
           print(
               f"Customer Address Recipient: {customer_address_recipient.value} has confidence: {customer_address_recipient.confidence}"
           )
       invoice_id = invoice.fields.get("InvoiceId")
       if invoice_id:
           print(
               f"Invoice Id: {invoice_id.value} has confidence: {invoice_id.confidence}"
           )
       invoice_date = invoice.fields.get("InvoiceDate")
       if invoice_date:
           print(
               f"Invoice Date: {invoice_date.value} has confidence: {invoice_date.confidence}"
           )
       invoice_total = invoice.fields.get("InvoiceTotal")
       if invoice_total:
           print(
               f"Invoice Total: {invoice_total.value} has confidence: {invoice_total.confidence}"
           )
       due_date = invoice.fields.get("DueDate")
       if due_date:
           print(f"Due Date: {due_date.value} has confidence: {due_date.confidence}")
       purchase_order = invoice.fields.get("PurchaseOrder")
       if purchase_order:
           print(
               f"Purchase Order: {purchase_order.value} has confidence: {purchase_order.confidence}"
           )
       billing_address = invoice.fields.get("BillingAddress")
       if billing_address:
           print(
               f"Billing Address: {billing_address.value} has confidence: {billing_address.confidence}"
           )
       billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
       if billing_address_recipient:
           print(
               f"Billing Address Recipient: {billing_address_recipient.value} has confidence: {billing_address_recipient.confidence}"
           )
       shipping_address = invoice.fields.get("ShippingAddress")
       if shipping_address:
           print(
               f"Shipping Address: {shipping_address.value} has confidence: {shipping_address.confidence}"
           )
       shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
       if shipping_address_recipient:
           print(
               f"Shipping Address Recipient: {shipping_address_recipient.value} has confidence: {shipping_address_recipient.confidence}"
           )
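       print("Invoice items:")
       items = invoice.fields.get("Items")
       # Guard against invoices where no line items were extracted
       for item_idx, item in enumerate(items.value if items else []):
           print(f"...Item #{item_idx + 1}")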
           item_description = item.value.get("Description")
           if item_description:
               print(
                   f"......Description: {item_description.value} has confidence: {item_description.confidence}"
               )
           item_quantity = item.value.get("Quantity")
           if item_quantity:
               print(
                   f"......Quantity: {item_quantity.value} has confidence: {item_quantity.confidence}"
               )
           unit = item.value.get("Unit")
           if unit:
               print(f"......Unit: {unit.value} has confidence: {unit.confidence}")
           unit_price = item.value.get("UnitPrice")
           if unit_price:
               unit_price_code = unit_price.value.code if unit_price.value.code else ""
               print(
                   f"......Unit Price: {unit_price.value}{unit_price_code} has confidence: {unit_price.confidence}"
               )
           product_code = item.value.get("ProductCode")
           if product_code:
               print(
                   f"......Product Code: {product_code.value} has confidence: {product_code.confidence}"
               )
           item_date = item.value.get("Date")
           if item_date:
               print(
                   f"......Date: {item_date.value} has confidence: {item_date.confidence}"
               )
           tax = item.value.get("Tax")
           if tax:
               print(f"......Tax: {tax.value} has confidence: {tax.confidence}")
           amount = item.value.get("Amount")
           if amount:
               print(
                   f"......Amount: {amount.value} has confidence: {amount.confidence}"
               )
       subtotal = invoice.fields.get("SubTotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       total_tax = invoice.fields.get("TotalTax")
       if total_tax:
           print(
               f"Total Tax: {total_tax.value} has confidence: {total_tax.confidence}"
           )
       previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
       if previous_unpaid_balance:
           print(
               f"Previous Unpaid Balance: {previous_unpaid_balance.value} has confidence: {previous_unpaid_balance.confidence}"
           )
       amount_due = invoice.fields.get("AmountDue")
       if amount_due:
           print(
               f"Amount Due: {amount_due.value} has confidence: {amount_due.confidence}"
           )
       service_start_date = invoice.fields.get("ServiceStartDate")
       if service_start_date:
           print(
               f"Service Start Date: {service_start_date.value} has confidence: {service_start_date.confidence}"
           )
       service_end_date = invoice.fields.get("ServiceEndDate")
       if service_end_date:
           print(
               f"Service End Date: {service_end_date.value} has confidence: {service_end_date.confidence}"
           )
       service_address = invoice.fields.get("ServiceAddress")
       if service_address:
           print(
               f"Service Address: {service_address.value} has confidence: {service_address.confidence}"
           )
       service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
       if service_address_recipient:
           print(
               f"Service Address Recipient: {service_address_recipient.value} has confidence: {service_address_recipient.confidence}"
           )
       remittance_address = invoice.fields.get("RemittanceAddress")
       if remittance_address:
           print(
               f"Remittance Address: {remittance_address.value} has confidence: {remittance_address.confidence}"
           )
       remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
       if remittance_address_recipient:
           print(
               f"Remittance Address Recipient: {remittance_address_recipient.value} has confidence: {remittance_address_recipient.confidence}"
           )

Analyze a custom document. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   # ID of a custom model that has already been built
   model_id = os.environ["CUSTOM_BUILT_MODEL_ID"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   # Make sure your document's type is included in the list of document types the custom model can analyze
   # path_to_sample_documents is assumed to hold the path to a local document file
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           model_id=model_id, document=f
       )
   result = poller.result()

   for idx, document in enumerate(result.documents):
       print(f"--------Analyzing document #{idx + 1}--------")
       print(f"Document has type {document.doc_type}")
       print(f"Document has document type confidence {document.confidence}")
       print(f"Document was analyzed with model with ID {result.model_id}")
       for name, field in document.fields.items():
           field_value = field.value if field.value else field.content
           print(
               f"......found field of type '{field.value_type}' with value '{field_value}' and with confidence {field.confidence}"
           )

   # iterate over tables, lines, and selection marks on each page
   for page in result.pages:
       print(f"\nLines found on page {page.page_number}")
       for line in page.lines:
           print(f"...Line '{line.content}'")
       for word in page.words:
           print(f"...Word '{word.content}' has a confidence of {word.confidence}")
       if page.selection_marks:
           print(f"\nSelection marks found on page {page.page_number}")
           for selection_mark in page.selection_marks:
               print(
                   f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
               )

   for i, table in enumerate(result.tables):
       print(f"\nTable {i + 1} can be found on page:")
       for region in table.bounding_regions:
           print(f"...{region.page_number}")
       for cell in table.cells:
           print(
               f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
           )
   print("-----------------------------------")

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document_from_url(model_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name	Description
model_id Required	str A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. Prebuilt model IDs supported can be found here: https://aka.ms/azsdk/formrecognizer/models
document_url Required	str The URL of the document to analyze. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.
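The encoding requirement above can be met with the standard library before the URL is passed to the client. The sketch below is one possible approach, not the SDK's own behavior: `encode_document_url` is a hypothetical helper that percent-encodes only the path and leaves existing escapes and the query string untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_document_url(url: str) -> str:
    """Percent-encode special characters (such as spaces) in the URL path.

    Hypothetical helper: "%" is kept as a safe character so that already
    encoded sequences are not encoded twice; the query is left as-is.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not an absolute HTTP(S) URL: {url!r}")

    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(encode_document_url("https://example.com/my docs/invoice 1.pdf"))
# prints: https://example.com/my%20docs/invoice%201.pdf
```

The example.com address is a placeholder. The document must still be publicly accessible for the service to fetch it.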

Keyword-Only Parameters

Name	Description
pages	str Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.
locale	str Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.
features	list[str] Document analysis features to enable.
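Since the features keyword argument is new in version 2023-07-31, code that targets several API versions may want to assemble the optional keywords in one place. The following is a sketch under stated assumptions: `build_analyze_kwargs` is a hypothetical helper, and the feature name "formulas" is used for illustration only and should be checked against the service documentation.

```python
from typing import Any, Dict, List, Optional

# Version in which the `features` keyword was introduced (from the note above).
FEATURES_MIN_API_VERSION = "2023-07-31"


def build_analyze_kwargs(
    api_version: str,
    pages: Optional[str] = None,
    locale: Optional[str] = None,
    features: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect optional keyword arguments for an analyze call.

    Hypothetical helper: fails early if `features` is requested with a
    dated API version older than 2023-07-31.
    """
    kwargs: Dict[str, Any] = {}
    if pages:
        kwargs["pages"] = pages
    if locale:
        kwargs["locale"] = locale
    if features:
        # Dated versions (YYYY-MM-DD) compare correctly as strings.
        if api_version < FEATURES_MIN_API_VERSION:
            raise ValueError(
                f"'features' requires API version {FEATURES_MIN_API_VERSION} or later"
            )
        kwargs["features"] = list(features)
    return kwargs


# "formulas" is an assumed feature name used only for illustration.
print(build_analyze_kwargs("2023-07-31", pages="1-3", features=["formulas"]))
```

The returned dictionary can be unpacked into the call, for example begin_analyze_document_from_url(model_id, document_url, **kwargs).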

Returns

Type	Description
LROPoller[AnalyzeResult]	An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Exceptions

Type	Description
HttpResponseError

Examples

Analyze a receipt. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
   poller = document_analysis_client.begin_analyze_document_from_url(
       "prebuilt-receipt", document_url=url
   )
   receipts = poller.result()

   for idx, receipt in enumerate(receipts.documents):
       print(f"--------Analysis of receipt #{idx + 1}--------")
       print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
       merchant_name = receipt.fields.get("MerchantName")
       if merchant_name:
           print(
               f"Merchant Name: {merchant_name.value} has confidence: "
               f"{merchant_name.confidence}"
           )
       transaction_date = receipt.fields.get("TransactionDate")
       if transaction_date:
           print(
               f"Transaction Date: {transaction_date.value} has confidence: "
               f"{transaction_date.confidence}"
           )
       if receipt.fields.get("Items"):
           print("Receipt items:")
           for idx, item in enumerate(receipt.fields.get("Items").value):
               print(f"...Item #{idx + 1}")
               item_description = item.value.get("Description")
               if item_description:
                   print(
                       f"......Item Description: {item_description.value} has confidence: "
                       f"{item_description.confidence}"
                   )
               item_quantity = item.value.get("Quantity")
               if item_quantity:
                   print(
                       f"......Item Quantity: {item_quantity.value} has confidence: "
                       f"{item_quantity.confidence}"
                   )
               item_price = item.value.get("Price")
               if item_price:
                   print(
                       f"......Individual Item Price: {item_price.value} has confidence: "
                       f"{item_price.confidence}"
                   )
               item_total_price = item.value.get("TotalPrice")
               if item_total_price:
                   print(
                       f"......Total Item Price: {item_total_price.value} has confidence: "
                       f"{item_total_price.confidence}"
                   )
       subtotal = receipt.fields.get("Subtotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       tax = receipt.fields.get("TotalTax")
       if tax:
           print(f"Total tax: {tax.value} has confidence: {tax.confidence}")
       tip = receipt.fields.get("Tip")
       if tip:
           print(f"Tip: {tip.value} has confidence: {tip.confidence}")
       total = receipt.fields.get("Total")
       if total:
           print(f"Total: {total.value} has confidence: {total.confidence}")
       print("--------------------------------------")

begin_classify_document

Classify a document using a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel.

New in version 2023-07-31: The begin_classify_document client method.

begin_classify_document(classifier_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name	Description
classifier_id Required	str A unique document classifier identifier can be passed in as a string.
document Required	bytes or IO[bytes] File stream or bytes. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Returns

Type	Description
LROPoller[AnalyzeResult]	An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
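Once result() returns, the classified documents can be summarized by type. The sketch below groups page numbers by document type, relying only on the doc_type, bounding_regions, and page_number attributes. `pages_by_doc_type` is a hypothetical helper, and stand-in objects replace a real AnalyzeResult so the sketch runs offline.

```python
from types import SimpleNamespace


def pages_by_doc_type(documents):
    """Map each document type to the sorted page numbers it covers.

    Hypothetical helper; documents without a type are grouped under "N/A".
    """
    grouped = {}
    for doc in documents:
        pages = grouped.setdefault(doc.doc_type or "N/A", set())
        for region in doc.bounding_regions or []:
            pages.add(region.page_number)
    return {doc_type: sorted(pages) for doc_type, pages in grouped.items()}


def _region(page_number):
    # Stand-in for a bounding region (assumption: only page_number is used).
    return SimpleNamespace(page_number=page_number)


# Stand-in for result.documents from a classification call.
sample_documents = [
    SimpleNamespace(doc_type="invoice", bounding_regions=[_region(1), _region(2)]),
    SimpleNamespace(doc_type="receipt", bounding_regions=[_region(3)]),
    SimpleNamespace(doc_type=None, bounding_regions=None),
]

print(pages_by_doc_type(sample_documents))
```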

Exceptions

Type	Description
HttpResponseError

Examples

Classify a document. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   # ID of a document classifier that has already been built
   classifier_id = os.environ["CLASSIFIER_ID"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   # path_to_sample_documents is assumed to hold the path to a local document file
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_classify_document(
           classifier_id, document=f
       )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )

begin_classify_document_from_url

Classify a given document with a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. The input must be the location (URL) of the document to be classified.

New in version 2023-07-31: The begin_classify_document_from_url client method.

begin_classify_document_from_url(classifier_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name	Description
classifier_id Required	str A unique document classifier identifier can be passed in as a string.
document_url Required	str The URL of the document to classify. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL of one of the supported formats: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Returns

Type	Description
LROPoller[AnalyzeResult]	An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.

Exceptions

Type	Description
HttpResponseError

Examples

Classify a document. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   # ID of a document classifier that has already been built
   classifier_id = os.environ["CLASSIFIER_ID"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/IRS-1040.pdf"

   poller = document_analysis_client.begin_classify_document_from_url(
       classifier_id, document_url=url
   )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )

close

Close the DocumentAnalysisClient session.

close() -> None
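One way to make sure close() is always called, even when analysis raises, is to wrap the client with contextlib.closing from the standard library. The sketch below shows the pattern with a stand-in client; `run_and_close` and `StandInClient` are hypothetical names used only for illustration.

```python
from contextlib import closing


def run_and_close(client, work):
    """Run `work(client)` and close the client afterwards, even on error."""
    with closing(client):
        return work(client)


class StandInClient:
    """Stand-in for DocumentAnalysisClient that only tracks close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


stand_in = StandInClient()
outcome = run_and_close(stand_in, lambda client: "analysis finished")
print(outcome, stand_in.closed)  # prints: analysis finished True
```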

send_request

Runs a network request using the client's existing pipeline.

The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding it in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.

send_request(request: HttpRequest, *, stream: bool = False, **kwargs) -> HttpResponse

Parameters

Name	Description
request Required	HttpRequest The network request you want to make.

Keyword-Only Parameters

Name	Description
stream	bool Whether the response payload will be streamed. Default value: False

Returns

Type	Description
HttpResponse	The response of your network call. Does not do error handling on your response.
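Because send_request does not raise on error responses, callers usually pair it with raise_for_status(). The sketch below wraps the two calls. `send_checked` is a hypothetical helper, and the stand-in client and response classes mimic only the methods named in this section so the sketch runs without a service call.

```python
class StandInResponse:
    """Stand-in for the returned HttpResponse (assumption: minimal surface)."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self):
        return self._body


class StandInClient:
    """Stand-in client that returns a preset response from send_request."""

    def __init__(self, response):
        self._response = response

    def send_request(self, request, *, stream=False, **kwargs):
        return self._response


def send_checked(client, request, **kwargs):
    """Send a request and raise if the response reports an error."""
    response = client.send_request(request, **kwargs)
    response.raise_for_status()
    return response


ok_client = StandInClient(StandInResponse(200, {"status": "ok"}))
print(send_checked(ok_client, request="placeholder-request").json())

# Intended use with the real client (not executed here). The relative
# URL "info" is an assumption chosen for illustration:
# from azure.core.rest import HttpRequest
# response = send_checked(document_analysis_client, HttpRequest("GET", "info"))
```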
