How do I use the Document intelligence API to analyze multiple documents at once?

Question

I am using a custom model and the script for analyzing one document is very straightforward. But, I am unable to do so for multiple documents at once. How do I analyze multiple documents at once using python?

Accepted Answer

Hi @Harsh Khewal,

Thank you for reaching out to Microsoft Q&A forum!

To analyze multiple documents at once using the Document Intelligence API and Python, I repro your issue by using the below python code:

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

model_id = "YOUR_CUSTOM_BUILT_MODEL_ID"
formUrls = [  
    "URL1",
    "URL2",
    "URL3",
    # Add more URLs as needed
]

document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

for formUrl in formUrls:
    poller = document_analysis_client.begin_analyze_document_from_url(model_id, formUrl)
    result = poller.result()

    for idx, document in enumerate(result.documents):
        print("--------Analyzing document #{}--------".format(idx + 1))
        print("Document has type {}".format(document.doc_type))
        print("Document has confidence {}".format(document.confidence))
        print("Document was analyzed by model with ID {}".format(result.model_id))
        for name, field in document.fields.items():
            field_value = field.value if field.value else field.content
            print("......found field of type '{}' with value '{}' and with confidence {}".format(field.value_type, field_value, field.confidence))

    # iterate over tables, lines, and selection marks on each page
    for page in result.pages:
        print("
Lines found on page {}".format(page.page_number))
        for line in page.lines:
            print("...Line '{}'".format(line.content.encode('utf-8')))
        for word in page.words:
            print(
                "...Word '{}' has a confidence of {}".format(
                    word.content.encode('utf-8'), word.confidence
                )
            )
        for selection_mark in page.selection_marks:
            print(
                "...Selection mark is '{}' and has a confidence of {}".format(
                    selection_mark.state, selection_mark.confidence
                )
            )

    for i, table in enumerate(result.tables):
        print("
Table {} can be found on page:".format(i + 1))
        for region in table.bounding_regions:
            print("...{}".format(i + 1, region.page_number))
        for cell in table.cells:
            print(
                "...Cell[{}][{}] has content '{}'".format(
                    cell.row_index, cell.column_index, cell.content.encode('utf-8')
                )
            )
    print("-----------------------------------")

Use pip installation if required. For more info, please look into the documentation.

I hope this helps. Thank you.

How do I use the Document intelligence API to analyze multiple documents at once?

0 additional answers