How do I use the Document Intelligence API to analyze multiple documents at once?

Harsh Khewal 110 Reputation points
2024-04-26T05:04:28.37+00:00

I am using a custom model, and the script for analyzing one document is straightforward. However, I can't work out how to do the same for multiple documents at once. How do I analyze multiple documents at once using Python?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 4,425 Reputation points Microsoft Vendor
    2024-04-26T08:19:16.47+00:00

    Hi @Harsh Khewal,

    Thank you for reaching out to the Microsoft Q&A forum!

    To analyze multiple documents using the Document Intelligence API and Python, you can loop over the document URLs and analyze each one in turn. I reproduced your scenario with the Python code below:

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient
    
    endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
    key = "YOUR_FORM_RECOGNIZER_KEY"
    
    model_id = "YOUR_CUSTOM_BUILT_MODEL_ID"
    formUrls = [  
        "URL1",
        "URL2",
        "URL3",
        # Add more URLs as needed
    ]
    
    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    
    for formUrl in formUrls:
        poller = document_analysis_client.begin_analyze_document_from_url(model_id, formUrl)
        result = poller.result()
    
        for idx, document in enumerate(result.documents):
            print("--------Analyzing document #{}--------".format(idx + 1))
            print("Document has type {}".format(document.doc_type))
            print("Document has confidence {}".format(document.confidence))
            print("Document was analyzed by model with ID {}".format(result.model_id))
            for name, field in document.fields.items():
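                # fall back to the raw content only when no value was extracted,
                # so that valid falsy values such as 0 or False are kept
                field_value = field.value if field.value is not None else field.content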
                print("......found field of type '{}' with value '{}' and with confidence {}".format(field.value_type, field_value, field.confidence))
    
        # iterate over lines, words, and selection marks on each page
        for page in result.pages:
            print("\nLines found on page {}".format(page.page_number))
            for line in page.lines:
                print("...Line '{}'".format(line.content))
            for word in page.words:
                print(
                    "...Word '{}' has a confidence of {}".format(
                        word.content, word.confidence
                    )
                )
            for selection_mark in page.selection_marks:
                print(
                    "...Selection mark is '{}' and has a confidence of {}".format(
                        selection_mark.state, selection_mark.confidence
                    )
                )
    
        # list the page(s) each table appears on, then its cell contents
        for i, table in enumerate(result.tables):
            print("\nTable {} can be found on page:".format(i + 1))
            for region in table.bounding_regions:
                print("...{}".format(region.page_number))
            for cell in table.cells:
                print(
                    "...Cell[{}][{}] has content '{}'".format(
                        cell.row_index, cell.column_index, cell.content
                    )
                )
        print("-----------------------------------")
    
    

    If the package is not installed yet, install it with pip install azure-ai-formrecognizer. For more information, please see the documentation.
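
The loop above analyzes the documents one after another. If the goal is to have several analyses in flight at the same time, one possible approach is to run the same call from a thread pool. The following is only a minimal sketch, not an official pattern: `analyze_all` and `make_analyzer` are hypothetical helper names, `max_workers=4` is an arbitrary choice, and it assumes that sharing one `DocumentAnalysisClient` across threads is acceptable in your setup (create one client per thread if in doubt).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_all(urls, analyze_one, max_workers=4):
    """Run analyze_one(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    raised for that URL, so one bad document does not stop the batch.
    Duplicate URLs collapse into a single entry.
    """
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front; map each future back to its URL.
        futures = {pool.submit(analyze_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                outcomes[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                outcomes[url] = exc
    return outcomes


def make_analyzer(endpoint, key, model_id):
    """Build an analyze_one(url) callable backed by DocumentAnalysisClient."""
    # Imported lazily so analyze_all can be used without the Azure SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    def analyze_one(url):
        # Same call as in the sequential loop; blocks until this document is done.
        poller = client.begin_analyze_document_from_url(model_id, url)
        return poller.result()

    return analyze_one
```

With the variables from the script above, this could be used as `results = analyze_all(formUrls, make_analyzer(endpoint, key, model_id))`, after which each `results[url]` is either an analysis result (with `documents`, `pages`, and `tables` as before) or the exception raised for that URL. Keep `max_workers` small enough to stay within your resource's request rate limit.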

    I hope this helps. Thank you.

    2 people found this answer helpful.