The begin_analyze_document method from the DocumentIntelligenceClient in azure.ai.documentintelligence is giving a TypeError for input format when working with PDF documents

Question

Bathula, Umesh 20

I am trying to use the begin_analyze_document method from the DocumentIntelligenceClient in the azure.ai.documentintelligence module to extract information from PDF documents. To achieve this, I'm converting the PDF documents to byte format before passing them to the method. However, I'm encountering a TypeError with the following error message:

Session.request() got an unexpected keyword argument 'document'

I'm looking for help in resolving this issue. Here's the code I'm using:

with open("path.pdf", "rb") as file:
	document_bytes = file.read()
	poller = client.begin_analyze_document(model_id="prebuilt-layout", document=document_bytes)

Accepted answer

Answer 1

@Bathula, Umesh Welcome to the Microsoft Q&A forum, and thank you for posting your query here! Please run the following pip install command:

python -m pip install azure-ai-documentintelligence

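Then update the endpoint, key, and path to your PDF file in the sample code below.

I tested the sample code below and it worked fine. Note that the open file is passed as analyze_request together with content_type="application/octet-stream", rather than as document=.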

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult

endpoint = "https://XXXXXXXX.cognitiveservices.azure.com/"
key = "51b635cXXXXXXX3bd715dc8"

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open("MY_SAMPLE_PDF_FILE.pdf", "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
    )
result: AnalyzeResult = poller.result()

if result.styles and any(style.is_handwritten for style in result.styles):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            print(
                f"...Line # {line_idx} and text '{line.content}' "
                f"within bounding polygon '{line.polygon}'"
            )

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{selection_mark.polygon}' and has a confidence of {selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and {table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}")
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'")

print("----------------------------------------")
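As for why the original call failed: the error message names Session.request(), which suggests that the unrecognised document= keyword was not rejected by begin_analyze_document itself but was forwarded down to the HTTP layer, which then refused it. The sketch below only illustrates that mechanism. Session and FakeClient are hypothetical stand-ins written for this example, not Azure SDK classes, and the URL is a placeholder.

```python
# Hypothetical stand-ins, not Azure SDK code: they only mimic how an
# unrecognised keyword argument can travel down to the HTTP layer.


class Session:
    """Plays the role of the HTTP session at the bottom of the stack."""

    def request(self, method, url, data=None, headers=None):
        return f"{method} {url} ({len(data or b'')} bytes)"


class FakeClient:
    """Plays the role of an SDK client that forwards extra keyword arguments."""

    def __init__(self):
        self._session = Session()

    def begin_analyze(self, model_id, analyze_request=None, **kwargs):
        url = f"https://example.invalid/models/{model_id}:analyze"
        # Anything this method does not recognise is passed straight through.
        return self._session.request("POST", url, data=analyze_request, **kwargs)


def try_call(**call_kwargs):
    """Return the result, or the TypeError message if the call is rejected."""
    try:
        return FakeClient().begin_analyze("prebuilt-layout", **call_kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_call(document=b"%PDF-"))         # rejected inside Session.request
    print(try_call(analyze_request=b"%PDF-"))  # accepted
```

With the stand-ins, document= reaches Session.request and fails there, while analyze_request= is consumed by the client method and never reaches the session. That matches the shape of the error in the question, but it is an illustration, not a trace of the real SDK.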

Detailed sample code is available here:

https://learn.microsoft.com/en-us/python/api/overview/azure/ai-documentintelligence-readme?view=azure-python-preview#extract-layout
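If you analyze many files, the call pattern from the sample can be wrapped in a small helper. This is a minimal sketch: analyze_pdf is a hypothetical name introduced here, and it assumes the installed package version accepts the same analyze_request and content_type arguments used in the sample above. Other releases of the package may name the input parameter differently, so check the signature for your version.

```python
def analyze_pdf(client, pdf_path, model_id="prebuilt-layout"):
    """Send one PDF to the service and return the finished result.

    `client` is expected to be a DocumentIntelligenceClient, or any object
    with a compatible begin_analyze_document method (an assumption of this
    sketch; the parameter names mirror the sample in this answer).
    """
    with open(pdf_path, "rb") as f:
        # Pass the open binary file as the request body, not as `document=`.
        poller = client.begin_analyze_document(
            model_id, analyze_request=f, content_type="application/octet-stream"
        )
    # Block until the long-running analysis completes.
    return poller.result()
```

Usage would look like result = analyze_pdf(document_intelligence_client, "MY_SAMPLE_PDF_FILE.pdf"), after which result.pages and result.tables can be read as in the sample.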


Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Please remember to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
