Azure Document Intelligence results differ for URL source and local PDF file

Muhammad Hamad Akram 0 Reputation points
2024-12-16T10:35:48.1533333+00:00

I am using a custom-trained model to read tables from PDF files. When I analyze PDF files from my local system with the Python SDK, the results differ from those of the other method, where I first upload the files to Azure Blob Storage, generate SAS URLs for them, and then process them with the SDK's begin_analyze_document_from_url function.
Can you tell me what I am doing wrong with local PDF files? I need to run inference on the PDF files directly. Both methods I am using are shown below for reference.

# Method 1: analyze each blob through its SAS URL
for blob in blob_list:
    blob_name = blob.name
    sas_url = generate_sas_url(blob_name, storage_account_name, storage_account_key, container_name)
    print(f"Analyzing document: {blob_name}")

    # Call Document Intelligence API with the SAS URL
    poller = document_analysis_client.begin_analyze_document_from_url(model_id, sas_url)

# Method 2: analyze the local PDF file directly
with open(file_path, "rb") as document:
    poller = document_analysis_client.begin_analyze_document(
        model_id=model_id,
        document=document,
    )
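
To see exactly where the two methods disagree, it may help to compare the extracted table cells side by side. The following is a minimal sketch, not part of the original code: `table_cells`, `diff_results`, and `fake_result` are hypothetical helper names, and it assumes the tables of interest appear under `result.tables` with cells exposing `row_index`, `column_index`, and `content`. If the custom model returns its tables as document fields instead, the flattening step would need to change.

```python
from types import SimpleNamespace


def table_cells(result):
    """Flatten every table in an analyze result into {(table, row, col): text}."""
    cells = {}
    for t_index, table in enumerate(result.tables or []):
        for cell in table.cells:
            cells[(t_index, cell.row_index, cell.column_index)] = cell.content
    return cells


def diff_results(local_result, url_result):
    """Return {position: (local_text, url_text)} for every cell that differs."""
    local_cells = table_cells(local_result)
    url_cells = table_cells(url_result)
    differences = {}
    for key in sorted(set(local_cells) | set(url_cells)):
        if local_cells.get(key) != url_cells.get(key):
            differences[key] = (local_cells.get(key), url_cells.get(key))
    return differences


def fake_result(rows):
    """Build a stand-in object shaped like an SDK result (illustration only)."""
    cells = [
        SimpleNamespace(row_index=r, column_index=c, content=text)
        for r, row in enumerate(rows)
        for c, text in enumerate(row)
    ]
    return SimpleNamespace(tables=[SimpleNamespace(cells=cells)])


local_result = fake_result([["Item", "Qty"], ["Bolt", "10"]])
url_result = fake_result([["Item", "Qty"], ["Bolt", "70"]])
print(diff_results(local_result, url_result))
# prints {(0, 1, 1): ('10', '70')}
```

With real data, the two arguments would be the objects returned by `poller.result()` for the local-file call and the URL call on the same document.
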
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. santoshkc 13,360 Reputation points Microsoft External Staff
    2024-12-16T13:29:13.8533333+00:00

    Hi @Muhammad Hamad Akram,

    Thank you for reaching out to Microsoft Q&A forum!

    The differences in results between analyzing local PDFs and those uploaded to Azure Blob Storage could be due to several factors:

    1. Files stored in Blob Storage may have additional metadata or attributes (such as content type or caching settings) that could subtly affect the analysis results, while local files may lack this metadata.
    2. The Document Intelligence API might handle files from local paths and URLs differently, which could lead to variations in how the file is processed. Additionally, Blob Storage may benefit from optimizations specific to Azure's cloud infrastructure, such as improved processing speed or caching mechanisms, which might not be available when analyzing local files.
    3. If the files are large, they may be subject to timeouts or slower processing when accessed locally, while Blob Storage could handle large files more efficiently.
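
Before looking at how the service handles each input, it can be worth ruling out a difference in the file itself by checking that the local PDF and the blob are byte-identical. This is only a sketch under stated assumptions: the helper names are hypothetical, and it assumes the SAS URL grants read access and can be fetched with a plain HTTP GET.

```python
import hashlib
import urllib.request


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str) -> str:
    """Hash a local file in 1 MiB chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_url(url: str) -> str:
    """Download the content behind a URL (assumed readable) and hash it."""
    with urllib.request.urlopen(url) as response:
        return sha256_of_bytes(response.read())


def same_document(local_path: str, sas_url: str) -> bool:
    """True when the local file and the remote blob have identical bytes."""
    return sha256_of_file(local_path) == sha256_of_url(sas_url)
```

If `same_document(file_path, sas_url)` returns False, the two methods are analyzing different files, for example because a different version of the PDF was uploaded. If it returns True, the difference has to come from somewhere other than the document content.
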

    I hope you understand! Thank you.
