
Azure Document Intelligence results differ for a URL source and a local PDF file

Anonymous
2024-12-16T10:35:48.1533333+00:00

I am using a custom-trained model to read tables from PDF files. When I analyze PDFs from my local system with the Python SDK, the extracted data differs from what I get with my other method, where I first upload the files to Azure Blob Storage, generate SAS URLs for them, and then process them with the SDK's begin_analyze_document_from_url function.
Can you tell me what I am doing wrong with the local PDF files? I need to run inference on local PDF files only. I have included both methods below for reference.

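```python
# Method 1: analyze each blob through a SAS URL.
# blob_list, generate_sas_url, the storage credentials, model_id, and
# document_analysis_client are defined elsewhere (not shown).
for blob in blob_list:
    blob_name = blob.name
    sas_url = generate_sas_url(blob_name, storage_account_name, storage_account_key, container_name)
    print(f"Analyzing document: {blob_name}")

    # Call the Document Intelligence API with the SAS URL
    poller = document_analysis_client.begin_analyze_document_from_url(model_id, sas_url)
    result = poller.result()

# Method 2: analyze a local PDF by sending its bytes directly
with open(file_path, "rb") as document:
    poller = document_analysis_client.begin_analyze_document(
        model_id=model_id,
        document=document,
    )
    result = poller.result()
```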
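To pin down what actually differs between the two methods, it can help to flatten the tables from both results and compare them cell by cell. The sketch below assumes the results come from the azure-ai-formrecognizer SDK, whose AnalyzeResult exposes tables, each with cells carrying row_index, column_index, and content. The helper names flatten_tables and diff_tables are hypothetical, not part of the SDK.

```python
from typing import Any, Dict, Optional, Tuple

CellKey = Tuple[int, int, int]  # (table index, row index, column index)


def flatten_tables(result: Any) -> Dict[CellKey, str]:
    """Map (table, row, column) to the cell text for every table in a result."""
    cells: Dict[CellKey, str] = {}
    # result.tables may be None or empty when no tables were detected
    for t_idx, table in enumerate(result.tables or []):
        for cell in table.cells:
            cells[(t_idx, cell.row_index, cell.column_index)] = cell.content
    return cells


def diff_tables(result_a: Any, result_b: Any) -> Dict[CellKey, Tuple[Optional[str], Optional[str]]]:
    """Return cells whose text differs, or that exist in only one result.

    A cell missing from one side is reported as None on that side.
    """
    a, b = flatten_tables(result_a), flatten_tables(result_b)
    diffs: Dict[CellKey, Tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(a.keys() | b.keys()):
        left, right = a.get(key), b.get(key)
        if left != right:
            diffs[key] = (left, right)
    return diffs
```

Passing the poller.result() from each method to diff_tables shows whether the two runs disagree on a few cells (for example, slightly different OCR text) or on the table structure itself (whole rows, columns, or tables missing).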

1 answer

  1. santoshkc (Microsoft External Staff, Moderator)
    2024-12-16T13:29:13.8533333+00:00

    Hi @Muhammad Hamad Akram,

    Thank you for reaching out to the Microsoft Q&A forum!

    The differences between analyzing local PDFs and analyzing PDFs uploaded to Azure Blob Storage could be due to several factors:

    1. Files stored in Blob Storage carry properties, such as content type or cache settings, that local files lack, and these could subtly affect the analysis results.
    2. The Document Intelligence API might handle files sent from a local path differently from files fetched by URL, which could change how a file is processed. Blob Storage may also benefit from optimizations specific to Azure's cloud infrastructure, such as faster transfer or caching, that do not apply when a local file is uploaded.
    3. Large files may be subject to timeouts or slower processing when uploaded from a local machine, whereas Blob Storage could handle them more efficiently.
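A quick way to check the first factor is to confirm that the bytes stored in Blob Storage are identical to the local PDF. Blob properties such as content type are stored alongside the blob rather than inside its content, so if the hashes match, the service receives the same PDF bytes in both cases, which would point away from the file itself. This is a minimal stdlib sketch; the function names are hypothetical.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_document(local_path: str, blob_bytes: bytes) -> bool:
    """True if the local PDF and the bytes downloaded from Blob Storage are identical."""
    return sha256_of_file(local_path) == sha256_of_bytes(blob_bytes)
```

With the azure-storage-blob SDK, blob_bytes could come from BlobClient.download_blob().readall() for the same blob that was analyzed by SAS URL.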

    I hope this helps! Thank you.
