Import Blob Storage files into Document Intelligence

Malte Martienßen 65 Reputation points
2024-08-09T11:56:08.8+00:00

I would like Document Intelligence to import files from Blob Storage, analyze them with the Layout model using Markdown output, and save the result as a .json file and, additionally, the content as a .md file back into a Blob Storage container.

How do I do this with the Azure SDK in Python? I managed to import a sample PDF into DocInt from a link (not on Blob Storage) and save the output locally using DocumentIntelligenceClient.begin_analyze_document(...) from azure.ai.documentintelligence, but I don't know how to connect DocInt and Blob Storage.

Can anyone help me with this?

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. Vinodh247 34,661 Reputation points MVP Volunteer Moderator
    2024-08-09T16:19:10.1166667+00:00

    Hi Malte Martienßen,

    Thanks for reaching out to Microsoft Q&A.

    • Set up the Blob Storage connection.
    • Retrieve the document you want to analyze by downloading it from Blob Storage.
    • Use the DocumentIntelligenceClient to analyze the downloaded document.
    • Extract the results and save them in both JSON and Markdown formats.
    • Upload the generated JSON and Markdown files back to your Blob Storage container.

    Following the above steps will let you connect Blob Storage with Azure AI Document Intelligence, so you can analyze documents and store the results back in Blob Storage.
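
A minimal sketch of the steps above, assuming the `azure-storage-blob` and `azure-ai-documentintelligence` packages are installed and key-based authentication is used. The container names (`input-docs`, `analysis-results`), the environment variable names, and the helper functions `output_names`, `make_layout_analyzer`, and `process_blob` are hypothetical and only serve the illustration. Keyword names on `begin_analyze_document` have changed between preview and GA releases of the SDK, so check them against the version you have installed.

```python
import json
import os
from pathlib import PurePosixPath


def output_names(blob_name):
    """Derive the .json and .md blob names from the source blob name."""
    stem = PurePosixPath(blob_name).with_suffix("")
    return f"{stem}.json", f"{stem}.md"


def make_layout_analyzer(endpoint, key):
    """Return a function that sends bytes to the prebuilt layout model."""
    # Imported here so the helpers above can be used without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))

    def analyze(data):
        poller = client.begin_analyze_document(
            "prebuilt-layout",
            AnalyzeDocumentRequest(bytes_source=data),
            output_content_format="markdown",  # assumption: SDK version supports this
        )
        result = poller.result()
        # Full result as a JSON-compatible dict, plus the Markdown content.
        return result.as_dict(), result.content

    return analyze


def process_blob(source, target, blob_name, analyze):
    """Download one blob, analyze it, and upload the JSON and Markdown results."""
    data = source.download_blob(blob_name).readall()
    result_dict, markdown = analyze(data)
    json_name, md_name = output_names(blob_name)
    target.upload_blob(
        name=json_name,
        data=json.dumps(result_dict, indent=2).encode("utf-8"),
        overwrite=True,
    )
    target.upload_blob(name=md_name, data=markdown.encode("utf-8"), overwrite=True)
    return json_name, md_name


def main():
    from azure.storage.blob import BlobServiceClient

    # Hypothetical environment variable and container names.
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    source = service.get_container_client("input-docs")
    target = service.get_container_client("analysis-results")
    analyze = make_layout_analyzer(os.environ["DOCINT_ENDPOINT"], os.environ["DOCINT_KEY"])

    for blob in source.list_blobs():
        if blob.name.lower().endswith(".pdf"):
            print(process_blob(source, target, blob.name, analyze))
```

Calling `main()` processes every PDF in the source container. Since you already analyze documents from a link, another option may be to pass a blob URL with a SAS token as `url_source` instead of downloading the bytes first; that variant is not shown here.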

    hth!

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.
