Streaming large files with Document Intelligence Python SDK

Bogdan Pechounov 60 Reputation points
2024-09-30T15:08:14.79+00:00

Does using AnalyzeDocumentRequest create a JSON payload with binary data?

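    from azure.ai.documentintelligence.aio import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import (
        AnalyzeDocumentRequest,
        AnalyzeResult,
        ContentFormat,
    )
    from azure.core.credentials import AzureKeyCredential

    async def get_analyze_result(self, document_data: bytes) -> AnalyzeResult:
        """
        Analyze a document with the prebuilt layout model and return
        the result with its content formatted as Markdown.
        """
        document_intelligence_client = DocumentIntelligenceClient(
            endpoint=self.document_intelligence_endpoint,
            credential=AzureKeyCredential(key=self.document_intelligence_key),
        )

        # The async client is closed when the block exits.
        async with document_intelligence_client:
            poller = await document_intelligence_client.begin_analyze_document(
                analyze_request=AnalyzeDocumentRequest(
                    bytes_source=document_data),
                model_id="prebuilt-layout",
                output_content_format=ContentFormat.MARKDOWN,
            )

            # Wait for the long-running analysis to finish.
            analyze_result = await poller.result()
            return analyze_result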
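
To make the payload question concrete: as far as I can tell, `bytes_source` maps to the `base64Source` field of the service's JSON request body, so the binary data travels as base64 text rather than raw bytes. The sketch below uses only the standard library and does not call the SDK; `build_json_body` is a hypothetical helper that imitates such a body to show the size cost.

```python
import base64
import json


def build_json_body(document_data: bytes) -> str:
    """Hypothetical helper: imitate a JSON body that carries a file as base64 text."""
    encoded = base64.b64encode(document_data).decode("ascii")
    return json.dumps({"base64Source": encoded})


# Stand-in binary data; a real call would use the document's bytes.
document_data = bytes(range(256)) * 12
body = build_json_body(document_data)

# Base64 emits 4 characters for every 3 input bytes, so the body is
# roughly a third larger than the file and is built fully in memory.
overhead = len(body) / len(document_data)
```

If that mapping holds, the whole file is encoded before the request is sent, which is the opposite of streaming it.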

Samples

Does the following code stream the file without blocking the thread? (I don't think a BufferedReader has async methods)

    with open(path_to_sample_documents, "rb") as f:
        poller = await document_intelligence_client.begin_analyze_document(
            model_id=model_id, analyze_request=f, content_type="application/octet-stream"
        )
    result: AnalyzeResult = await poller.result()
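
On the blocking concern: `open` and `read` on a regular file object are synchronous, so one possible workaround is to run them in a worker thread. This is a minimal standard-library sketch, not taken from the SDK samples. `read_chunks`, `read_all` and `demo` are hypothetical names, and nothing in this thread confirms that `begin_analyze_document` accepts an async iterable, so the chunks are simply joined into bytes that could be passed as `bytes_source`.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator


async def read_chunks(path: str, chunk_size: int = 64 * 1024) -> AsyncIterator[bytes]:
    """Yield a file in chunks, running each blocking call in a worker thread."""
    f = await asyncio.to_thread(open, path, "rb")
    try:
        while True:
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        await asyncio.to_thread(f.close)


async def read_all(path: str) -> bytes:
    """Join the chunks into one bytes object without blocking the event loop."""
    return b"".join([chunk async for chunk in read_chunks(path)])


def demo() -> bool:
    """Write a temporary file, read it back asynchronously, and compare."""
    payload = b"x" * 200_000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        return asyncio.run(read_all(tmp.name)) == payload
    finally:
        os.remove(tmp.name)
```

This keeps the event loop responsive during the disk read, but the full document still ends up in memory before upload. `asyncio.to_thread` requires Python 3.9 or later.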
Azure AI Document Intelligence

1 answer

  1. YutongTie-MSFT 52,596 Reputation points
    2024-10-01T00:24:36.9733333+00:00

    Hello Bogdan Pechounov

    Thanks for reaching out to us. Azure Document Intelligence supports a bytes source; see the AnalyzeDocumentRequest reference: https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.models.analyzedocumentrequest?view=azure-python-preview

    Please refer to this sample in the SDK repository: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/async_samples/sample_analyze_invoices_from_bytes_source_async.py

    Please take a look and have a try. I hope it helps.

    Regards,

    Yutong

    Please accept the answer if you found it helpful, to support the community. Thanks a lot.
