How to send a local file to the REST API of AZURE DOCUMENT INTELLIGENCE

KEVIN S 20 Reputation points
2024-03-01T16:21:00.7266667+00:00

I need some help. I went to the documentation for the REST API and used Python to send a request, and I got the JSON response with the data back.

The issue is that I would like to use PDF files from my own local storage with Document Intelligence.

Basically, I want to pass it a PDF from my C drive and have Document Intelligence analyze that file. Otherwise I would have to use another API, such as Google Drive or some other cloud service, to host the files and then pass their URLs in the request.

Let's say I have three PDF files in File Explorer and those are the only three I want to pass. It will not let me, because urlSource wants a URL, of course.

curl -i -X POST "%FR_ENDPOINT%formrecognizer/documentModels/<modelId>:analyze?api-version=2023-07-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: %FR_KEY%" --data-ascii "{'urlSource': '<document-url>'}"

Above is the curl example. Notice that at the end it wants a urlSource. Is there any way to give it a local PDF instead?


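import requests

headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': 'subscription key goes here',
}
params = {
    'api-version': '2023-07-31',
}
# Request body: the document is referenced by URL
data = {'urlSource': 'url link to pdf file (github in this case)'}
response = requests.post(
    'url link to end point goes here',
    params=params,
    headers=headers,
    json=data,
)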

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/use-sdk-rest-api?view=doc-intel-4.0.0&tabs=windows&pivots=programming-language-rest-api

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. navba-MSFT 17,360 Reputation points Microsoft Employee
    2024-03-04T07:47:40.3266667+00:00

    @KEVIN S Welcome to Microsoft Q&A Forum, Thank you for posting your query here!

    The Document Models - Analyze Document REST API reference lists the fields allowed in the request body:


    More Info here.

    If you don't want to pass urlSource, you can use the base64Source attribute in the request body instead.

    It carries the base64-encoded content of the document, so it can be used to send a PDF file from your C:\ drive. You can follow the approach below:

      
    import base64
    import requests
    
    # Placeholders: fill in your own resource endpoint, model ID, and key
    endpoint = "<your_endpoint>"
    model_id = "<modelId>"
    key = "<your_subscription_key>"
    
    # Read the PDF file in binary mode, encode it to base64, and decode to string
    with open("C:\\path\\to\\your\\file.pdf", "rb") as file:
        base64_encoded_pdf = base64.b64encode(file.read()).decode()
    
    # Prepare the API request body
    data = {
        "base64Source": base64_encoded_pdf
    }
    
    # Prepare the API request headers
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key
    }
    
    # Send the API request. Optional query parameters (pages, locale,
    # stringIndexType, features) can be added to params when needed.
    response = requests.post(
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}:analyze",
        params={"api-version": "2023-07-31"},
        headers=headers,
        json=data,
    )
    
    # A successful call returns 202 with an empty body, so there is no JSON to
    # print yet; the result is fetched from the Operation-Location URL
    response.raise_for_status()
    print(response.status_code, response.headers.get("Operation-Location"))
    
    

    Please note, I haven't tested the above sample at my end. Please test it at your end and check if that works fine.
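
One thing worth knowing when testing: the analyze call is asynchronous. For api-version 2023-07-31, a successful POST returns 202 with an Operation-Location header, and the result is read by sending GET requests to that URL until the status is succeeded or failed. A minimal sketch of that polling loop is below, using only the standard library. The names poll_analyze_result, fetch_json, and the fetch parameter are hypothetical and chosen for illustration; they are not part of any SDK, and the interval and attempt count are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_json(url, key):
    """GET the URL with the subscription key header and parse the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_analyze_result(operation_url, key, fetch=fetch_json,
                        interval=2.0, max_attempts=30):
    """Poll the Operation-Location URL until analysis finishes.

    Returns the final response body on success; raises on failure or timeout.
    """
    for _ in range(max_attempts):
        body = fetch(operation_url, key)
        status = body.get("status")
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {body.get('error')}")
        # Status is "notStarted" or "running": wait before asking again
        time.sleep(interval)
    raise TimeoutError("Analysis did not finish within the allowed attempts")
```

With the POST response from the sample, the call would look like poll_analyze_result(response.headers["Operation-Location"], "<your_subscription_key>"), and the extracted content should sit under the "analyzeResult" key of the returned body.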

    Please remember that base64 encoding makes the payload roughly one third larger than the original file, so the request body can be quite large for big PDF files, and there might be a limit on the size of the request body that the API can handle.
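
To make the size concern concrete: base64 turns every 3 bytes into 4 characters. The sketch below builds the base64Source request body for several local PDFs (such as the three files mentioned in the question) and skips any file whose encoded size exceeds a chosen limit. MAX_BYTES is an assumed placeholder value, not a documented service limit, and base64_length and build_payloads are hypothetical helpers; check the service limits for your tier before relying on a specific number.

```python
import base64
import json
from pathlib import Path

# Assumed placeholder; not a documented limit of the service
MAX_BYTES = 4 * 1024 * 1024


def base64_length(raw_size):
    """Length of the base64 text for raw_size bytes (4 chars per 3-byte group)."""
    return 4 * ((raw_size + 2) // 3)


def build_payloads(paths, max_bytes=MAX_BYTES):
    """Return {file name: JSON request body} for files within the size limit."""
    payloads = {}
    for path in map(Path, paths):
        raw = path.read_bytes()
        if base64_length(len(raw)) > max_bytes:
            print(f"Skipping {path.name}: encoded size exceeds {max_bytes} bytes")
            continue
        encoded = base64.b64encode(raw).decode("ascii")
        payloads[path.name] = json.dumps({"base64Source": encoded})
    return payloads
```

Each value in the returned dictionary can be sent as the body of its own analyze request with Content-Type: application/json. The dictionary is keyed by file name, so files with the same name in different folders would overwrite each other.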

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    1 person found this answer helpful.
