Can't get PDF download/response working for Doc. Intelligence

habitoti 10 Reputation points
2025-05-13T16:54:06.0766667+00:00

I am trying to download a searchable PDF from Document Intelligence. I believe all the necessary prerequisites are met (API version, feature selection, etc.), but for the life of me the service doesn't offer any response format other than "text" and "markdown".
Also, just trying the test document in Document Intelligence Studio and selecting "SearchablePDF" as the feature does not produce a correct response. It only returns text in an application/json response (without any additional PDF download link). I have already switched from a German instance to "West Europe" (since Copilot assumed that this is the only instance under GDPR jurisdiction that provides the feature at all), but it makes no difference. It's really hard to find any substantial information on this topic.
Has anyone gotten this to work? Is it ultimately a matter of the instance, and if so, where can I even look up which instances support it?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2025-05-13T18:59:05.2833333+00:00

    Hello habitoti,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you cannot get PDF download/response working for Doc. Intelligence.

    Beyond the step-by-step guide, note that Document Intelligence Studio (UI) does not appear to offer the searchable PDF download itself. Searchable PDF output has to be requested programmatically, through the REST API or an SDK, using API version 2024-11-30 (v4.0) or later, and only in regions where that API version is available.

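    The reason you only see "text" and "markdown" is that those are the only values of the outputContentFormat parameter, which controls how the extracted text in the JSON result is formatted. A searchable PDF is requested separately, with the output=pdf query parameter on the prebuilt-read model. The OCR engine then extracts the text, embeds it as an invisible text layer in a copy of your PDF, and makes that file available from a separate /pdf endpoint once the analysis has finished. The PDF is not part of the JSON response.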

    Availability can differ by region: the feature has been reported to work in West Europe, East US, and South Central US, but it may not yet be enabled in regions such as Germany West Central, because Microsoft does not roll out new API versions to all regions at the same time.

    Here is a step-by-step guide:

    Step 1: Use API version 2024-11-30 or newer in your REST API call. Searchable PDF output was introduced with that release (v4.0 GA); older versions such as 2023-07-31 cannot produce it.

    Step 2: Send the analyze request to the prebuilt-read model with the output=pdf query parameter:

    POST https://<your-resource-name>.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-read:analyze?api-version=2024-11-30&output=pdf
    

    Headers:

    Ocp-Apim-Subscription-Key: <your-key>

    Content-Type: application/json

    Body:

    {
      "urlSource": "<public-url-to-your-pdf>"
    }
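    The same request can be assembled in Python. This is a minimal sketch, assuming the v4.0 route (/documentintelligence/...) and api-version 2024-11-30, where the searchable PDF is requested with the output=pdf query parameter rather than through outputContentFormat. The function name build_searchable_pdf_request and the example resource name my-resource are hypothetical. It only builds the URL and JSON body, so you can send them with any HTTP client together with the Ocp-Apim-Subscription-Key header.

```python
from urllib.parse import urlencode


def build_searchable_pdf_request(endpoint: str, source_url: str,
                                 api_version: str = "2024-11-30") -> tuple[str, dict]:
    """Return (analyze_url, json_body) for a prebuilt-read call that asks for PDF output.

    Assumes the v4.0 route /documentintelligence/... and the output=pdf query parameter.
    """
    # api-version and output=pdf both travel in the query string
    query = urlencode({"api-version": api_version, "output": "pdf"})
    url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
           f"prebuilt-read:analyze?{query}")
    # The document itself is referenced by a publicly reachable URL
    body = {"urlSource": source_url}
    return url, body


# Usage with a hypothetical resource name and document URL
url, body = build_searchable_pdf_request(
    "https://my-resource.cognitiveservices.azure.com/",
    "https://example.com/scan.pdf",
)
print(url)
print(body)
```

    The trailing slash on the endpoint is stripped so that the path is joined correctly whether or not you copy the endpoint from the portal with a slash.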
    

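    Step 3: The POST returns 202 Accepted with an Operation-Location header. Poll that URL until "status" is "succeeded". The searchable PDF is not inside the JSON result (analyzeResult.content holds the extracted text, not a base64-encoded PDF). Instead, download it from the result URL with /pdf appended, which returns the file as application/pdf:

    import requests
    # operation_url: Operation-Location header from Step 2, already polled until "succeeded"
    pdf_url = operation_url.split("?")[0] + "/pdf?api-version=2024-11-30"
    pdf = requests.get(pdf_url, headers={"Ocp-Apim-Subscription-Key": "<your-key>"})
    with open("output.pdf", "wb") as f:
        f.write(pdf.content)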
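    In the 2024-11-30 API, the searchable PDF is served from the analysis result URL (the Operation-Location header returned by the POST) with /pdf appended. Below is a small sketch of deriving that download URL with urllib.parse, assuming the .../analyzeResults/{resultId}?api-version=... shape. The function name searchable_pdf_url, the resource name, and the result id are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def searchable_pdf_url(operation_location: str) -> str:
    """Map an analyze Operation-Location URL to the matching /pdf download URL.

    Assumes the result URL looks like .../analyzeResults/{resultId}?api-version=...
    and that the PDF is served at .../analyzeResults/{resultId}/pdf with the same query.
    """
    parts = urlsplit(operation_location)
    pdf_path = parts.path.rstrip("/") + "/pdf"
    # Keep scheme, host, and query (api-version); drop any fragment
    return urlunsplit((parts.scheme, parts.netloc, pdf_path, parts.query, ""))


# Usage with a made-up resource name and result id
op = ("https://my-resource.cognitiveservices.azure.com/documentintelligence/"
      "documentModels/prebuilt-read/analyzeResults/abc-123?api-version=2024-11-30")
print(searchable_pdf_url(op))
```

    Parsing the URL, instead of editing the string by hand, keeps the api-version query intact even if the service adds further query parameters to Operation-Location.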
    

    For background on the service, see https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview
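    Polling the result URL in an unbounded loop never ends if the operation stays queued. The sketch below is a minimal polling helper with a timeout, assuming the status values notStarted, running, succeeded, and failed that the analyze result reports. The name wait_for_analysis and its default interval and timeout are hypothetical choices. The status source is passed in as a function so that the example runs offline.

```python
import time
from typing import Callable


def wait_for_analysis(fetch_status: Callable[[], dict], interval: float = 2.0,
                      timeout: float = 300.0) -> dict:
    """Call fetch_status() until it reports a terminal status or the timeout expires.

    fetch_status is any zero-argument callable returning the parsed JSON of
    GET <Operation-Location>, e.g. lambda: requests.get(url, headers=h).json().
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        if payload.get("status") in ("succeeded", "failed"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"analysis still {payload.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Offline usage example: a fake status source that succeeds on the third call
responses = iter([{"status": "notStarted"}, {"status": "running"}, {"status": "succeeded"}])
final = wait_for_analysis(lambda: next(responses), interval=0.01)
print(final["status"])
```

    time.monotonic() is used for the deadline so that changes to the system clock cannot shorten or extend the wait.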

    Step 4: Some regions may not yet offer API version 2024-11-30. Check the Azure products-by-region availability page for Document Intelligence, or contact support: https://azure.microsoft.com/en-us/support/plans/

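    Step 5: If you use an SDK instead of raw REST calls, make sure it targets API version 2024-11-30 or later. This is a GA version, so a preview SDK is not required. For Python, that means the azure-ai-documentintelligence package (1.0.0 or later) rather than the older azure-ai-formrecognizer package.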

    OPTION 2: (As requested)

    STEP 1: To clarify the Python SDK limitation: the older azure-ai-formrecognizer package (3.x releases, which target API versions up to 2023-07-31) cannot request searchable PDF output.

    So either switch to azure-ai-documentintelligence or call the REST API directly, as in the steps below.
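    The sketch below shows the same flow with the newer Python SDK, assuming azure-ai-documentintelligence 1.0.0 or later (which targets API version 2024-11-30). The names used here (DocumentIntelligenceClient, AnalyzeDocumentRequest, AnalyzeOutputOption.PDF, begin_analyze_document, get_analyze_result_pdf, poller.details["operation_id"]) follow the package's published samples; verify them against the version you install. The function name save_searchable_pdf is hypothetical. The imports sit inside the function only so that the snippet loads where the package is not installed.

```python
def save_searchable_pdf(endpoint: str, key: str, pdf_path: str, out_path: str) -> None:
    """Analyze a local PDF with prebuilt-read and write the searchable PDF to out_path.

    Sketch assuming azure-ai-documentintelligence >= 1.0.0 is installed.
    """
    # Imported here (not at module top) so this snippet loads without the package
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, AnalyzeOutputOption
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        request = AnalyzeDocumentRequest(bytes_source=f.read())  # sent as base64Source
    # output=[PDF] is the SDK counterpart of the output=pdf query parameter
    poller = client.begin_analyze_document(
        "prebuilt-read", request, output=[AnalyzeOutputOption.PDF]
    )
    result = poller.result()  # blocks until the analysis has finished
    # The operation id identifies the analysis result the PDF is downloaded from
    result_id = poller.details["operation_id"]
    chunks = client.get_analyze_result_pdf(model_id=result.model_id, result_id=result_id)
    with open(out_path, "wb") as out:
        out.writelines(chunks)  # the PDF arrives as an iterator of byte chunks


# Usage (requires a real resource; values are placeholders):
# save_searchable_pdf("https://<your-resource-name>.cognitiveservices.azure.com",
#                     "<your-key>", "input.pdf", "output.pdf")
```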

    STEP 2: Using the REST API to send a local PDF and get a searchable PDF back: you can include the file content directly in the request body as base64 (no public URL needed) when calling the prebuilt-read model. The endpoint format is:

    https://<your-resource-name>.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-read:analyze?api-version=2024-11-30&output=pdf
    

    Replace <your-resource-name> with your actual Document Intelligence resource name (visible in the Azure portal). The resource itself is bound to the region it was created in (West Europe, East US, etc.), but the cognitiveservices.azure.com URL format is the same in every region.

    STEP 3: Python Code (Using REST API Directly)

    import base64
    import sys
    import time
    import requests
    # Replace with your instance-specific values
    endpoint = "https://<your-resource-name>.cognitiveservices.azure.com"
    api_key = "<your-document-intelligence-api-key>"
    api_version = "2024-11-30"
    url = (f"{endpoint}/documentintelligence/documentModels/prebuilt-read:analyze"
           f"?api-version={api_version}&output=pdf")
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    # Send the local PDF inline as base64 in a JSON body (no public URL needed)
    with open("input.pdf", "rb") as f:
        body = {"base64Source": base64.b64encode(f.read()).decode("ascii")}
    # Send POST request; 202 Accepted means the analysis was queued
    response = requests.post(url, headers=headers, json=body)
    if response.status_code != 202:
        print("Request failed:", response.text)
        sys.exit(1)
    # Poll the Operation-Location URL until the analysis completes
    operation_url = response.headers["Operation-Location"]
    while True:
        result_json = requests.get(operation_url, headers=headers).json()
        status = result_json.get("status")
        if status in ("succeeded", "failed"):
            break
        time.sleep(2)
    if status == "succeeded":
        # The searchable PDF is a separate resource: the result URL plus /pdf
        pdf_url = operation_url.split("?")[0] + f"/pdf?api-version={api_version}"
        pdf = requests.get(pdf_url, headers=headers)
        pdf.raise_for_status()
        with open("output.pdf", "wb") as out:
            out.write(pdf.content)  # raw application/pdf bytes, no base64 decoding
        print("Searchable PDF saved as output.pdf")
    else:
        print("Analysis failed:", result_json)
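    You can check whether a file written by a script like this is a real PDF, or just text such as a JSON payload, by looking for the %PDF- header that PDF files start with. The following is a minimal sketch; looks_like_pdf is a hypothetical helper name, and the example uses throwaway files in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Union


def looks_like_pdf(path: Union[str, Path]) -> bool:
    """Return True if the file begins with the %PDF- header used by PDF files."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Usage example: one file with a PDF header, one containing JSON text
with tempfile.TemporaryDirectory() as tmp:
    ok = Path(tmp, "ok.pdf")
    ok.write_bytes(b"%PDF-1.7\n%placeholder body\n")
    bad = Path(tmp, "bad.pdf")
    bad.write_bytes(b'{"status": "succeeded"}')
    ok_result, bad_result = looks_like_pdf(ok), looks_like_pdf(bad)
print(ok_result, bad_result)
```

    This only checks the header, not whether the PDF actually contains a text layer. To confirm that, open the file and try selecting or searching its text.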
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

