Can't get PDF download/response working for Doc. Intelligence

habitoti 10

I am trying to download a searchable PDF from document intelligence. I believe all the necessary prerequisites are given (API version, feature selection etc.), but for the life of me the service doesn't offer any other response format than "text" and "markdown".
Also, just trying the test document in Document Intelligence Studio and selecting "SearchablePDF" as the feature does not produce a correct response. It only provides text in an application/json formatted response (with no additional PDF download link). I already switched from a German instance to "West Europe" (since Copilot assumed that only that instance within the GDPR area would provide the feature at all), but it makes no difference. It's really hard to find any substantial information on this topic.
Did anyone get this going? Is it a matter of the instance in the end, and if so: where can I even look up which instance supports it?

1 answer

Answer 1

Sina Salam 22,031 Volunteer Moderator

Hello habitoti,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you cannot get PDF download/response working for Doc. Intelligence.

This is not just to guide you on generating a searchable PDF, but to explain that the **Document Intelligence Studio (UI) currently does not support this feature.** Searchable PDF generation is **only supported via the REST API**, using a specific preview API version (2023-07-31-preview or later) and only in certain Azure regions.

Therefore, when the content field is missing or blank in the API response, it is likely due to an unsupported region or SDK version. Document Intelligence requires you to send a POST request to the prebuilt read model with outputContentFormat set to "searchablePdf". This triggers Azure’s OCR engine to extract text and embed it into a new PDF.

The feature currently works in West Europe, East US, and South Central US. It may not work in regions like Germany West Central, as Microsoft has not yet enabled this feature in all areas.

Here is a step-by-step guide:

Step 1: Use API version 2023-07-31-preview or newer in your REST API call. The outputContentFormat property was introduced in this release.

https://learn.microsoft.com/en-us/rest/api/document-intelligence/document-models/analyze-document?tabs=HTTP#analyze-document

Step 2: Send a REST API request with outputContentFormat set to "searchablePdf":

POST https://<your-region>.api.cognitive.microsoft.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview

Headers:

Ocp-Apim-Subscription-Key: <your-key>

Content-Type: application/json

Body:

{
  "urlSource": "<public-url-to-your-pdf>",
  "outputContentFormat": "searchablePdf"
}

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/ocr?tabs=preview-api#generate-searchable-pdfs

Step 3: If the response includes a "content" field, it contains a base64-encoded searchable PDF. Save it using the following code:

import base64
with open("output.pdf", "wb") as f:
    f.write(base64.b64decode(result["analyzeResult"]["content"]))

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview

This will help you to poll the API and decode Base64 PDF.

Step 4: Some regions do not yet support outputContentFormat. Check using Azure Region Feature Availability or contact support - https://azure.microsoft.com/en-us/support/plans/

Step 5: Also, you will need to be certain that you’re using a preview version of the SDK:

Python SDK: azure-ai-formrecognizer >= 3.3.0b1 (see Azure Form Recognizer – Python SDK)
.NET SDK: use an Azure.AI.FormRecognizer preview version (see Azure Form Recognizer – .NET SDK Docs)

OPTION 2: (As requested)

STEP 1: To clarify Python SDK Limitation, the current Python SDK (azure-ai-formrecognizer v1.0.2 or earlier) does not support outputContentFormat="searchablePdf".

So, you must use the REST API directly for now until Microsoft adds this capability to the SDK (only available in preview SDKs — not always GA).

STEP 2: Use the REST API to upload a binary PDF and get a searchable PDF back. You can POST a PDF file directly (no public URL needed) to the prebuilt-read model. REST endpoint format (West Europe):

https://<your-resource-name>.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview&outputContentFormat=searchablePdf

Replace <your-resource-name> with your actual Azure DI instance name (visible in your Azure portal). The URL is bound to your resource's region; the cognitiveservices.azure.com domain itself works across global regions (West Europe, East US, etc.).

STEP 3: Python Code (Using REST API Directly)

import base64
import sys
import time

import requests

# Replace with your instance-specific values
endpoint = "https://<your-resource-name>.cognitiveservices.azure.com"
api_key = "<your-form-recognizer-api-key>"
url = f"{endpoint}/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview&outputContentFormat=searchablePdf"
headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/pdf"
}
# Upload the PDF as binary data
with open("input.pdf", "rb") as f:
    data = f.read()
# Send POST request
response = requests.post(url, headers=headers, data=data)
# A 202 means the analysis was accepted; anything else is an error
if response.status_code != 202:
    print("Request failed:", response.text)
    sys.exit(1)
# The 'operation-location' header holds the URL to poll for the result
operation_url = response.headers["operation-location"]
# Poll until analysis completes
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": api_key})
    result_json = result.json()
    status = result_json.get("status")
    if status in ["succeeded", "failed"]:
        break
    time.sleep(2)
if status == "succeeded":
    # Decode the base64-encoded searchable PDF and write it to disk
    base64_pdf = result_json["analyzeResult"]["content"]
    with open("output.pdf", "wb") as out:
        out.write(base64.b64decode(base64_pdf))
    print("Searchable PDF saved as output.pdf")
else:
    print("Analysis failed:", result_json)
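The polling loop above runs until the service reports a terminal status, so an operation that never finishes would block forever. A minimal sketch of the same loop with a deadline is shown here. The names `poll_until_done` and `fetch_status` are hypothetical helpers for illustration, not part of any Azure library, and the status strings are simply the ones used in the code above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> dict:
    """Call fetch_status() until it reports a terminal status or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        # "succeeded" and "failed" are treated as terminal, as in the loop above
        if body.get("status") in ("succeeded", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"operation still {body.get('status')!r} after {timeout}s"
            )
        time.sleep(interval)
```

With the variables from the script above, it could be called as `poll_until_done(lambda: requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": api_key}).json())`.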

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.

Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-13T20:47:36.07+00:00

Hi habitoti

Just checking in to see if the above answer provided by @Sina Salam helped.

If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Thanks
habitoti 10 Reputation points

2025-05-14T04:43:51.89+00:00

Hi,

thanks for the comprehensive answer. I will check today if this works (couldn't do so straight away due to timezone differences…).

In any case I'd suggest enhancing the documentation in that direction as well. There is no mention of regional limitations or, conversely, of availability (other than maybe a very broad general statement that not all features are rolled out everywhere at the same time), and it says clearly that the searchablePDF feature is now GA with API 2024-11-30.
habitoti 10 Reputation points

2025-05-14T08:27:02.01+00:00
Hi Sina,
so I looked into this and still have a hard time getting this up & running. I do have some questions:

Your reference to the RestAPI Learning URL 404's unfortunately

I created a Document Intelligence instance in the West Europe region, so I assume the correct URL to use would be the one below. But this doesn't work (404), and it would also not work with "api.cognitive" instead of "cognitiveservices" (as in your sample). I am also not seeing another specific API access URL in my Azure Dashboard for that instance.

https://<myInstance>.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2024-11-30

Your POST sample asks for a public URL to access the document to scan, which I obviously cannot provide (these are private documents). Currently I am using the DI feature to upload a PDF, have it scanned and just use the returned text (which is working fine). So is there no way to use the searchablePDF feature with an uploaded PDF?

You are asking for a specific Python API to support the PDF download, and I would actually love to go through such an API (as I do for my current approach of text-only...). However my understanding so far is that it ONLY works through direct REST API, so for what would I need that Python Lib at all?

Thanks, habitoti
Sina Salam 22,031 Reputation points Volunteer Moderator

2025-05-14T15:30:24.6833333+00:00
Hello

Thanks for your feedback.

I have identified in the answer that generating searchable PDFs is supported via REST API (not Studio), and only in certain regions using a preview API version (2023-07-31-preview or later).

Also, for clarifications:

The Python SDK is optional. I'm not requiring it; if you are using it, I can suggest a full code snippet for you.

You do not need to post a public URL to your document in this thread; it would only go in your own configuration or code. You can also avoid it altogether: private documents cannot be shared via URL, so urlSource is impractical for real-world apps that process confidential files.

You must use the preview API version: 2023-07-31-preview or later.

outputContentFormat=searchablePdf works only in REST API or via preview SDKs (not UI).

Use file upload (not public URL) by sending PDF in the body with Content-Type: application/pdf.

Only works in supported regions: West Europe, East US, South Central US (check regional support).

About:

So, is there no way to use the searchablePDF feature with an uploaded PDF?

Yes, you can use the searchablePdf feature with a direct upload of the PDF file using a base64Source in the request body or via binary stream using multipart. **You *can* upload a PDF file (as binary content) to generate a searchable PDF.** You do **not** need a public URL. Instead of "urlSource", use the **binary file upload approach** with Content-Type: application/pdf.

Good success.

Do not forget to upvote and accept it as an answer to others in the community.
habitoti 10 Reputation points

2025-05-15T12:18:04.5966667+00:00
I really tried a lot now (and it's not as if I were a complete newbie here... ;-) ), but I can't get it working. I do have a Document Intelligence instance in the West Europe region, so according to your information it should work in principle. Using the 1.0.2 Azure Python lib works with no issues for plain text download, but checking the available formats just delivers "text" or "markdown", and requesting "searchablePdf" throws an error. Going through the Python lib would generally be my favourite, so if you do have a working code snippet (which I couldn't find anywhere, and Copilot can't help to construct something working either...), this would be much appreciated.

That being said, I also tried a piece of test code for going fully through the REST API; however, even this URL just 404s, and I can't see what would be wrong with it:

https://<instance>.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview&outputContentFormat=searchablePdf

...with <instance> being my West Europe document Intelligence instance name. So I am actually stuck here as well. Not sure where the URL "https://<your-region>.api.cognitive.microsoft.com/..." you provided earlier should point to, and if <your-region> would be my instance-ID (none of that would conform to the endpoints provided in the Dashboard...), but whatever combination of that I'll try in addition, all end up 404 as well.

Again just reiterating: my instance is working properly in normal text retrieval mode, so it's not that I have wrong endpoints or API Keys -- these work just fine for other than PDF download...
Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-15T15:45:22.0466667+00:00

Hi habitoti

Just checking in to see if the above answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Thanks
habitoti 10 Reputation points

2025-05-15T21:47:16.01+00:00

Please see my earlier reply at the end of the thread…
Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-21T19:48:55.74+00:00

Hi habitoti
Sorry for Delayed Response!
It appears that you've diligently explored multiple approaches, but the issue with generating a searchable PDF in Azure Document Intelligence may stem from regional support constraints, API version compatibility, or incorrect endpoint formatting. The feature is supported in West Europe, East US, and South Central US, which aligns with your instance location. However, verifying that your request adheres to the 2023-07-31-preview API version is crucial since outputContentFormat: "searchablePdf" was introduced in this version. Additionally, the correct endpoint format must be utilized, following the structure https://<your-region>.api.cognitive.microsoft.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview, ensuring <your-region> is replaced with West Europe rather than the instance name. If encountering persistent 404 errors, it may indicate that your instance configuration is incorrect or that the request parameters need refinement.

To proceed efficiently, using the Azure AI Document Intelligence SDK provides a robust solution. A viable Python implementation would involve initializing the DocumentIntelligenceClient with the appropriate endpoint and API key, followed by invoking the begin_analyze_document() method with outputContentFormat="searchablePdf". For reference, a well-structured API request through REST involves submitting a POST request with the required headers and payload format to the correct region-based endpoint. If inconsistencies persist, verifying instance provisioning, API key validity, and endpoint alignment is necessary.
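As a rough illustration of the SDK route described above, here is a minimal sketch. It swaps in the `output` option and the `get_analyze_result_pdf` call as they appear to be exposed by the `azure-ai-documentintelligence` 1.0.x package, rather than the `outputContentFormat` argument named above. Treat those names, the `"pdf"` option value, and the `poller.details["operation_id"]` lookup as assumptions to verify against the installed package version. `save_searchable_pdf` itself is a hypothetical helper.

```python
def save_searchable_pdf(client, input_path: str, output_path: str) -> str:
    """Analyze input_path with prebuilt-read, request PDF output, write it to output_path.

    `client` is expected to behave like DocumentIntelligenceClient
    (assumed method names; check them against your installed SDK).
    Returns the operation id of the analysis.
    """
    with open(input_path, "rb") as f:
        poller = client.begin_analyze_document(
            "prebuilt-read",
            body=f,
            output=["pdf"],  # assumed to be equivalent to AnalyzeOutputOption.PDF
        )
    result = poller.result()  # blocks until the analysis has finished
    operation_id = poller.details["operation_id"]

    # The PDF is fetched in a second call, as an iterator of byte chunks
    chunks = client.get_analyze_result_pdf(
        model_id=result.model_id, result_id=operation_id
    )
    with open(output_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    return operation_id


# Usage (not run here; needs the azure-ai-documentintelligence package):
# from azure.core.credentials import AzureKeyCredential
# from azure.ai.documentintelligence import DocumentIntelligenceClient
# client = DocumentIntelligenceClient("<endpoint>", AzureKeyCredential("<key>"))
# save_searchable_pdf(client, "input.pdf", "output.pdf")
```

Passing the client in as a parameter keeps the helper independent of how the client is built, so the endpoint and key from an existing text-extraction setup can be reused unchanged.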

Reference : Microsoft Q&A on Searchable PDF Issues, the official Azure Document Intelligence API Reference, and guidance on REST API Endpoint Troubleshooting.
If further clarification is required, I am here to refine the approach further.

Thanks
Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-23T19:18:08.6666667+00:00

Hi habitoti

Just checking in to see if the below answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Thanks
habitoti 10 Reputation points

2025-05-24T07:03:06.8633333+00:00

Sorry to say that we are just going in circles here: the last reply is nothing but a collection of the little information that is available everywhere (and you are even providing this very thread as a solution reference, which it isn't!). Any AI will already try to produce some working code from those few available pieces of information, and I wouldn't be here if any of that worked. I have been doing IT and development long enough to pick up information from all available sources before I actually go to a forum to ask.

If you can't provide a (link to a) full working (Python) sample, either going the plain REST route or using the Azure Python lib (preferably), I think there is not much more to discuss here. Please provide e.g. a fully qualified URL for the West Europe region (no placeholders and no asking me to replace something) and tell me where in the end my instance ID goes (because it somehow has to be mapped to my DI instance; I doubt it can be mapped through the API key in the header alone, which would also be a very weird practice…).

My working fallback right now is extracting the text alone and doing my own text overlay into the PDF. For my use case at hand this is OK: I don't assign the overlay text to the very frames where it occurs (as DI would do), but I also don't need that here. I just need it for full PDF indexing down the pipeline. So I am not desperate yet, as it works that way. It would of course be better, faster and much more convenient to have DI do it (and also because I actually pay for the service and would think it should just do what it promises…).
Sina Salam 22,031 Reputation points Volunteer Moderator

2025-05-25T20:47:15.88+00:00
Dear habitoti,

Thank you for your feedback and asking for more.

I want to be clear about your request in the last comment. Do you mean the following:

You want to use the Python SDK, not the direct REST API, for consistent integration with existing logic, despite the feature being missing in the stable SDK?

You are requesting an accurate, working code sample for:

REST API call to generate a searchable PDF via file upload (no public URL).

Clarification of correct endpoint format?

Also suggested improving documentation, especially around region availability, SDK limitations, and accurate endpoints?

if YES: Check the above main answer body for additional input (OPTION 2: (As requested))

The answer I provided there:

Uses direct PDF binary upload, not public URL.

Uses correct West Europe endpoint (no guessing or placeholders).

Validates that content is retrieved and decoded to PDF.

No SDK needed but clean REST call for broader compatibility.

Success.
habitoti 10 Reputation points

2025-05-25T21:15:21.42+00:00
I'd prefer using the SDK (obviously, that's why we have those…), and you said further up it's optional, which suggests it should work. If it does not, that is OK for me, but I haven't seen such information so far.

Yes

Yes
Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-27T23:27:34.6933333+00:00
Hi habitoti

Thank you for your clarification and continued engagement.

SDK Preference Justified

You're absolutely right to prefer the Python SDK for consistency and maintainability. The SDK is designed to simplify integration and reduce boilerplate, and it’s the recommended approach for most production scenarios.
However, as of now, the stable Python SDK (azure-ai-formrecognizer) only supports API versions up to 2023-07-31. This means that newer features—such as generating searchable PDFs using the output=pdf parameter—are not yet available in the SDK.

“This package supports the following service API versions: 2.0, 2.1, 2022-08-31 and 2023-07-31. Service API version 2023-10-31-preview and later are supported in package azure-ai-documentintelligence. Please refer to this doc for migration details.” — Microsoft Learn: Azure Form Recognizer client library for Python
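Since the quote distinguishes two packages, it can help to confirm which of them is actually installed in the environment. A minimal sketch using only the standard library follows. `installed_version` is a hypothetical helper name, and the two distribution names are the ones mentioned in the quote.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report which of the two client libraries is present in this environment
for name in ("azure-ai-formrecognizer", "azure-ai-documentintelligence"):
    print(name, installed_version(name) or "not installed")
```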

Why REST is Used Here

Since the output=pdf capability is only available in preview API versions (2023-10-31-preview or later), and those are not yet supported in the stable SDK, the only reliable way to access this feature is via a direct REST API call.

This is why I’ve provided a clean, working REST example that:

Uploads a binary PDF (not a public URL).

Uses the correct West Europe endpoint.

Targets the prebuilt-read model.

Includes output=pdf in the query string.

import requests

endpoint = ""
api_key = ""
model_id = "prebuilt-read"
api_version = "2024-11-30"
file_path = "C:/Users/Documents/Doctest/Taxdocs/scheduleE.pdf"

url = f"{endpoint}/formrecognizer/documentModels/{model_id}:analyze?api-version={api_version}&output=pdf"

with open(file_path, "rb") as f:
    file_data = f.read()

headers = {
    "Content-Type": "application/pdf",
    "Ocp-Apim-Subscription-Key": api_key
}

response = requests.post(url, headers=headers, data=file_data)
print(response.status_code)
print(response.text)
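The request above only submits the document; it does not yet retrieve a PDF. The sketch below continues the flow under three assumptions that should be checked against the REST reference for the chosen API version: that the 2024-11-30 route uses a `documentintelligence` path segment instead of `formrecognizer`, that the result URL arrives in the `Operation-Location` header, and that the PDF is served from a `/pdf` sub-resource of that result URL. `analyze_url`, `pdf_url` and `fetch_searchable_pdf` are hypothetical helper names. The `http` parameter is any object with requests-style `post()` and `get()`, for example the `requests` module itself.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def analyze_url(endpoint: str, model_id: str = "prebuilt-read",
                api_version: str = "2024-11-30") -> str:
    """Build the submit URL (the 'documentintelligence' segment is an assumption)."""
    return (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
            f"{model_id}:analyze?api-version={api_version}&output=pdf")


def pdf_url(operation_location: str) -> str:
    """Append '/pdf' to the result path while keeping the query string."""
    parts = urlsplit(operation_location)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/") + "/pdf"))


def fetch_searchable_pdf(http, endpoint: str, api_key: str, pdf_bytes: bytes,
                         interval: float = 2.0, timeout: float = 120.0) -> bytes:
    """Submit a PDF, wait for the analysis, and return the searchable PDF bytes."""
    key = {"Ocp-Apim-Subscription-Key": api_key}
    resp = http.post(analyze_url(endpoint),
                     headers={**key, "Content-Type": "application/pdf"},
                     data=pdf_bytes)
    if resp.status_code != 202:
        raise RuntimeError(f"submit failed: {resp.status_code} {resp.text}")
    op_url = resp.headers["Operation-Location"]

    # Poll the operation until it reaches a terminal status or the deadline passes
    deadline = time.monotonic() + timeout
    while True:
        status = http.get(op_url, headers=key).json().get("status")
        if status == "succeeded":
            break
        if status == "failed" or time.monotonic() >= deadline:
            raise RuntimeError(f"analysis ended with status {status!r}")
        time.sleep(interval)

    # Fetch the PDF itself from the assumed '/pdf' sub-resource
    pdf = http.get(pdf_url(op_url), headers=key)
    if pdf.status_code != 200:
        raise RuntimeError(f"PDF download failed: {pdf.status_code}")
    return pdf.content
```

A call could look like `fetch_searchable_pdf(requests, endpoint, api_key, file_data)`, followed by writing the returned bytes to a file opened in `"wb"` mode.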

If you need more, I can help you:

Migrate to the azure-ai-documentintelligence preview SDK (which supports newer APIs),

Or wrap this REST logic into a reusable module that integrates cleanly with your SDK-based codebase.

Thanks
Ravada Shivaprasad 545 Reputation points Microsoft External Staff Moderator

2025-05-28T20:00:02.1866667+00:00

Hi habitoti

Just checking in to see if the above answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Thanks
