Hello habitoti,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you cannot get the searchable PDF download/response working for Document Intelligence.
This is not just to guide you on generating a searchable PDF, but to explain that the **Document Intelligence Studio (UI)** currently does not support this feature. Searchable PDF generation is **only supported via the REST API**, using a specific preview API version (2023-07-31-preview or later), and only in certain Azure regions.
Therefore, when the content field is missing or blank in the API response, it is likely due to an unsupported region or SDK version. Document Intelligence requires you to send a POST request to the prebuilt-read model with outputContentFormat set to "searchablePdf". This triggers Azure’s OCR engine to extract text and embed it into a new PDF.
The feature currently works in West Europe, East US, and South Central US. It may not work in regions like Germany West Central, as Microsoft has not yet enabled this feature in all areas.
Here is a step-by-step guide:
Step 1: Use API version 2023-07-31-preview or newer in your REST API call. The outputContentFormat property was introduced in this release.
Step 2: Send a REST API request with outputContentFormat set to "searchablePdf":
POST https://<your-region>.api.cognitive.microsoft.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview
Headers:
    Ocp-Apim-Subscription-Key: <your-key>
    Content-Type: application/json
Body:
{
    "urlSource": "<public-url-to-your-pdf>",
    "outputContentFormat": "searchablePdf"
}
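For readers who prefer to issue the Step 2 request from Python, here is a minimal sketch that only assembles the URL, headers, and JSON body shown above, without sending anything. The function name build_analyze_request and its parameters are illustrative and not part of any SDK. The endpoint path, API version, and outputContentFormat value are copied as-is from this answer and should be verified against your own resource.

```python
import json
from urllib.parse import urlencode

API_VERSION = "2023-07-31-preview"  # value quoted in this answer, not verified here


def build_analyze_request(region, api_key, pdf_url, api_version=API_VERSION):
    """Assemble (url, headers, body) for the Step 2 request; nothing is sent.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    base = f"https://{region}.api.cognitive.microsoft.com"
    path = "/formrecognizer/documentModels/prebuilt-read:analyze"
    url = f"{base}{path}?{urlencode({'api-version': api_version})}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    # Same JSON body as in Step 2 above
    body = json.dumps({"urlSource": pdf_url, "outputContentFormat": "searchablePdf"})
    return url, headers, body
```

The returned values could then be passed to an HTTP client, for example requests.post(url, headers=headers, data=body).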
Step 3: If the response includes a "content" field, it contains a base64-encoded searchable PDF. Save it using the following code:
import base64

# "result" is the parsed JSON of the completed analysis response
with open("output.pdf", "wb") as f:
    f.write(base64.b64decode(result["analyzeResult"]["content"]))
For more help with polling the API and decoding the Base64 PDF, see https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview
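Because the "content" field may be missing or blank, it can help to check for it before decoding. The following is a small sketch under the assumption that the result has the analyzeResult/content shape described in Step 3. The helper name extract_pdf_bytes is hypothetical.

```python
import base64


def extract_pdf_bytes(result):
    """Return decoded PDF bytes, or None if 'content' is absent or blank.

    Assumes the response shape described in Step 3 of this answer.
    """
    content = (result.get("analyzeResult") or {}).get("content")
    if not content:
        return None
    pdf_bytes = base64.b64decode(content)
    # A PDF file starts with the "%PDF" magic bytes
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("decoded content does not look like a PDF")
    return pdf_bytes
```

A None return points to the region or version issue described earlier, while a ValueError suggests the field holds something other than a PDF (for example, plain extracted text).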
Step 4: Some regions do not yet support outputContentFormat. Check using Azure Region Feature Availability or contact support - https://azure.microsoft.com/en-us/support/plans/
Step 5: Also, make sure that you are using a preview version of the SDK:
- Python SDK: azure-ai-formrecognizer >= 3.3.0b1 (Azure Form Recognizer – Python SDK)
- .NET SDK: use a preview version of Azure.AI.FormRecognizer (Azure Form Recognizer – .NET SDK Docs)
OPTION 2: (As requested)
STEP 1: To clarify the Python SDK limitation: the current Python SDK (azure-ai-formrecognizer v1.0.2 or earlier) does not support outputContentFormat="searchablePdf".
For now, you must therefore use the REST API directly until Microsoft adds this capability to the SDK (it is only available in preview SDKs, which are not always GA).
STEP 2: Use the REST API to upload a binary PDF and get a searchable PDF. You can POST a PDF file directly (no public URL needed) to the prebuilt-read model. REST endpoint format (West Europe):
https://<your-resource-name>.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2023-07-31-preview&outputContentFormat=searchablePdf
Replace <your-resource-name> with the name of your Azure Document Intelligence resource (visible in your Azure portal). The resource itself is bound to its region, but the cognitiveservices.azure.com domain is the same across regions (West Europe, East US, etc.).
STEP 3: Python Code (Using REST API Directly)
import base64
import sys
import time

import requests

# Replace with your instance-specific values
endpoint = "https://<your-resource-name>.cognitiveservices.azure.com"
api_key = "<your-form-recognizer-api-key>"
url = (
    f"{endpoint}/formrecognizer/documentModels/prebuilt-read:analyze"
    "?api-version=2023-07-31-preview&outputContentFormat=searchablePdf"
)
headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/pdf",
}

# Upload the PDF as binary data
with open("input.pdf", "rb") as f:
    data = f.read()

# Send POST request; 202 means the analysis was accepted
response = requests.post(url, headers=headers, data=data)
if response.status_code != 202:
    print("Request failed:", response.text)
    sys.exit(1)

# The 'operation-location' header holds the URL to poll for the result
operation_url = response.headers["operation-location"]

# Poll until analysis completes
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": api_key})
    result_json = result.json()
    status = result_json.get("status")
    if status in ["succeeded", "failed"]:
        break
    time.sleep(2)

if status == "succeeded":
    # Decode the base64-encoded searchable PDF and write it to disk
    base64_pdf = result_json["analyzeResult"]["content"]
    with open("output.pdf", "wb") as out:
        out.write(base64.b64decode(base64_pdf))
    print("Searchable PDF saved as output.pdf")
else:
    print("Analysis failed:", result_json)
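The polling loop above runs until the service reports a final status. If you would rather give up after a while, one possible variation is sketched below. The helper poll_until_done and its parameters are hypothetical, and fetch_status stands in for whatever call returns the parsed status JSON.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Call fetch_status() until it reports 'succeeded' or 'failed'.

    Raises TimeoutError if no final status arrives before the deadline.
    Hypothetical helper; names and defaults are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("analysis did not finish within the timeout")
        sleep(interval)
```

With the script above, fetch_status could be something like lambda: requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": api_key}).json().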
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread by upvoting this and accepting it as an answer if it is helpful.