Can I configure Azure Indexer to pass the entire content of PPTX and PDF documents—including text, images, tables, and graphs—through a single input field to my custom skillset for further processing with Python?

Choudhary, Mahika 0

Hi,

I have reports in both PPTX and PDF formats that contain text, images, tables, and various graphs. I am looking to pass the entire document content through a single input field to my custom skillset for further processing with Python code. While I am familiar with passing text and image data separately, I am wondering if it is possible to send the whole document at once using the Azure Indexer and skillset. Could you provide any guidance or insights on this?

Custom skillset format:

{
  "@odata.etag": "",
  "name": "",
  "description": " ",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": " ",
      "context": "/document",
      "uri": " ",
      "httpMethod": "POST",
      "timeout": "PT1M30S",
      "batchSize": 1,
      "inputs": [
        {
          "name": "document_content",
          "source": "/document/*/data",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "ocr_enhanced_data",
          "targetName": "ocr_enhanced_data"
        }
      ],
      "httpHeaders": {}
    }
  ],
  "cognitiveServices": {
    "@odata.type": " ",
    "subdomainUrl": " "
  }
}

Answer 1

Azar 29,520 MVP Volunteer Moderator

Hi there Choudhary, Mahika

Thanks for using the Q&A platform.

I don't think Search indexers natively support passing the entire content of a PPTX or PDF document through a single input field to a custom skillset. By default, the indexer extracts text and images separately, with text stored under /document/content and images under /document/normalized_images. Tables and graphs are not extracted as structured data, so a direct one-field input is not feasible.

One option is to modify the custom skill to accept multiple inputs, such as both the text content and the images, so that the Python code can merge them. Another option is to preprocess the documents before indexing, using Azure Functions to convert the entire file into a Base64 string that can then be passed as a single field to the custom skillset.
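As a rough illustration of the first option, here is a minimal sketch of the Python side of a custom Web API skill that receives text and images as two inputs and merges them into one output. It assumes the skill's inputs are named "text" and "images" (hypothetical names that would have to match the skillset definition) and that each image object carries a Base64 "data" property; the "values"/"recordId"/"data" envelope is the standard custom skill request and response shape. The merge step itself is only a placeholder for real processing.

```python
import base64
import json


def merge_record(data: dict) -> dict:
    """Combine the text and image inputs of one record into a single structure."""
    text = data.get("text") or ""        # hypothetical input name
    images = data.get("images") or []    # hypothetical input name
    image_sizes = []
    for image in images:
        # Assumption: each image object has a Base64-encoded "data" property.
        raw = base64.b64decode(image.get("data", ""))
        image_sizes.append(len(raw))
    # Placeholder merge: real code would run OCR, layout analysis, and so on.
    return {"text": text, "image_count": len(images), "image_bytes": image_sizes}


def handle_skill_request(body: dict) -> dict:
    """Process a custom skill request body and build the response body."""
    results = []
    for record in body.get("values", []):
        entry = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            merged = merge_record(record.get("data", {}))
            # Output name taken from the skillset in the question.
            entry["data"] = {"ocr_enhanced_data": json.dumps(merged)}
        except Exception as exc:
            # Report the failure for this record instead of failing the whole batch.
            entry["errors"].append({"message": str(exc)})
        results.append(entry)
    return {"values": results}
```

The function takes and returns plain dictionaries, so it can be called from whatever HTTP host runs the skill (for example an Azure Function) after parsing the request JSON.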

If this helps, kindly accept the answer. Thanks much.

Choudhary, Mahika 0 Reputation points

2025-03-27T13:39:18.6933333+00:00

How can I preprocess the documents before indexing, using Azure Functions to convert the entire file into a Base64 string that can then be passed as a single field to the custom skillset? Could you provide any guidance or insights on this?
Alekhya Vaddepally 1,670 Reputation points Microsoft External Staff Moderator

2025-03-28T11:08:57.2433333+00:00

Hi Choudhary, Mahika,
You can preprocess the documents with Azure Functions: read the PDF, PPTX, or other file, convert the entire file into a Base64-encoded string, and pass that encoded string as a single input field to the custom skillset.
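The encoding step described above could look roughly like the sketch below. It covers only the Base64 round trip: encoding the raw file bytes during preprocessing and decoding them again inside the custom skill. The function names and the "file_name" / "content_base64" keys are hypothetical, and where the encoded string is stored so that the indexer can pick it up is not specified in this thread. Note that Base64 output is about one third larger than the original file.

```python
import base64
from pathlib import Path


def encode_bytes(raw: bytes, file_name: str) -> dict:
    """Wrap raw file bytes as a Base64 string (key names are hypothetical)."""
    return {
        "file_name": file_name,
        "content_base64": base64.b64encode(raw).decode("ascii"),
    }


def encode_file(path: str) -> dict:
    """Read a PDF, PPTX, or any other file from disk and encode it whole."""
    p = Path(path)
    return encode_bytes(p.read_bytes(), p.name)


def decode_payload(payload: dict) -> bytes:
    """Inside the custom skill: recover the original file bytes."""
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(payload["content_base64"], validate=True)
```

Once decoded, the bytes can be handed to whichever Python PDF or PPTX parsing library the skill uses.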
Alekhya Vaddepally 1,670 Reputation points Microsoft External Staff Moderator

2025-03-31T08:37:09.5866667+00:00

Hi Choudhary, Mahika,

We haven’t heard back from you since the last response and are checking whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.
