Hello @Maja Ru, here is one way to achieve your scenario. Note that you will need to adapt the code in the custom skill mentioned below to fit your needs.
With built-in indexers (Indexer overview - Azure AI Search | Microsoft Learn) you can use a skillset (https://learn.microsoft.com/en-us/azure/search/cognitive-search-concept-intro, https://learn.microsoft.com/en-us/azure/search/cognitive-search-defining-skillset).
The skillset can be built with this functionality:
Extracting images and text steps, chunking (and vectorizing if needed):
- Use the OCR skill (https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-ocr) to extract the images and text and write them to a merged intermediate output.
- Use the split skill (https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-textsplit) to chunk the output of the OCR step so that each chunk fits within the LLM's maximum input size.
- If needed, vectorize the extracted chunks with an embedding skill such as https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-azure-openai-embedding so that you get similarity/semantic search in addition to keyword search.
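The three steps above could be expressed in a skillset definition roughly like the sketch below. It is written as a Python dict that serializes to the JSON body you would send to the service. The skillset name, the Azure OpenAI endpoint and deployment, the chunk sizes, and all targetName values are placeholders I made up for illustration. I also assumed a Text Merge skill to produce the merged output, and that the indexer is configured to generate normalized images, so please check each skill against its reference page before using it.

```python
import json

# Hypothetical names, endpoints, and sizes; replace with your own values.
skillset = {
    "name": "docs-skillset",
    "skills": [
        {   # OCR each image extracted from the document
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "text"}],
        },
        {   # Assumption: merge the OCR text back into the document text
            "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
            "context": "/document",
            "inputs": [
                {"name": "text", "source": "/document/content"},
                {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
                {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
            ],
            "outputs": [{"name": "mergedText", "targetName": "merged_text"}],
        },
        {   # Chunk the merged text; tune the sizes to your LLM's input limit
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "context": "/document",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "pageOverlapLength": 200,
            "inputs": [{"name": "text", "source": "/document/merged_text"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        },
        {   # Optional: one embedding per chunk
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "context": "/document/pages/*",
            "resourceUri": "https://<your-openai-resource>.openai.azure.com",
            "deploymentId": "<your-embedding-deployment>",
            "modelName": "text-embedding-ada-002",
            "inputs": [{"name": "text", "source": "/document/pages/*"}],
            "outputs": [{"name": "embedding", "targetName": "vector"}],
        },
    ],
}

# JSON body for creating or updating the skillset
payload = json.dumps(skillset, indent=2)
```

If you do not need vector search, you can drop the last skill and keep only OCR, merge, and split.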
Extracting tabular data from docs:
Use a custom skill that calls Azure AI Document Intelligence to extract the tables: https://learn.microsoft.com/en-us/training/modules/build-form-recognizer-custom-skill-for-azure-cognitive-search/
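To give an idea of what the custom skill has to do: the indexer sends it a JSON body with a "values" array, and expects a "values" array back with the same recordId for each record. Below is a minimal sketch of that request/response handling. The function names, the "tables" output name, the "formUrl" input name, and the fake extractor are all hypothetical; in your real skill the extractor would call Document Intelligence and shape the tables however your index needs them.

```python
from typing import Any, Callable, Dict, List


def handle_skill_request(
    body: Dict[str, Any],
    extract_tables: Callable[[Dict[str, Any]], List[Any]],
) -> Dict[str, Any]:
    """Run extract_tables on each record and build the custom skill response."""
    results = []
    for record in body.get("values", []):
        result = {
            "recordId": record["recordId"],  # must echo the incoming id
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            # "tables" is a hypothetical output name; match it in the skill outputs
            result["data"]["tables"] = extract_tables(record.get("data", {}))
        except Exception as exc:
            # Report the failure for this record only, not for the whole batch
            result["errors"].append({"message": str(exc)})
        results.append(result)
    return {"values": results}


def fake_extract_tables(data: Dict[str, Any]) -> List[Any]:
    # Stand-in for the Document Intelligence call; returns one hard-coded table
    return [{"rowCount": 1, "columnCount": 2, "cells": [["a", "b"]]}]


request = {
    "values": [
        {"recordId": "1", "data": {"formUrl": "https://example.com/doc.pdf"}}
    ]
}
response = handle_skill_request(request, fake_extract_tables)
```

Passing the extractor in as a parameter keeps the request handling testable without a live Document Intelligence resource.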
Write the enriched data to the index:
Map the respective skill outputs to index fields (output field mappings in the indexer), or, if you are chunking with the split skill, use index projections (https://learn.microsoft.com/en-us/azure/search/index-projections-concept-intro?tabs=kstore-rest) so that each chunk becomes its own search document.
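As a rough illustration of the two options, here is what the relevant fragments could look like, again as Python dicts that mirror the JSON. I am assuming the split skill writes its chunks to /document/pages, the embedding lands at /document/pages/*/vector, and the custom skill writes to /document/tables; the index and field names are hypothetical as well.

```python
# Option 1: no chunking. Goes in the indexer definition and maps an
# enrichment node to an index field.
output_field_mappings = [
    {"sourceFieldName": "/document/tables", "targetFieldName": "tables"},
]

# Option 2: chunking. Goes in the skillset definition and projects each
# chunk into its own search document.
index_projections = {
    "selectors": [
        {
            "targetIndexName": "docs-chunks-index",   # hypothetical index name
            "parentKeyFieldName": "parent_id",        # field holding the parent key
            "sourceContext": "/document/pages/*",     # one search document per chunk
            "mappings": [
                {"name": "chunk", "source": "/document/pages/*"},
                {"name": "vector", "source": "/document/pages/*/vector"},
                {"name": "title", "source": "/document/metadata_storage_name"},
            ],
        }
    ],
    # Index only the chunks, not the parent document itself
    "parameters": {"projectionMode": "skipIndexingParentDocuments"},
}
```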
There is no sample that covers exactly the steps you require, but as a starting point you can run the "Import and vectorize data" wizard from the portal: https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors
This will create the following: a data source, an indexer, a skillset (you can choose OCR so that you get the first part described here, and it will include the chunking and vectorization described above), and an initial version of the index. Once these are created, you can add index fields for the tabular data coming out of the custom skill, and add the custom skill configuration to the skillset as described above.
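For that last step, a sketch of the custom skill entry and a small helper that adds it to the skillset JSON the wizard generated might look like this. The skill name, function URL, input/output names, and the example wizard skillset are placeholders I invented; export your actual skillset definition and adjust accordingly.

```python
import copy

# Hypothetical custom skill entry pointing at your own function endpoint
custom_table_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "table-extraction",
    "description": "Extracts tables with Document Intelligence",
    "uri": "https://<your-function-app>.azurewebsites.net/api/extract-tables",
    "httpMethod": "POST",
    "timeout": "PT90S",
    "batchSize": 1,
    "context": "/document",
    "inputs": [
        {"name": "formUrl", "source": "/document/metadata_storage_path"},
        {"name": "formSasToken", "source": "/document/metadata_storage_sas_token"},
    ],
    "outputs": [{"name": "tables", "targetName": "tables"}],
}


def add_skill(skillset: dict, skill: dict) -> dict:
    """Return a copy of skillset with skill added, replacing any skill of the same name."""
    updated = copy.deepcopy(skillset)
    skills = [s for s in updated.get("skills", []) if s.get("name") != skill["name"]]
    skills.append(copy.deepcopy(skill))
    updated["skills"] = skills
    return updated


# Stand-in for the skillset JSON exported after running the wizard
wizard_skillset = {
    "name": "wizard-skillset",
    "skills": [{"@odata.type": "#Microsoft.Skills.Text.SplitSkill", "name": "split"}],
}
updated_skillset = add_skill(wizard_skillset, custom_table_skill)
```

You would then send the updated definition back to the service to update the skillset, and reset and rerun the indexer so existing documents pick up the new skill.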
Hope that helps. Let us know if you have further questions.
Best,
Grace