Azure Search Index with both text and images

Maja Ru 25 Reputation points
2024-05-06T13:56:30.5666667+00:00

Hi,
I want to implement a RAG architecture where the input data is in PDFs that contain text, images, and tables. Working only with text is straightforward, and I've built such applications before. However, the addition of images in this new use case makes me uncertain which approach to use.
I have some PDFs that contain technical documentation: raw text, images, and tables. Ideally the solution would work as follows (though I am not certain it makes sense in Azure). I do not need any advanced algorithms that analyze the pictures. I just want the pictures and tables to be returned from the search index as some kind of attachment to the text.
For example, a user asks a question about a machine error. I then retrieve the most relevant content chunk plus the picture associated with it. Is this doable? Do I need to extract the images and tables from the PDF myself first, or are there Azure services that can help me achieve this?

Second question: how do I return the pictures together with the text to the user? Which model should I use?

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Azure OpenAI in Foundry Models
Azure Document Intelligence in Foundry Tools
Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.


1 answer

  1. Grmacjon-MSFT 19,511 Reputation points Moderator
    2024-05-08T23:27:06.11+00:00

    Hello @Maja Ru, here is one way to achieve your scenario. It requires tweaking the code of the custom skill mentioned below to fit your needs:

    With built-in indexers (Indexer overview - Azure AI Search | Microsoft Learn) you can use a skillset (https://learn.microsoft.com/en-us/azure/search/cognitive-search-concept-intro, https://learn.microsoft.com/en-us/azure/search/cognitive-search-defining-skillset).

    The skillset can be built with this functionality:

    Extracting images and text steps, chunking (and vectorizing if needed):
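
A minimal sketch of what such a skillset could contain, written as plain Python dictionaries that mirror the JSON sent to the Azure AI Search REST API. It chains the built-in OCR, Text Merge, and Text Split skills. The chunk sizes and the `merged_content` and `pages` target names are illustrative assumptions, not values taken from this thread, and vectorization is left out.

```python
import json


def build_extraction_skills(max_chunk_chars: int = 2000,
                            overlap_chars: int = 200) -> list[dict]:
    """Return OCR -> merge -> split skill definitions as plain dicts."""
    # Runs OCR once per image that the indexer extracted from the PDF.
    ocr = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    # Inserts the OCR text back into the document text at each image position.
    merge = {
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
            {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    # Chunks the merged text; chunk sizes here are assumptions to tune.
    split = {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": max_chunk_chars,
        "pageOverlapLength": overlap_chars,
        "inputs": [{"name": "text", "source": "/document/merged_content"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }
    return [ocr, merge, split]


# Indexer setting that makes embedded images available to the skills above.
INDEXER_PARAMETERS = {
    "configuration": {
        "dataToExtract": "contentAndMetadata",
        "imageAction": "generateNormalizedImages",
    }
}

if __name__ == "__main__":
    print(json.dumps(build_extraction_skills(), indent=2))
```

Without `imageAction` set on the indexer, `/document/normalized_images` is not populated and the OCR skill has nothing to process.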

    Extracting tabular data from docs:

    Use a custom skill with Azure AI Document Intelligence to extract the tables: https://learn.microsoft.com/en-us/training/modules/build-form-recognizer-custom-skill-for-azure-cognitive-search/
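
As a rough sketch of the custom skill side, the function below handles the request and response shape that Azure AI Search uses for custom Web API skills (a `values` array keyed by `recordId`). The `documentUrl` input, the `tables` output, and `fake_table_extractor` are hypothetical names chosen for illustration. In a real skill the extractor would call Document Intelligence, and the function would sit behind an HTTP endpoint such as an Azure Function.

```python
from typing import Callable

Table = list[list[str]]  # rows of cell strings


def fake_table_extractor(document_url: str) -> list[Table]:
    """Hypothetical stand-in for a Document Intelligence call."""
    return [[["Error code", "Meaning"], ["E42", "Overheating"]]]


def table_to_markdown(rows: Table) -> str:
    """Render one table as Markdown text so it can be stored in a string field."""
    if not rows:
        return ""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def handle_skill_request(
    body: dict,
    extract_tables: Callable[[str], list[Table]] = fake_table_extractor,
) -> dict:
    """Process a custom-skill request and return one result per input record."""
    results = []
    for record in body.get("values", []):
        record_id = record.get("recordId")
        try:
            url = record["data"]["documentUrl"]  # assumed input name
            tables = [table_to_markdown(t) for t in extract_tables(url)]
            results.append({"recordId": record_id,
                            "data": {"tables": tables},
                            "errors": [], "warnings": []})
        except Exception as exc:
            # Report the failure for this record without failing the batch.
            results.append({"recordId": record_id, "data": {},
                            "errors": [{"message": str(exc)}],
                            "warnings": []})
    return {"values": results}


if __name__ == "__main__":
    request = {"values": [{"recordId": "1",
                           "data": {"documentUrl": "https://example.invalid/manual.pdf"}}]}
    print(handle_skill_request(request))
```

Every `recordId` in the request must come back in the response, which is why failures are reported per record instead of raised.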

     

    Write the enriched data to the index 

    Write the output of the enrichment process to the index by mapping the respective skill outputs to index fields or, if you use chunking (the Text Split skill), by using index projections (https://learn.microsoft.com/en-us/azure/search/index-projections-concept-intro?tabs=kstore-rest).
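
For the chunking case, an index projection definition could look roughly like the sketch below, again as a Python dictionary mirroring the skillset JSON. The index field names (`chunk`, `title`, `tables`, `parent_id`) and the `/document/pages/*` and `/document/tables` enrichment paths are assumptions that depend on how your own skills name their outputs.

```python
import json


def build_index_projections(target_index: str,
                            parent_key_field: str = "parent_id") -> dict:
    """Map each chunk under /document/pages/* to its own search document."""
    return {
        "selectors": [
            {
                "targetIndexName": target_index,
                # Field in the target index that stores the parent document key.
                "parentKeyFieldName": parent_key_field,
                # One search document is produced per item under this path.
                "sourceContext": "/document/pages/*",
                "mappings": [
                    {"name": "chunk", "source": "/document/pages/*"},
                    {"name": "title", "source": "/document/metadata_storage_name"},
                    # Assumed output of a custom table-extraction skill.
                    {"name": "tables", "source": "/document/tables"},
                ],
            }
        ],
        # Index only the chunks, not the parent document itself.
        "parameters": {"projectionMode": "skipIndexingParentDocuments"},
    }


if __name__ == "__main__":
    print(json.dumps(build_index_projections("manuals-index"), indent=2))
```

In this sketch every chunk carries all tables of its parent document, because `/document/tables` sits at the document level. Attaching only the tables near a given chunk would need extra logic in the custom skill.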

    There is no sample that covers exactly these steps, but you can run the "Import and vectorize data" wizard in the portal: https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors

    The wizard creates the following: a data source, an indexer, a skillset (you can choose OCR, which gives you the first part described here, and it includes the chunking and vectorization described above), and an initial version of the index. Once these are created, you can add index fields for the tabular data coming out of the custom skill and add the custom skill configuration to the skillset as described above.
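
A small sketch of those two follow-up changes, assuming you have fetched the wizard-generated index and skillset definitions as JSON (for example through the REST API) and loaded them into Python dictionaries. The `tables` field name, the `extract-tables` skill name, and the skill URI are hypothetical. Sending the patched definitions back to the service is left out.

```python
import copy


def add_tables_field(index_def: dict, field_name: str = "tables") -> dict:
    """Return a copy of the index definition with a string-collection field added."""
    updated = copy.deepcopy(index_def)
    fields = updated.setdefault("fields", [])
    if any(f.get("name") == field_name for f in fields):
        return updated  # field already present, nothing to add
    fields.append({
        "name": field_name,
        "type": "Collection(Edm.String)",
        "searchable": True,
        "retrievable": True,
        "filterable": False,
        "sortable": False,
        "facetable": False,
    })
    return updated


def add_custom_skill(skillset_def: dict, skill_uri: str) -> dict:
    """Return a copy of the skillset with a custom Web API skill appended."""
    updated = copy.deepcopy(skillset_def)
    updated.setdefault("skills", []).append({
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "extract-tables",  # hypothetical skill name
        "context": "/document",
        "uri": skill_uri,
        "httpMethod": "POST",
        "timeout": "PT230S",
        "batchSize": 1,
        # Assumes the skill endpoint is able to read the blob at this path.
        "inputs": [{"name": "documentUrl",
                    "source": "/document/metadata_storage_path"}],
        "outputs": [{"name": "tables", "targetName": "tables"}],
    })
    return updated


if __name__ == "__main__":
    index = {"name": "manuals-index",
             "fields": [{"name": "chunk_id", "type": "Edm.String", "key": True}]}
    skillset = {"name": "manuals-skillset", "skills": []}
    print(add_tables_field(index)["fields"][-1]["name"])
    print(add_custom_skill(skillset, "https://example.invalid/api/tables")["skills"][-1]["name"])
```

Both helpers return copies, so the original definitions stay available for comparison before you update the service. The new skill output also has to be mapped to the new field, through an index projection mapping or an output field mapping, before it shows up in the index.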

     

    Hope that helps. Let us know if you have further questions.

    Best,

    Grace
