Azure Search Index with both text and images

Maja Ru 25 Reputation points
2024-05-06T13:56:30.5666667+00:00

Hi,
I want to implement a RAG architecture where the input data is in PDFs that contain text, images, and tables. Working only with text is straightforward and I've built such applications before. However, the addition of images in this new use case makes me uncertain which approach I should use.
I have some PDFs that contain technical documentation. There is raw text, images, and tables. Ideally the solution would work like this (although I am not certain it makes sense on Azure): I do not need any advanced algorithms to analyze the pictures. I just want the pictures and tables to be returned from the search index as some kind of attachment to the text.
For example, a user asks a question about an error in the machine. I then retrieve the most relevant content chunk plus the picture associated with it. Is this doable? Do I need to extract the images and tables from the PDF myself first, or are there Azure services that help me achieve this?

Second question: how do I return the pictures together with the text to the user? Which model do I use?

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Grmacjon-MSFT 18,646 Reputation points
    2024-05-08T23:27:06.11+00:00

    Hello @Maja Ru, here is one way to achieve your scenario. It includes tweaking the code in the custom skill mentioned below to fit your needs:

    With built-in indexers (Indexer overview - Azure AI Search | Microsoft Learn) you can use a skillset (https://learn.microsoft.com/en-us/azure/search/cognitive-search-concept-intro, https://learn.microsoft.com/en-us/azure/search/cognitive-search-defining-skillset).

    The skillset can be built with this functionality:

    Extracting images and text, then chunking (and vectorizing if needed):
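As a rough illustration of these steps, here is a minimal sketch of a skillset definition in the REST payload shape, built as a plain Python dict. It chains the built-in OCR, Merge, and Split skills. The skillset name, the target names (`merged_text`, `pages`), and the chunk sizes are assumptions for illustration, and the wizard may generate something different. The indexer also needs image extraction switched on, otherwise `/document/normalized_images` is empty.

```python
import json

# Indexer parameters that make the indexer emit normalized images for OCR.
INDEXER_PARAMETERS = {
    "configuration": {
        "dataToExtract": "contentAndMetadata",
        "imageAction": "generateNormalizedImages",
    }
}


def build_skillset(name: str, chunk_chars: int = 2000, overlap_chars: int = 200) -> dict:
    """Return a skillset definition: OCR on images, merge with text, then chunk."""
    return {
        "name": name,  # hypothetical name, pick your own
        "skills": [
            {
                # Runs once per extracted image and returns the text found in it.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
                "outputs": [{"name": "text", "targetName": "text"}],
            },
            {
                # Inserts the OCR text at the position where each image appeared.
                "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
                "context": "/document",
                "inputs": [
                    {"name": "text", "source": "/document/content"},
                    {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
                    {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
                ],
                "outputs": [{"name": "mergedText", "targetName": "merged_text"}],
            },
            {
                # Splits the merged text into overlapping chunks ("pages").
                "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
                "context": "/document",
                "textSplitMode": "pages",
                "maximumPageLength": chunk_chars,
                "pageOverlapLength": overlap_chars,
                "inputs": [{"name": "text", "source": "/document/merged_text"}],
                "outputs": [{"name": "textItems", "targetName": "pages"}],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_skillset("pdf-skillset"), indent=2))
```

The resulting dict can be sent as the body of a create-skillset request. A vectorization skill, if needed, would take `/document/pages/*` as its input.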

    Extracting tabular data from docs:

    Use a custom skill with Azure AI Document Intelligence to extract the tables: https://learn.microsoft.com/en-us/training/modules/build-form-recognizer-custom-skill-for-azure-cognitive-search/
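To give an idea of the code you would be tweaking, below is a minimal sketch of the request/response handling a custom Web API skill needs. The envelope (`values`, `recordId`, `data`, `errors`, `warnings`) follows the custom skill contract. The `extract_tables` callable is a hypothetical stand-in for your own call to Document Intelligence, and it is assumed to return tables in the layout-result shape (`rowCount`, `columnCount`, `cells` with `rowIndex`, `columnIndex`, `content`). Rendering the tables as Markdown strings is just one possible choice, not something the service requires.

```python
from typing import Any, Callable, Dict, List


def cells_to_markdown(row_count: int, column_count: int, cells: List[Dict[str, Any]]) -> str:
    """Render one table as Markdown, treating the first row as the header."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        # Escape pipes and flatten line breaks so the cell stays on one table row.
        text = cell["content"].replace("|", "\\|").replace("\n", " ")
        grid[cell["rowIndex"]][cell["columnIndex"]] = text
    lines = ["| " + " | ".join(row) + " |" for row in grid]
    if lines:
        lines.insert(1, "|" + " --- |" * column_count)
    return "\n".join(lines)


def handle_skill_request(
    body: Dict[str, Any],
    extract_tables: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Process a custom skill request and return one result per input record."""
    results = []
    for record in body.get("values", []):
        result = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            # extract_tables is hypothetical: plug in your Document Intelligence call here.
            tables = extract_tables(record.get("data", {}))
            result["data"]["tables"] = [
                cells_to_markdown(t["rowCount"], t["columnCount"], t["cells"]) for t in tables
            ]
        except Exception as exc:
            # Report the failure for this record only; other records still succeed.
            result["errors"].append({"message": f"Table extraction failed: {exc}"})
        results.append(result)
    return {"values": results}
```

You would host `handle_skill_request` behind an HTTP endpoint (for example an Azure Function) and point the custom skill at it. Each record's `tables` output then becomes available in the enrichment tree for mapping to an index field.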

     

    Write the enriched data to the index:

    Write the enrichment output to the index either by mapping the respective skill outputs to index fields or, if you are using chunking (Split skill), with index projections (https://learn.microsoft.com/en-us/azure/search/index-projections-concept-intro?tabs=kstore-rest).
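For the chunking case, an index projection definition could look roughly like the sketch below, again as a plain Python dict in the REST payload shape. It assumes the Split skill writes its chunks to `/document/pages`, that the data source is blob storage (so `metadata_storage_name` and `metadata_storage_path` exist), and that the target index has fields named `parent_id`, `chunk`, `title`, and `source_path`. All of those names are illustrative. The property under which this object is attached to the skillset differs between API versions, so please check the REST reference for the version you use.

```python
def build_index_projections(target_index: str) -> dict:
    """Return an index projection definition that writes one search document per chunk."""
    return {
        "selectors": [
            {
                "targetIndexName": target_index,
                # Field in the target index that stores the key of the parent document.
                "parentKeyFieldName": "parent_id",
                # One projected search document per item under this path.
                "sourceContext": "/document/pages/*",
                "mappings": [
                    {"name": "chunk", "source": "/document/pages/*"},
                    # Parent-level values are repeated on every chunk.
                    {"name": "title", "source": "/document/metadata_storage_name"},
                    {"name": "source_path", "source": "/document/metadata_storage_path"},
                ],
            }
        ],
        # Index only the chunks, not the parent document itself.
        "parameters": {"projectionMode": "skipIndexingParentDocuments"},
    }
```

Repeating the source path on each chunk is one simple way to let the application find the original PDF for a retrieved chunk.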

    There is no sample that covers exactly the steps you require, but as a starting point you can run the "Import and vectorize data" wizard from the portal: https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors

    This will create the following configurations: a data source, an indexer, a skillset (you can choose OCR, which gives you the first part described above, and it includes the chunking and vectorization described above), and an initial version of the index. Once these are created, you can add the index fields where you plan to store the tabular data coming out of the custom skill, and add the custom skill configuration to the skillset as described above.
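The post-wizard changes can be sketched as a small helper that takes the index and skillset definitions (as fetched from the REST API) and returns updated copies. Everything specific here is an assumption for illustration: the `tables` field name, the `extract-tables` skill name, the `file_url` input name, and the endpoint URL you pass in. You would still need to map the skill output to the new field, either with an output field mapping on the indexer or with an extra mapping in the index projection.

```python
import copy
from typing import Tuple


def add_table_support(index: dict, skillset: dict, skill_uri: str) -> Tuple[dict, dict]:
    """Return copies of the index and skillset extended with table extraction."""
    new_index = copy.deepcopy(index)
    new_skillset = copy.deepcopy(skillset)

    # New index field holding one Markdown string per extracted table (hypothetical name).
    if not any(f["name"] == "tables" for f in new_index["fields"]):
        new_index["fields"].append(
            {
                "name": "tables",
                "type": "Collection(Edm.String)",
                "searchable": True,
                "retrievable": True,
            }
        )

    # Custom Web API skill that calls your table-extraction endpoint once per document.
    if not any(s.get("name") == "extract-tables" for s in new_skillset["skills"]):
        new_skillset["skills"].append(
            {
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "name": "extract-tables",
                "uri": skill_uri,
                "httpMethod": "POST",
                "timeout": "PT30S",
                "context": "/document",
                "inputs": [{"name": "file_url", "source": "/document/metadata_storage_path"}],
                "outputs": [{"name": "tables", "targetName": "tables"}],
            }
        )
    return new_index, new_skillset
```

The helper leaves the inputs untouched and is safe to call twice, so you can rerun it against the current definitions before sending the updates back.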

     

    Hope that helps. Let us know if you have further questions.

    Best,

    Grace

