
Issues with Multimodal RAG in Azure AI Search – Standalone Images Not Vectorizing & Indexer Not Picking Up New Docs

Santimoy Rana 20 Reputation points
2025-09-09T07:19:50.2933333+00:00

I’m building a multimodal RAG (Retrieval-Augmented Generation) system using Azure AI Search with vector embeddings for both text and images. While things are working partially, I’m stuck on two key issues:

  1. Standalone Images Not Getting Vectorized

When I upload PDF files containing embedded images, the images inside the PDFs are processed and vectorized correctly.

However, when I upload standalone images (e.g., .jpg or .png) directly into Azure Blob Storage, they don’t seem to get vectorized at all.

I’ve already tried adjusting the indexer JSON and modifying the skillset (including image analysis + embedding skills), but I couldn’t get standalone images to go through the embedding pipeline.

Question: Is there a known limitation with standalone images in AI Search vectorization pipelines, or am I missing a configuration step (e.g., skillset input/output mapping)?

  2. Indexer Not Picking Up Newly Added Documents

After adding new documents (PDFs or images) to the blob container, re-running the indexer doesn’t always detect or process them.

I was expecting incremental indexing to pick up the new files, but it seems inconsistent.

I tried resetting and re-running the indexer manually, but results didn’t change.

Question: What’s the recommended way to ensure newly added files in blob storage are reliably picked up and re-indexed? Do I need to reset the indexer each time, or is there a configuration to enable proper incremental updates?

What I’m Looking For

  • Guidance on how to properly configure the skillset + indexer so that standalone images are vectorized, similar to embedded images in PDFs.

Thanks in advance for any help or pointers!

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


Answer accepted by question author
  1. Gowtham CP 7,955 Reputation points Volunteer Moderator
    2025-09-09T08:57:01.9966667+00:00

    Hello Santimoy Rana,

    Thank you for reaching out on Microsoft Q&A.

    1. Standalone Images Not Getting Vectorized

    Azure AI Search does not embed raw .jpg or .png files directly. To vectorize them, add an Image Analysis skill (to extract captions or text) and pass its output into a Text Embedding skill. Finally, map the embeddings to a vector field in your index.

    Reference: Image Analysis skill
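The chain described above (image, then caption, then text embedding) might look roughly like the following skillset definition, built here as a Python dictionary so it can be serialized to JSON for the REST API. This is a minimal sketch under assumptions, not a verified configuration: the skillset name, the `caption_vector` target name, the embedding model, and the angle-bracket placeholders are all hypothetical, and the indexer is assumed to need `imageAction` set to `generateNormalizedImages` so that image blobs produce a `/document/normalized_images` collection for the skills to read.

```python
import json

# Hypothetical placeholders; replace with your own resource values.
AOAI_ENDPOINT = "https://<your-openai-resource>.openai.azure.com"
EMBEDDING_DEPLOYMENT = "<your-embedding-deployment>"

# Indexer parameters (assumption): imageAction asks the indexer to emit
# normalized images, which the image skill below takes as its input.
indexer_parameters = {
    "configuration": {
        "dataToExtract": "contentAndMetadata",
        "imageAction": "generateNormalizedImages",
    }
}

CAPTION_PATH = "/document/normalized_images/*/description/captions/*"

skillset = {
    "name": "image-caption-embedding-skillset",  # hypothetical name
    "skills": [
        {
            # Step 1: generate a caption for every normalized image.
            "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
            "context": "/document/normalized_images/*",
            "visualFeatures": ["description"],
            "inputs": [
                {"name": "image", "source": "/document/normalized_images/*"}
            ],
            "outputs": [{"name": "description", "targetName": "description"}],
        },
        {
            # Step 2: embed each caption's text with an Azure OpenAI deployment.
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "context": CAPTION_PATH,
            "resourceUri": AOAI_ENDPOINT,
            "deploymentId": EMBEDDING_DEPLOYMENT,
            "modelName": "text-embedding-3-small",  # assumed model
            "inputs": [{"name": "text", "source": CAPTION_PATH + "/text"}],
            "outputs": [{"name": "embedding", "targetName": "caption_vector"}],
        },
    ],
}

print(json.dumps(skillset, indent=2))
```

The resulting `caption_vector` values still have to be mapped to a vector field in the index (for example through output field mappings or index projections), and the skillset may need an attached Azure AI services resource for image analysis beyond the free quota.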

    2. Indexer Not Detecting New Files

    Incremental indexing relies on the LastModified property in Blob Storage. If new files are not picked up:

      • Confirm the blob has a valid LastModified update.
      • Ensure the indexer has a schedule enabled.
      • If still inconsistent, run a full reset of the indexer once, then rely on incremental updates going forward.

    Reference: Incremental indexing
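The checklist above could be sketched as an indexer definition with a schedule, plus the REST calls for a one-time reset and run. This is an illustrative sketch only: the indexer name, the angle-bracket placeholders, the hourly interval, and the `2024-07-01` API version are assumptions, and the script only prints the requests rather than sending them (each would also need an `api-key` header or a bearer token).

```python
import json

API_VERSION = "2024-07-01"  # assumed API version
SERVICE = "https://<your-search-service>.search.windows.net"  # placeholder
INDEXER = "blob-multimodal-indexer"  # hypothetical indexer name

# Indexer definition with a schedule so new or modified blobs are picked up
# on each run, based on their LastModified timestamp.
indexer = {
    "name": INDEXER,
    "dataSourceName": "<your-blob-datasource>",
    "targetIndexName": "<your-index>",
    "skillsetName": "<your-skillset>",
    "schedule": {"interval": "PT1H"},  # ISO 8601 duration: every hour
}


def indexer_url(action: str = "") -> str:
    """Build the REST URL for the indexer, optionally with an action suffix."""
    suffix = f"/{action}" if action else ""
    return f"{SERVICE}/indexers/{INDEXER}{suffix}?api-version={API_VERSION}"


# Order matters: create or update, reset once, run, then check status.
requests_to_send = [
    ("PUT", indexer_url(), indexer),        # create or update the indexer
    ("POST", indexer_url("reset"), None),   # one-time full reset
    ("POST", indexer_url("run"), None),     # run on demand
    ("GET", indexer_url("status"), None),   # inspect the execution history
]

for method, url, body in requests_to_send:
    print(method, url)
    if body is not None:
        print(json.dumps(body, indent=2))
```

After the one-time reset, the schedule alone should be enough for later runs; the status call is a way to check whether a run actually processed the new blobs or reported errors for them.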

    This approach ensures that both PDFs and standalone images are processed consistently.

    If the information is useful, please accept the answer and upvote it to assist other community members.

