Apply CRUD operations for Azure AI search from SAP SuccessFactors data

Nagavolu, Chandan (External) 20 Reputation points
2026-03-02T06:02:26.83+00:00

Hi All,

We are in the process of building an Azure Data Factory (ADF) pipeline to extract data from SAP SuccessFactors and perform a RAG-based workflow—extracting, chunking, embedding, and indexing the data into Azure AI Search.

In production, this pipeline is scheduled to run every 8 hours. However, we have observed that a full re-indexing operation in Azure AI Search currently takes 3–4 hours, which would make our application unavailable for nearly half of each cycle.

We would like guidance on the recommended approach for handling incremental data operations—specifically insert, update, and delete scenarios—in Azure AI Search.

Additionally, we need clarity on how Azure AI Search behaves when individual records are updated or deleted. Understanding the technical flow and best practices will help us design an optimal indexing strategy.

Any references, documentation, or technical guidance related to this type of implementation would be extremely helpful.

Regards,

Chandan

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


1 answer

  1. Q&A Assist
    2026-03-02T06:02:47.7566667+00:00
    AI-generated content: this answer was created with AI from Q&A Assist.

    For this scenario, the indexing strategy should be designed around incremental updates rather than full re‑indexing, and the pipeline should be automated end‑to‑end so that inserts, updates, and deletes flow from SAP SuccessFactors into Azure AI Search with minimal downtime.

    Key design points for CRUD-style indexing

    1. Use an index that supports incremental refresh
    • The search index must support automatic updates or incremental refreshes when underlying data changes.
    • When the platform offers this natively (for example, via indexers with change detection), offload as much of the change tracking and retry logic as possible to that platform to reduce operational overhead.
    2. Automate the end‑to‑end pipeline
    • The data processing platform (ADF + any enrichment components) should:
      • Clean and normalize data from SuccessFactors.
      • Chunk documents for RAG (for example, one index document per chunk).
      • Generate embeddings (via Azure OpenAI or another model) and store them as vector fields.
    • In Azure AI Search, this can be orchestrated as part of an enrichment pipeline during indexing so that chunking and embedding happen in the same flow that writes to the index.
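The enrichment pipeline described above can be expressed as a skillset attached to the indexer. The following is a minimal sketch of such a skillset definition as a Python dict matching the skillset REST payload shape; the skillset name, field paths, Azure OpenAI resource URI, and deployment name are placeholders, not values from the question.

```python
# Sketch of an Azure AI Search skillset that chunks text and generates
# embeddings during indexing (integrated vectorization). Names and paths
# are illustrative placeholders.
skillset = {
    "name": "successfactors-rag-skillset",  # hypothetical name
    "skills": [
        {
            # Built-in text split skill: one chunk per "page"
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "context": "/document",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "pageOverlapLength": 200,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "chunks"}],
        },
        {
            # Embedding skill runs once per chunk ("/document/chunks/*")
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "context": "/document/chunks/*",
            "resourceUri": "https://<your-openai>.openai.azure.com",  # placeholder
            "deploymentId": "text-embedding-3-small",                 # placeholder
            "inputs": [{"name": "text", "source": "/document/chunks/*"}],
            "outputs": [{"name": "embedding", "targetName": "vector"}],
        },
    ],
}
```

The key design point is the `context` of the embedding skill: setting it to the chunk collection makes the service vectorize each chunk independently rather than the whole document.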
    3. Use scheduled, incremental indexer runs where possible
    • Indexers are designed to run on a schedule and provide best‑effort indexing over time.
    • For large data sources, initial indexing can take hours or days; scheduled runs allow progress to continue and errors to be retried automatically.
    • For vectorization scenarios, integrated vectorization with indexers can:
      • Pull changed data from the source.
      • Perform document cracking, optional AI enrichment, chunking, and vectorization.
      • Index the resulting chunks and embeddings.
    • Put the indexer on a frequent schedule (for example, every few minutes) so that changes from SuccessFactors are picked up incrementally instead of doing a full rebuild every 8 hours.
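A frequent schedule is declared on the indexer itself. Below is a sketch of an indexer definition as a Python dict matching the indexer REST payload shape; all names are hypothetical. The `interval` is an ISO 8601 duration, and PT5M is the smallest interval the service accepts.

```python
# Sketch of an indexer definition that runs every 15 minutes instead of
# rebuilding the index every 8 hours. Names are illustrative placeholders.
indexer = {
    "name": "successfactors-indexer",       # hypothetical indexer name
    "dataSourceName": "successfactors-ds",  # hypothetical data source
    "targetIndexName": "sf-rag-index",      # hypothetical index
    "skillsetName": "successfactors-rag-skillset",  # optional enrichment
    "schedule": {"interval": "PT15M"},      # ISO 8601 duration; PT5M minimum
}
```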
    4. Handling inserts, updates, and deletes

    Although the context does not detail the low‑level API semantics, the following patterns are supported by Azure AI Search and should be reflected in the design:

    • Inserts:
      • New records or new chunks are added as new documents in the index.
      • When using indexers with change detection, new rows in the source are detected and added automatically on the next scheduled run.
    • Updates:
      • When a record changes in the source, the corresponding document(s) in the index are updated.
      • With indexers and change tracking, the indexer resumes from the last checkpoint and processes only new or changed documents, so updates are applied without re‑indexing the entire corpus.
      • For RAG chunking, if a source record is split into multiple chunks, the enrichment pipeline should regenerate the affected chunks and embeddings and overwrite the existing documents for that record.
    • Deletes:
      • The index must be kept in sync when records are removed in SuccessFactors.
      • If the data source supports change detection for deletes, indexers can remove corresponding documents during scheduled runs.
      • If not, the pipeline must implement custom delete logic (for example, tracking deleted IDs and issuing delete operations against the index) so that stale chunks are not used for grounding.
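If the push model is used instead of indexers, the three cases above collapse into one batch against the documents endpoint. The sketch below is pure Python: it builds a payload matching the shape of the `docs/index` REST API, using `mergeOrUpload` for inserts/updates and a diff of record-ID sets to implement the custom delete logic. IDs and field names are illustrative.

```python
# Pure-Python sketch of a push-model sync: compare the previous and current
# sets of SuccessFactors record IDs, then build one Azure AI Search batch
# that upserts changed documents and deletes removed ones.

def build_index_batch(changed_docs, previous_ids, current_ids):
    actions = []
    # Upsert: mergeOrUpload inserts new documents and patches existing ones.
    for doc in changed_docs:
        actions.append({"@search.action": "mergeOrUpload", **doc})
    # Delete: any ID seen last run but missing now gets a delete action,
    # so stale chunks are not used for grounding.
    for doc_id in sorted(previous_ids - current_ids):
        actions.append({"@search.action": "delete", "id": doc_id})
    return {"value": actions}

batch = build_index_batch(
    changed_docs=[{"id": "emp42_0", "content": "updated chunk text"}],
    previous_ids={"emp42_0", "emp99_0"},
    current_ids={"emp42_0"},
)
# batch holds one mergeOrUpload action and one delete action
```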
    5. Behavior and reliability during indexing
    • Index performance and availability:
      • The search index is part of the real‑time path for RAG, so its reliability should mirror that of the model endpoint.
      • Choose capacity and scheduling so that background indexing (including vectorization) does not degrade query performance beyond acceptable limits. For example, if throughput drops during reindexing, schedule heavy jobs during off‑peak hours.
    • Large volumes and RAG chunk explosion:
      • RAG implementations often split each document into many chunks, significantly increasing index size.
      • Ensure the index can scale horizontally, maintain query performance under load, and store both raw content and metadata/enrichments.
    6. Orchestration and workflow management
    • Use an orchestrator (ADF, Databricks, etc.) to:
      • Define and schedule the extraction from SuccessFactors.
      • Coordinate chunking, enrichment, and embedding.
      • Push changes (inserts/updates/deletes) into Azure AI Search or trigger indexers.
    • The orchestrator should support job dependencies, monitoring, and flexible modification of workflows as requirements evolve.
    7. Minimizing downtime during re‑indexing
    • Instead of full re‑indexing every 8 hours, use:
      • Continuous or frequent incremental indexing (indexers or push APIs) to keep the index close to real time.
      • Index aliasing or swap capabilities so that, when a full rebuild is required (for schema changes or major reprocessing), a new index can be built in parallel and then swapped in with zero downtime.
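The alias swap works because queries target the alias name rather than a concrete index. The sketch below shows the request body for repointing an alias (sent as a PUT to the aliases endpoint); the alias and index names are placeholders.

```python
# Sketch of a zero-downtime swap with an index alias: build the new index
# in parallel, then repoint the alias at it in a single operation.

def repoint_alias(alias_name, new_index_name):
    # Body for PUT https://<service>.search.windows.net/aliases/<alias_name>
    # Queries keep using the alias name and see the new index immediately.
    return {"name": alias_name, "indexes": [new_index_name]}

alias_body = repoint_alias("sf-rag", "sf-rag-index-v2")  # placeholder names
```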
    8. Vector search and hybrid retrieval
    • For RAG, the index must support vector search and ideally hybrid search (vector + keyword + filters):
      • Store embeddings for each chunk in vector fields.
      • Combine vector search with full‑text search and filters for better relevance.
    • Integrated vectorization with indexers simplifies the CRUD pipeline by eliminating a separate chunking/vectorization job and automatically handling batching and retries when the embedding endpoint is throttled.
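A hybrid query combines all three signals in one request. The sketch below is a Python dict matching the shape of a `docs/search` REST request body; the query text, filter expression, vector values, and field name are illustrative, and the vector is truncated for brevity.

```python
# Sketch of a hybrid query body: keyword search + vector search + filter.
query = {
    "search": "parental leave policy",      # keyword (full-text) portion
    "filter": "country eq 'DE'",            # metadata filter, placeholder field
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.01, -0.02, 0.03],  # embedding of the query (truncated)
            "fields": "contentVector",      # vector field on each chunk
            "k": 5,                         # nearest neighbors to retrieve
        }
    ],
    "top": 5,
}
```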

    Summary implementation pattern for SAP SuccessFactors → RAG on Azure AI Search

    1. Extract changes from SuccessFactors every 8 hours (or more frequently) via ADF.
    2. In the data processing layer, clean, chunk, and enrich the changed records only.
    3. Use integrated vectorization/indexers where possible so that:
      • New/changed records are chunked and embedded.
      • Corresponding documents in the index are inserted or updated.
      • Deletes are propagated based on source change tracking or custom logic.
    4. Run indexers on a schedule and rely on their built‑in batching and retry behavior.
    5. For rare full rebuilds, create a new index in parallel and use aliasing/swap to avoid downtime.
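One detail that makes the upsert pattern in step 3 work is a deterministic chunk key: if each chunk's document key is derived from the source record ID plus its position, a re-run overwrites the same documents instead of accumulating duplicates. The sketch below is one possible scheme (a hypothetical helper, not part of any SDK); record IDs are base64url-encoded because index keys must be URL-safe.

```python
import base64

def chunk_key(record_id: str, chunk_index: int) -> str:
    # Deterministic, URL-safe key: same record + same chunk position
    # always yields the same key, so mergeOrUpload overwrites in place.
    safe_id = base64.urlsafe_b64encode(record_id.encode()).decode().rstrip("=")
    return f"{safe_id}_{chunk_index}"

keys = [chunk_key("EMP/42", i) for i in range(2)]
# "EMP/42" contains "/", which is not valid in a document key, so it is encoded
```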

