Thanks for the question. Retrieval Augmented Generation, or "RAG", is one of the most popular architectural patterns for building data-infused LLM applications. Azure OpenAI Service on your data automates many of the components of this architecture (ingestion, chunking, deployment), allowing customers to rapidly build use cases involving enterprise search or knowledge retrieval. If you're building your own RAG implementation (rather than using the Azure OpenAI one), you'll need to take more ownership of the pipeline: in particular, your document chunking step must store the source URL alongside each chunk so that answers can cite their sources.
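To make that concrete, here is a minimal sketch of what a single chunk record might look like in a custom pipeline. The field names (`content`, `source_url`, and so on) are hypothetical, not a required schema; your index just needs matching fields:

```python
# A hypothetical shape for one chunk produced by a custom chunking pipeline.
# Field names here are illustrative; Azure AI Search only requires that your
# index schema defines corresponding fields.
chunk = {
    "id": "contoso-handbook-0007",   # unique key per chunk
    "content": "...one chunk of the source document...",
    "source_url": "https://contoso.sharepoint.com/docs/handbook.pdf",
    "page_number": 12,               # optional provenance metadata
}
```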
- Chunking Data: Chunking matters when source documents exceed the maximum input size imposed by your models. In Azure AI Search you can use the Text Split skill for this; if your documents are too large, insert a chunking step into both your indexing and query workflows so that oversized content is broken down before it reaches the model (see the Text Split sketch after this list).
- Embedding Creation: Azure AI Search doesn't host vectorization models, so one of your challenges is creating embeddings for query inputs and outputs. You can use any embedding model, but Azure OpenAI embedding models are commonly used (see the embedding sketch after this list). Integrated vectorization, currently in preview, can generate these embeddings for you as part of the indexing pipeline.
- Hybrid Search: A hybrid query combines full text and vector queries that execute against a search index containing both searchable plain-text content and generated embeddings; Azure AI Search merges the two result sets into a single ranked list using Reciprocal Rank Fusion (RRF). A query sketch follows this list.
- Skills Not Getting Indexed: If the Text Split (SplitSkill) and Azure OpenAI Embedding skills in your skillset are not producing indexed output, it could be due to several reasons. Check the indexer's execution history and error details for specifics (see the status-check sketch after this list), and verify that the skills' inputs and outputs, and the indexer's output field mappings, are correctly wired together.
- Using OCR, Key Phrase Extraction, and Merge Skill: The OCR skill recognizes printed and handwritten text in image files. The Key Phrase Extraction skill evaluates unstructured text and returns a list of key phrases. The Text Merge skill collates the results, folding the OCR output back into the document's body text so downstream skills see a single merged document (see the skillset sketch after this list).
- Incremental Indexing: Incremental enrichment refers to the use of cached enrichments during skillset execution, so that only new and changed documents (or documents affected by edited skills) incur AI processing. The cache contains the output from document cracking, plus the output of each skill for every document. A configuration sketch follows at the end.
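For the chunking item above, here is a sketch of a Text Split skill defined with the `azure-search-documents` Python SDK (11.4 or later). The page length, overlap, and target names are assumptions to tune, not recommendations:

```python
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

# Split /document/content into overlapping ~2000-character pages.
# text_split_mode, maximum_page_length, and page_overlap_length are
# illustrative values, not requirements.
split_skill = SplitSkill(
    name="split-text",
    description="Chunk documents into pages for embedding",
    text_split_mode="pages",
    maximum_page_length=2000,
    page_overlap_length=200,
    context="/document",
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="pages")],
)
```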
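For embedding creation, a sketch using the `openai` Python package (v1+) against an Azure OpenAI deployment; the endpoint, key, API version, and deployment name are placeholders:

```python
from openai import AzureOpenAI

# Placeholders: replace with your own endpoint, key, and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

def embed(text: str) -> list[float]:
    # "model" is the *deployment* name of an embedding model, e.g. a
    # text-embedding-ada-002 or text-embedding-3-small deployment.
    response = client.embeddings.create(input=[text], model="<embedding-deployment>")
    return response.data[0].embedding

query_vector = embed("What is our parental leave policy?")
```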
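For hybrid search, a sketch issuing a combined full-text and vector query with the `azure-search-documents` SDK. It assumes an index with a searchable `content` field and a `content_vector` vector field; adjust to your schema:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Placeholders for your service, index, and key.
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

# A hybrid query: full-text search_text plus a vector query run together;
# the service merges the two result sets with Reciprocal Rank Fusion.
results = search_client.search(
    search_text="parental leave policy",
    vector_queries=[
        VectorizedQuery(
            vector=query_vector,      # from the embedding sketch above
            k_nearest_neighbors=5,
            fields="content_vector",
        )
    ],
    select=["id", "content", "source_url"],
    top=5,
)
for doc in results:
    print(doc["id"], doc["content"][:80])
```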
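For diagnosing skills that aren't producing indexed output, a sketch that pulls the indexer's execution status and surfaces errors and warnings; the service and indexer names are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

indexer_client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<your-admin-key>"),
)

# Pull the execution history for the indexer that runs your skillset.
status = indexer_client.get_indexer_status("<your-indexer>")
print("Overall status:", status.status)
if status.last_result:
    print("Last run:", status.last_result.status)
    for error in status.last_result.errors or []:
        # The message usually names the failing skill or missing input.
        print("Error:", error.error_message)
    for warning in status.last_result.warnings or []:
        print("Warning:", warning.message)
```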
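For the OCR, Key Phrase Extraction, and Merge item, a sketch of the three skills wired together in the conventional pattern (OCR per image, merge into a single text field, then key phrases). The target names such as `merged_content` are illustrative:

```python
from azure.search.documents.indexes.models import (
    OcrSkill,
    MergeSkill,
    KeyPhraseExtractionSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

# OCR each image extracted during document cracking.
ocr_skill = OcrSkill(
    context="/document/normalized_images/*",
    inputs=[InputFieldMappingEntry(name="image", source="/document/normalized_images/*")],
    outputs=[OutputFieldMappingEntry(name="text", target_name="ocr_text")],
)

# Merge the OCR text back into the document body at the image locations.
merge_skill = MergeSkill(
    context="/document",
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/content"),
        InputFieldMappingEntry(name="itemsToInsert", source="/document/normalized_images/*/ocr_text"),
        InputFieldMappingEntry(name="offsets", source="/document/normalized_images/*/contentOffset"),
    ],
    outputs=[OutputFieldMappingEntry(name="mergedText", target_name="merged_content")],
)

# Extract key phrases from the merged text.
key_phrase_skill = KeyPhraseExtractionSkill(
    context="/document",
    inputs=[InputFieldMappingEntry(name="text", source="/document/merged_content")],
    outputs=[OutputFieldMappingEntry(name="keyPhrases", target_name="key_phrases")],
)
```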
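For incremental enrichment, the cache is configured on the indexer definition. Since SDK support for this property varies by version, this sketch calls the REST API directly from Python; the connection string, resource names, and api-version are placeholders you should confirm against your service:

```python
import requests

SERVICE = "https://<your-service>.search.windows.net"
API_VERSION = "2024-05-01-preview"  # assumption: use a version your service supports
HEADERS = {"Content-Type": "application/json", "api-key": "<your-admin-key>"}

# Fetch the existing indexer definition, attach a cache, and write it back.
indexer_url = f"{SERVICE}/indexers/<your-indexer>?api-version={API_VERSION}"
indexer = requests.get(indexer_url, headers=HEADERS).json()

indexer["cache"] = {
    # Azure Storage account where cached enrichments are persisted.
    "storageConnectionString": "<your-storage-connection-string>",
    # Controls whether documents are reprocessed when cached content
    # is invalidated (for example, after a skill definition changes).
    "enableReprocessing": True,
}

response = requests.put(indexer_url, headers=HEADERS, json=indexer)
response.raise_for_status()
```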