Hi Parul Paul,
If your RAG pipeline is fully custom and not built using Foundry's built-in RAG or Agent features, then you should not select Agent or Model for evaluation; those options are for things deployed inside Foundry. In your case, you should use dataset-based evaluation. The correct pattern is: run your full custom RAG pipeline externally (retrieval + generation), collect the outputs, and then upload a structured evaluation dataset into Foundry.
Your JSONL structure could look like this:

```json
{"question": "What is the SLA for premium tier?", "context": "Retrieved chunks that were passed to the model...", "answer": "Model's generated response", "ground_truth": "Expected correct answer"}
```
Then use Foundry evaluators such as:

- Groundedness (checks answer vs. context)
- Relevance (checks answer vs. question)
- Similarity/correctness (checks answer vs. ground_truth)
Foundry does not need to execute your pipeline. It only needs question, context, answer, and ground_truth (if available).
As far as I know, best practice for custom RAG evaluation is:

- always log the exact retrieved context used during generation
- store the final model output exactly as returned
- keep the evaluation dataset separate from training data
- use multiple evaluators, not just similarity
So the workflow for your issue would be: run your RAG > export results > convert to JSONL > upload dataset > run evaluation on the dataset. Go ahead and try it.
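The export and convert steps can be sketched in a few lines of Python. This is a minimal sketch, not Foundry-specific code: `export_to_jsonl`, the `results` list, and the output filename are all hypothetical names for whatever your own pipeline produces; only the field names (`question`, `context`, `answer`, `ground_truth`) follow the evaluation schema above.

```python
import json

def export_to_jsonl(results, path):
    """Write collected RAG pipeline outputs to a JSONL file for upload.

    `results` is a list of dicts produced by your own pipeline; one JSON
    object is written per line, matching the schema shown above.
    """
    with open(path, "w", encoding="utf-8") as f:
        for r in results:
            row = {
                "question": r["question"],
                "context": r["context"],   # exact retrieved chunks passed to the model
                "answer": r["answer"],     # model output exactly as returned
            }
            if "ground_truth" in r:        # optional, per the note above
                row["ground_truth"] = r["ground_truth"]
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Example: one record from a hypothetical pipeline run
results = [{
    "question": "What is the SLA for premium tier?",
    "context": "Retrieved chunks that were passed to the model...",
    "answer": "Model's generated response",
    "ground_truth": "Expected correct answer",
}]
export_to_jsonl(results, "rag_eval.jsonl")
```

Keeping this conversion as a separate step (rather than writing JSONL inside the pipeline) makes it easy to re-export the same run with different field mappings if an evaluator expects different column names.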
Foundry evaluates your outputs; it does not orchestrate your custom pipeline.
rgds,
Alex