Evaluate the Agentic Retrieval in Foundry Local system

Evaluate the system, models, and datasets within Agentic Retrieval. There are three types of evaluations: baseline, automatic, and manual.

Important

Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prerequisites

Before you begin:

Run baseline check

The baseline check evaluates the functionality of the RAG system to make sure it's working as expected. It runs the following tasks:

  • Creates an ingestion build in the documents dataset.
  • Runs inference against that build by using a test dataset that includes a set of queries and expected answers.
  • Evaluates the system based on model metrics.
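
To make the "queries and expected answers" idea concrete, the following is a minimal sketch of how a generated answer might be scored against an expected answer. It uses token-overlap F1, a common general-purpose text metric, purely as an illustration; this article does not state which metrics the baseline check computes. The names `token_f1`, `evaluate`, and `answer_fn`, and the `query` and `expected_answer` keys, are hypothetical.

```python
from collections import Counter


def token_f1(predicted: str, expected: str) -> float:
    """Token-overlap F1 between a generated answer and an expected answer.

    Illustrative metric only; not necessarily what the product computes.
    """
    pred_tokens = predicted.lower().split()
    exp_tokens = expected.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not exp_tokens:
        return float(pred_tokens == exp_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(exp_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(test_set, answer_fn):
    """Call answer_fn on each query and return the mean token F1.

    test_set is a list of dicts with hypothetical keys "query" and
    "expected_answer"; answer_fn stands in for the RAG system.
    """
    scores = [
        token_f1(answer_fn(item["query"]), item["expected_answer"])
        for item in test_set
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, `answer_fn` is a stand-in for whatever produces an answer from a query, so the scoring logic can be tried with a simple stub function before any real system is involved.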

To run a baseline check:

  1. Go to the developer portal using the domain name provided at deployment and app registration. For example: https://arcrag.contoso.com.

  2. Sign in with developer credentials that have both "EdgeRAGDeveloper" and "EdgeRAGEndUser" roles assigned.

  3. Select the Evaluation tab.

    A screenshot showing the Evaluation tab in the developer portal, highlighting options for running checks and managing evaluations.

  4. On the Baseline check tab, select Run a check.

  5. Enter a name for your evaluation.

    A screenshot showing the Evaluation tab in the developer portal, with options for running checks and managing evaluations.

  6. Select Run.

  7. Review the evaluation status.

  8. When the evaluation is completed, select the name to see the results.

    A screenshot showing the evaluation results, including metrics and detailed performance analysis of the RAG system.

Run automatic evaluation

Automatic evaluation measures the quality of the RAG system by using your own documents and test dataset.

  1. In the developer portal, select Evaluation > Automatic evaluation.

    Screenshot of the Automatic Evaluation tab in the developer portal with options for creating evaluations.

  2. Select Create an automated evaluation.

  3. Enter a name for your evaluation.

    A screenshot of the basic information tab, with fields for entering the evaluation name and configuration options.

  4. Review the parameters, such as Temperature, Top-N, Top-P, and System prompt. These values are taken from the Chat playground. To change them, go to the Chat tab and update them there.

  5. Select Next.

  6. Under Test dataset, select Download dataset sample to see the JSONL structure that the test dataset requires.

    Screenshot of the test dataset tab where you can download a template and update the dataset.

  7. Upload your dataset JSONL file.

  8. Select Next.

  9. Select the Metrics you want to evaluate for your RAG system.

    A screenshot that shows the available metrics to evaluate your system.

  10. Select Next.

  11. Review the configurations and select Create.

    Screenshot of the tab that summarizes your configuration for the automatic evaluation.

  12. Monitor the progress and the status of the evaluation.

    Screenshot showing the results of an automatic evaluation, including metrics and evaluation details.

  13. After the evaluation completes, review the results by selecting the evaluation name.

  14. Review the evaluation details and metrics.
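
Before you upload the test dataset in the automatic evaluation steps above, it can help to check locally that the file is well-formed JSONL (one JSON object per line). The following is a minimal sketch of such a check. The field names `query` and `expected_answer` are hypothetical placeholders; replace them with the fields shown in the dataset sample you download from the portal.

```python
import json

# Hypothetical field names: replace with the fields used in the
# dataset sample downloaded from the developer portal.
REQUIRED_FIELDS = ("query", "expected_answer")


def validate_jsonl(text, required_fields=REQUIRED_FIELDS):
    """Return a list of (line_number, message) problems found in JSONL text.

    An empty list means every line is a JSON object that contains all
    of the required fields.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [name for name in required_fields if name not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    return problems
```

To use it, read the file as UTF-8 text and pass the contents in, for example `validate_jsonl(open("testset.jsonl", encoding="utf-8").read())`, where `testset.jsonl` is a placeholder file name. Any reported line numbers point to records to fix before uploading.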