Hi Kriti Kumari,
At the moment, the Foundry dashboard does not support a single persistent view where you can add and compare evaluation results from multiple datasets or multiple evaluation runs, the way the external comparison site you shared does.
What Foundry supports today is comparison at the evaluation run level, not a custom dashboard view.
To compare different models or datasets, you need to run evaluations separately for each dataset and model combination. After that, open the evaluation details page in the Foundry portal and select multiple evaluation runs. There is a built-in Compare option that shows the results side by side. This comparison helps you see which model or dataset performed better and highlights improvements or regressions based on statistical significance. The comparison view works well for analysis, but it is temporary and not saved as a reusable dashboard.
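If you prefer to script those per-combination runs, here is a minimal sketch assuming the azure-ai-evaluation Python SDK (`pip install azure-ai-evaluation`). The dataset names, deployment, and project details are placeholders, and depending on your SDK version, `azure_ai_project` may instead be a project endpoint string:

```python
# Minimal sketch, assuming the azure-ai-evaluation Python SDK.
# All <angle-bracket> values and file names are placeholders.
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Model used by the LLM-judge evaluator (placeholder values).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<judge-deployment>",
}

# Project that the runs are logged to, so they show up in the portal.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<foundry-project>",
}

# One evaluation run per dataset/model combination; each run then appears
# in the portal's evaluation list, ready to be selected for Compare.
for dataset in ["dataset_a.jsonl", "dataset_b.jsonl"]:
    evaluate(
        data=dataset,
        evaluators={"relevance": RelevanceEvaluator(model_config)},
        azure_ai_project=azure_ai_project,
        evaluation_name=f"eval-{dataset}",
        output_path=f"{dataset}.results.json",  # also keep a local copy
    )
```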
If you want deeper insight, you can open each evaluation run and review the aggregated metrics and row-level results. These show the prompt, response, ground truth, and evaluator scores for each record, which is useful when you want to understand why one dataset or model performed better than another.
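The same row-level detail is available from the SDK's return value if you prefer to inspect it in code. A hedged sketch, with the same placeholder names as above; the exact column names depend on your dataset fields and the evaluators you chose:

```python
# Minimal sketch: evaluate() returns a dict with aggregate "metrics"
# plus per-record "rows", mirroring the portal's detail view.
from azure.ai.evaluation import evaluate, RelevanceEvaluator

model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<judge-deployment>",
}

result = evaluate(
    data="dataset_a.jsonl",  # placeholder dataset
    evaluators={"relevance": RelevanceEvaluator(model_config)},
)

print(result["metrics"])  # aggregated scores for the whole run

# Each row pairs the inputs with every evaluator's output for that record;
# column names like "inputs.query" vary with your dataset and evaluators.
for row in result["rows"][:5]:
    print(row.get("inputs.query"), row.get("outputs.relevance.relevance"))
```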
For agent-based scenarios, where you want to compare performance over time rather than across offline datasets, the Monitor section of the agent gives you charts and metrics based on live traffic. This is meant for observing trends and behavior, not for direct dataset-to-dataset comparison.
So in short, comparison is possible today through the evaluation Compare feature, but adding multiple evaluation results into a single custom dashboard view is not available yet in Foundry. If your use case needs long-term tracking or custom visuals, many users export evaluation results and build their own comparisons outside the portal.
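As a rough illustration of that export route, here is a sketch assuming pandas and assuming each run was saved locally via evaluate()'s output_path parameter, as in the sketches above. The run names and file paths are placeholders:

```python
# Minimal sketch: load locally saved evaluation results and compare
# evaluator scores outside the portal. Assumes each file has the same
# {"metrics": ..., "rows": ...} shape that evaluate() returns.
import json

import pandas as pd

runs = {}
for name in ["run_a", "run_b"]:  # placeholder run names
    with open(f"{name}.results.json") as f:
        runs[name] = pd.json_normalize(json.load(f)["rows"])

# Print mean evaluator scores per run for a quick side-by-side view; the
# same DataFrames can feed a notebook, Excel, or Power BI dashboard.
for name, df in runs.items():
    score_cols = df.columns[df.columns.str.startswith("outputs.")]
    print(name, df[score_cols].mean(numeric_only=True).to_dict())
```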
Official documentation that explains these capabilities can be found here:
• Evaluation setup and best practices (observability overview):
https://learn.microsoft.com/azure/ai-foundry/concepts/observability?view=foundry-classic
• View and compare evaluation results in the Foundry portal (including the statistically significant side-by-side Compare view):
https://learn.microsoft.com/azure/ai-foundry/how-to/evaluate-results?view=foundry
• Foundry Project REST API reference (evaluation_comparison):
https://learn.microsoft.com/azure/foundry/reference/foundry-project-rest-preview#components
Hope this clarifies the current behavior and available options.
Thank you!