[This article is prerelease documentation and is subject to change.]
After running an evaluation, review the results to understand how well your agent performed across conversations. The results show quality scores and let you drill into individual conversations to see the agent's actual responses.
Note
This article reflects the new agent experience in Microsoft Copilot Studio, which is currently available as a production-ready preview. Learn about the two experiences in Classic vs. new agent experience.
- Production-ready previews are subject to supplemental terms of use.
- Some capabilities available in the classic experience aren't yet available in the new experience.
- Agents created in the new experience can't be converted to the classic experience.
View run results
- Open your agent in Copilot Studio.
- Select the Evaluate tab.
- Under recent results, select an evaluation with a prior test run to see its results.
The results show the following information:
Test run result: A table that lists each conversation, the number of messages exchanged between the user and the agent, and the conversation's general quality score. Select a conversation to see its detailed results.
Evaluation summary: A summary of your overall evaluation across all test methods. It includes the following information:
- Score: The overall score, based on the conversations that passed the general quality test method.
- Duration: The length of time it took to complete the evaluation.
- Test cases: The number of test cases in your overall evaluation.
- Data type: The type of test set. Currently, only the Conversation data type is available.
- User profile: The user profile used to run the evaluation.
Review an individual conversation (test case)
On the run results page, select a conversation to expand its details.
Review the Test case details:
- User messages: The test messages sent to the agent.
- Agent responses: The responses the agent actually returned.
- General quality: How the responses scored on the general quality test method, either Pass or Fail.
Use this information to identify where your agent needs improvement.
Compare runs
When you run the same evaluation multiple times, you can compare results to track progress:
- On the evaluation detail page, review the list of runs with their scores and timestamps.
- Compare scores across runs to see whether changes to your agent improved or degraded performance.
- Look for patterns. For example, if a particular category of conversations consistently scores low, that area might need additional instructions or knowledge.
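If you track results outside Copilot Studio, the pattern check described above can be automated. The following is a minimal sketch, not a Copilot Studio feature: it assumes you record each run as a list of `(category, passed)` pairs, where the category is a hypothetical label you assign to test conversations yourself, and it treats a run's score as its pass rate, which is an assumption rather than the documented scoring formula. The names `overall_pass_rate` and `consistently_low` and the 0.7 threshold are illustrative.

```python
from collections import defaultdict

# Hypothetical record format: one run = list of (category, passed) pairs.
# "category" is a label you assign yourself; it isn't a Copilot Studio field.
Run = list[tuple[str, bool]]


def overall_pass_rate(run: Run) -> float:
    """Fraction of conversations in one run that passed (assumed score definition)."""
    return sum(passed for _, passed in run) / len(run) if run else 0.0


def pass_rate_by_category(run: Run) -> dict[str, float]:
    """Fraction of passing conversations for each category in one run."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for category, passed in run:
        totals[category] += 1
        if passed:
            passes[category] += 1
    return {c: passes[c] / totals[c] for c in totals}


def consistently_low(runs: list[Run], threshold: float = 0.7) -> set[str]:
    """Categories whose pass rate is below `threshold` in every run where they appear."""
    rates = [pass_rate_by_category(r) for r in runs]
    categories = set().union(*(r.keys() for r in rates))
    return {c for c in categories if all(r[c] < threshold for r in rates if c in r)}


# Two runs of the same evaluation, oldest first (illustrative data only).
runs = [
    [("billing", False), ("billing", True), ("returns", True)],
    [("billing", False), ("returns", True), ("returns", True)],
]
print([round(overall_pass_rate(r), 2) for r in runs])  # [0.67, 0.67]
print(consistently_low(runs))  # {'billing'}
```

Requiring a category to be low in every run, rather than on average, keeps a single noisy run from flagging an area that is otherwise fine.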
Export results
You can export evaluation results for further analysis:
- On the run results page, select the three dots (…) > Export test results. Alternatively, select a specific evaluation, and then in the top-right corner select the three dots (…) > Export test results.
- The results are downloaded as a CSV file that includes all conversations, responses, and scores.
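Once downloaded, the CSV can be summarized with a few lines of standard-library Python. This is a sketch under assumptions: the column name `General quality` and the `Pass`/`Fail` cell values are guesses based on the labels shown in the test case details, so check the header row of your own export and adjust `result_column` to match. The helper name `summarize_results` is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def summarize_results(lines: Iterable[str], result_column: str = "General quality"):
    """Count result values and collect failing rows from an exported results CSV.

    `result_column` is an assumed column name; verify it against your export.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or result_column not in reader.fieldnames:
        raise KeyError(f"column {result_column!r} not found in {reader.fieldnames}")
    counts: Counter = Counter()
    failures = []
    for row in reader:
        value = (row[result_column] or "").strip()
        counts[value] += 1
        if value.lower() == "fail":
            failures.append(row)
    return counts, failures


# Example with inline data. For a real export, pass an open file instead:
# with open("results.csv", newline="", encoding="utf-8-sig") as f:
#     counts, failures = summarize_results(f)
sample = io.StringIO("Test case,General quality\n1,Pass\n2,Fail\n3,Pass\n")
counts, failures = summarize_results(sample)
print(counts)  # Counter({'Pass': 2, 'Fail': 1})
```

Opening the file with `encoding="utf-8-sig"` strips a byte-order mark if one is present and behaves like plain UTF-8 otherwise, so it's a safe default when you aren't sure how the file was written.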
Act on results
Based on your evaluation results:
- Low general quality scores: Review and refine your agent's instructions. See Configure agent details and instructions (preview).
- Missing or incorrect tool use: Check that tools have clear descriptions and that your agent's instructions mention when to use them. See Tools overview for agents (preview).
- Incorrect information: Verify that the relevant knowledge sources are added and configured correctly. See Add knowledge to an agent (preview).