I created an agent in Copilot Studio from a Copilot prompt, and Evaluation is showing an error.

Question


Edward 41

I went to Copilot Studio and used the Copilot prompt to create an agent, which set up Overview, Evaluation, and a few other sections. When I ran Evaluation as suggested, all 10 test cases failed with the error "Something went wrong".

It's always "Something went wrong". Every time Microsoft has an issue, the message is "Something went wrong", and it stays the same even if I try again shortly afterward. It's never anything more specific, just "Something went wrong" or "Try again later", and trying again later never works.


Answer 1

Hello Edward,
The “Something went wrong” error in Copilot Studio Evaluation usually doesn’t mean the evaluation feature itself is broken. It typically means the system couldn’t generate an agent response during the evaluation run, so there was nothing to score.

Evaluations first run the agent and then assess its output. If the agent fails at that first step (because of broken user profile connections, authentication or permission problems, failing tools or connectors, content filtering, or stricter runtime limits), every test case can show this generic failure.

The quickest way to troubleshoot is to ask the same questions in Test chat, run the evaluation without user profile, authentication, or tools, and verify connections and permissions. In most cases, the root cause is the agent not working correctly under evaluation conditions rather than a problem with Evaluation itself.
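To illustrate why a single agent-side failure can make all 10 test cases fail at once, here is a minimal Python sketch of a two-step evaluation loop (run the agent, then grade). It is not Copilot Studio's implementation: `run_evaluation`, `broken_agent`, and `working_agent` are hypothetical names chosen for illustration, and the broken agent simply simulates a connector or permission failure.

```python
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical stand-in for the generic message the evaluation UI shows.
GENERIC_ERROR = "Something went wrong"


@dataclass
class CaseResult:
    question: str
    passed: bool
    detail: str


def run_evaluation(
    test_cases: list[tuple[str, str]],
    run_agent: Callable[[str], str],
    grade: Callable[[str, str], bool],
) -> list[CaseResult]:
    """Run each test case in two steps: get a response, then grade it."""
    results: list[CaseResult] = []
    for question, expected in test_cases:
        try:
            # Step 1: the agent has to produce a response.
            response = run_agent(question)
        except Exception:
            # No response means nothing to grade; only a generic message is reported.
            results.append(CaseResult(question, False, GENERIC_ERROR))
            continue
        # Step 2: only now is the response compared with the expected answer.
        passed = grade(response, expected)
        results.append(CaseResult(question, passed, "graded"))
    return results


def broken_agent(question: str) -> str:
    # Hypothetical: simulates a failing connection or authentication step.
    raise PermissionError("connection not authorized")


def working_agent(question: str) -> str:
    # Hypothetical: always returns the expected answer.
    return "expected answer"


cases = [(f"Question {i}", "expected answer") for i in range(10)]
broken = run_evaluation(cases, broken_agent, lambda r, e: r == e)
working = run_evaluation(cases, working_agent, lambda r, e: r == e)
print(sum(r.detail == GENERIC_ERROR for r in broken))  # 10
print(sum(r.passed for r in working))  # 10
```

The grading step never runs for the broken agent, which is why checking the agent in Test chat first is the faster route than tuning the test cases.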
Reference documents:
1. https://status.cloud.microsoft/
2. https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-results
3. https://learn.microsoft.com/en-us/troubleshoot/power-platform/copilot-studio/welcome-copilot-studio
4. https://microsoft.github.io/L400-Copilot-and-Agents-at-work/Docs/04_EvaluateCopilotAgent/04.02.html
5. https://m365admin.handsontek.net/power-platform-monitor-announcing-copilot-studio-agent-health-metrics-alerts-public-preview/

Answer 2

The error "Something went wrong" during Evaluation typically indicates a generic failure in the underlying agent skills or evaluation pipeline rather than a problem with the test cases themselves. In the current tooling, these failures often surface as high-level messages instead of detailed diagnostics.

For Copilot Studio–based agents and evaluations, there are two main categories of issues relevant to this scenario:

Agent-side or skill-side failures
Security Copilot and other Copilot agents can fail when their underlying skills or identities encounter issues. In those cases, the platform surfaces generic errors like:
- ProcessSkillEvaluationFailed – the agent’s underlying skill had an issue during evaluation.
- SecurityCopilotEvaluationError – an error occurred in a child evaluation.
- Identity-related errors such as ForbiddenIdentityConfiguration, UnauthorizedIdentityConfiguration, InvalidIdentityConfiguration, or MissingIdentityConfiguration when the identity used by the agent cannot obtain or use tokens correctly.
These are summarized to the user as a generic failure (for example, “Something went wrong”) instead of exposing detailed internal diagnostics. The documented guidance is:
- For skill/evaluation errors (ProcessSkillEvaluationFailed, SecurityCopilotEvaluationError): contact the agent owner.
- For identity configuration issues: verify the identity configuration in the agent, make sure the identity has all permissions listed in the manifest, and work with the administrator who can review the sign-in logs to check whether access is being blocked.
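The documented guidance above amounts to a lookup from error code to next steps. The Python sketch below encodes only the codes and actions named in this answer; `guidance_for` is a hypothetical helper, not a Copilot Studio API, and the fallback for unlisted codes is an assumption that points back to the general triage path.

```python
# Error codes quoted from the answer above; the lookup itself is a hypothetical helper.
SKILL_ERRORS = {"ProcessSkillEvaluationFailed", "SecurityCopilotEvaluationError"}
IDENTITY_ERRORS = {
    "ForbiddenIdentityConfiguration",
    "UnauthorizedIdentityConfiguration",
    "InvalidIdentityConfiguration",
    "MissingIdentityConfiguration",
}


def guidance_for(error_code: str) -> list[str]:
    """Return the documented next steps for a given error code."""
    if error_code in SKILL_ERRORS:
        return ["Contact the agent owner."]
    if error_code in IDENTITY_ERRORS:
        return [
            "Verify the identity configuration in the agent.",
            "Ensure the identity has all permissions listed in the manifest.",
            "Check the sign-in logs with an administrator to see if access is blocked.",
        ]
    # Assumption: any other code falls back to the general evaluation triage.
    return ["Follow the evaluation triage steps (setup first, then the agent)."]


for code in ("ProcessSkillEvaluationFailed", "MissingIdentityConfiguration", "Unknown"):
    print(code, "->", guidance_for(code))
```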
Evaluation setup or environment issues
Even when the agent is created from a Copilot prompt, evaluation can fail if:
- The evaluation setup is misconfigured (for example, grader or expected values are incorrect, or the evaluation method is not appropriate).
- The agent itself is not fully functional (for example, unpublished or misconfigured), so every evaluation test case fails.
The recommended triage path for agent evaluation failures is:
1. Verify the evaluation setup first:
  - Manually compare the agent’s actual response with the expected value and the grading method.
  - Confirm that the expected answers are current and accurate, the test cases are realistic, and the evaluation method (for example, compare meaning vs. text similarity) is appropriate.
  - If the agent’s responses are acceptable but still fail, the grader or expected values likely need adjustment.
2. If the evaluation setup is valid, diagnose the agent:
  - Ensure the agent is published and functioning. Unpublished agents can behave differently (for example, limited data processing or incomplete behavior) and cause evaluation failures.
  - Test the agent directly in Copilot Studio’s test pane to confirm it can respond correctly outside of evaluation.
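The two-step triage order above can be written as a small decision function. This is a sketch under stated assumptions: `TriageFacts` and `triage` are hypothetical names, each field stands for something the maker checks by hand, and the function assumes the evaluation is already failing. The final fallback mirrors the advice later in this answer to inspect error codes and identity configuration.

```python
from dataclasses import dataclass

ADJUST_SETUP = "Adjust the grader, expected values, or evaluation method."
PUBLISH_AGENT = "Publish the agent, then rerun the evaluation."
FIX_AGENT = "Fix the agent: it also fails in the test pane."
CHECK_ERRORS = "Inspect error codes and the identity configuration."


@dataclass
class TriageFacts:
    # All fields are hypothetical observations a maker collects manually.
    setup_valid: bool           # expected answers current, cases realistic, method fits
    responses_acceptable: bool  # agent's actual answers look right on manual review
    published: bool
    works_in_test_pane: bool


def triage(facts: TriageFacts) -> str:
    """Suggest a next step for an evaluation that is failing."""
    # Step 1: verify the evaluation setup first.
    if not facts.setup_valid or facts.responses_acceptable:
        # Good answers that still fail point at the grader or expected values.
        return ADJUST_SETUP
    # Step 2: the setup is valid, so diagnose the agent itself.
    if not facts.published:
        return PUBLISH_AGENT
    if not facts.works_in_test_pane:
        return FIX_AGENT
    return CHECK_ERRORS


print(triage(TriageFacts(setup_valid=True, responses_acceptable=False,
                         published=False, works_in_test_pane=True)))
```

Checking the setup before the agent follows the order recommended above; it avoids debugging a healthy agent when the real problem is a grader or an outdated expected answer.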

Regarding why the message is so generic:
The platform intentionally abstracts many internal error details into a small set of user-facing messages (for example, “Something went wrong”, “Try again later”) and, where applicable, structured error codes (like the identity and skill errors above). The expectation is that administrators or agent makers use:

- Error codes surfaced in the test pane or logs.
- Sign-in logs and service logs for identity-related issues.
- Evaluation triage guidance (verify evaluation setup, then agent behavior) to narrow down the root cause.

To move forward in this scenario:

- Confirm the agent is published and functioning by testing it directly in Copilot Studio.
- If evaluation still shows generic failures, inspect any error codes or identity-related messages associated with the agent (for example, Forbidden/Unauthorized/Invalid/Missing identity configuration) and correct the identity configuration as documented.
- If the agent’s responses look correct but evaluations still fail, adjust the evaluation setup following the evaluation triage guidance.
