Automate agent evaluations by using the Power Platform API

Copilot Studio provides makers with tools to continuously evaluate agent performance by running automated tests against predefined test sets through the Power Platform REST API. With the API, you can programmatically trigger agent evaluations as part of your development workflows, such as agent updates, release validation, or regression testing.

Automating evaluations helps you:

  • Validate agent quality after making changes
  • Run recurring performance checks against production or staging agents
  • Integrate agent testing into CI/CD pipelines
  • Detect regressions in agent behavior early in the development lifecycle

Prerequisites

  • The Bot ID and Environment ID for the target agent.
  • A test set created in Copilot Studio for your target agent.
  • A user access token issued by Microsoft Entra ID (OAuth 2.0). To obtain the token, see Authentication.
    • You must acquire the access token by using the client ID of an app registration that has the appropriate scope granted under the Power Platform API.
  • For Start an agent evaluation, you can optionally include a Microsoft Copilot Studio connection ID (mcsConnectionId) in the call to use as a user profile for the evaluation. To find your mcsConnectionId:
    1. Go to Power Automate.
    2. Open the Connections page.
    3. Select the Microsoft Copilot Studio connection.
    4. Copy the mcsConnectionId from the URL: .../connections/shared_microsoftcopilotstudio/{mcsConnectionId}/details
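
Once you have the connection details URL, the ID is the path segment between shared_microsoftcopilotstudio and details. A minimal sketch of extracting it (the URL and IDs below are hypothetical examples, not real values):

```python
import re

# Hypothetical connection details URL copied from the Power Automate Connections page.
url = ("https://make.powerautomate.com/environments/00000000-0000-0000-0000-000000000000"
       "/connections/shared_microsoftcopilotstudio/11111111-2222-3333-4444-555555555555/details")

# The mcsConnectionId is the path segment after shared_microsoftcopilotstudio.
match = re.search(r"/shared_microsoftcopilotstudio/([^/]+)/details", url)
mcs_connection_id = match.group(1) if match else None
print(mcs_connection_id)
```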

Overview for running evaluations by using REST API

To run an evaluation by using the Power Platform API, follow these general steps:

  1. Fulfill the prerequisites.
  2. Find and retrieve the test set ID of the test set you want to use.
  3. Run the evaluation.
  4. Retrieve the results by using the evaluation run ID.

When the request is successful, the evaluation runs asynchronously and produces results that you can review in Copilot Studio.
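
The steps above map onto a small set of endpoint URLs. The following sketch builds those URLs from the placeholder IDs (the IDs are assumptions; no request is sent, and each call must carry your Microsoft Entra ID bearer token in an Authorization header):

```python
# Build the evaluation endpoint URLs described in this article.
BASE = "https://api.powerplatform.com/copilotstudio"
API_VERSION = "api-version=2024-10-01"

def evaluation_urls(environment_id: str, bot_id: str) -> dict:
    root = f"{BASE}/environments/{environment_id}/bots/{bot_id}/api/makerevaluation"
    return {
        # Step 2: find the test set ID.
        "list_test_sets": f"{root}/testsets?{API_VERSION}",
        # Step 3: run the evaluation for a chosen test set.
        "start_run": lambda test_set_id: f"{root}/testsets/{test_set_id}/run?{API_VERSION}",
        # Step 4: retrieve results by evaluation run ID.
        "run_details": lambda run_id: f"{root}/testruns/{run_id}?{API_VERSION}",
    }

# "my-env-id" and "my-bot-id" are placeholders for your Environment ID and Bot ID.
urls = evaluation_urls("my-env-id", "my-bot-id")
print(urls["list_test_sets"])
```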

API operations for automating evaluations

Copilot Studio supports REST API operations that you can use to programmatically trigger evaluations against your agent by using an existing test set.

Get agent test sets

  • Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets?api-version=2024-10-01
  • Purpose: Retrieve an array of the test set IDs and other details for a specific agent.
  • Response: Returns an array named value of test sets, each with the following information:
    • auditInfo: Timestamps and user IDs for the creation and modification of each test set.
    • displayName: The name of the test set.
    • id: The ID of the test set. Use this ID in Start an agent evaluation to choose which test set to run.
    • description: The description of the test set.
    • state: The status of the test set. A usable test set is Active.
    • totalTestCases: The number of test cases within the test set.
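
Because only an Active test set is usable, a typical first step is to filter the response for active test sets. A sketch against a hypothetical response body shaped like the fields above (the IDs and names are made up):

```python
# Hypothetical response from Get agent test sets (fields as described above).
sample = {
    "value": [
        {"id": "ts-1", "displayName": "Smoke tests", "state": "Inactive", "totalTestCases": 5},
        {"id": "ts-2", "displayName": "Regression set", "state": "Active", "totalTestCases": 42},
    ]
}

# Keep only test sets that can be used for an evaluation run.
active = [ts for ts in sample["value"] if ts["state"] == "Active"]
print(active[0]["id"])
```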

Get agent test set details

  • Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets/{yourTestSetId}?api-version=2024-10-01
  • Purpose: Retrieve details for a specific test set, using the test set ID.
  • Response: Returns the same information as a single item in the Get agent test sets response array.

Learn more in the Get Test Set Details API reference documentation.

Start an agent evaluation

  • Endpoint: POST https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets/{yourTestSetId}/run?api-version=2024-10-01
  • Purpose: Run an evaluation for a test set by using the test set's id. You can also include a user profile for authenticating connections during the evaluation run by passing an mcsConnectionId. If you don't include an mcsConnectionId in your call, the evaluation runs without authentication. See Prerequisites for how to find your mcsConnectionId.
  • Response: Returns the following information:
    • runId: The ID for the evaluation run. Use this ID to retrieve evaluation details.
    • lastUpdatedAt: When the run's status was last updated.
    • executionState: The run's status while the evaluation is running.
    • state: The current state of the run.
    • totalTestCases: Total number of test cases in the test set used for the evaluation.
    • testCasesProcessed: Total test cases evaluated as of the last update.
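
A sketch of assembling the start-evaluation request is shown below. The URL follows the endpoint above; the header names are standard OAuth 2.0 bearer usage, but the body shape for passing mcsConnectionId is an assumption for illustration, and no request is actually sent:

```python
import json

def start_run_request(environment_id, bot_id, test_set_id, token, mcs_connection_id=None):
    """Assemble the start-evaluation request. All IDs and the token are placeholders."""
    url = (f"https://api.powerplatform.com/copilotstudio/environments/{environment_id}"
           f"/bots/{bot_id}/api/makerevaluation/testsets/{test_set_id}/run"
           f"?api-version=2024-10-01")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    # Omit mcsConnectionId to run the evaluation without an authenticated user profile.
    # NOTE: passing it in the JSON body is an assumed request shape for this sketch.
    body = {"mcsConnectionId": mcs_connection_id} if mcs_connection_id else {}
    return url, headers, json.dumps(body)

url, headers, body = start_run_request("env", "bot", "ts-2", "TOKEN", "conn-123")
print(url)
```

From the response, keep the runId value; it's the handle you pass to Get agent test run details.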

Get agent test run details

  • Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testruns/{yourTestRunId}?api-version=2024-10-01
  • Purpose: Retrieve the details of an evaluation by using the runId for your target evaluation run.
  • Response: Returns the following information:
    • id: The ID for the evaluation run.
    • environmentId: The ID for the environment of your agent.
    • cdsBotId: The ID for the target agent.
    • ownerId: The ID of the user who started the evaluation run.
    • testSetId: The ID of the test set used for the evaluation.
    • state: The progress status of the evaluation.
    • startTime: When the evaluation started.
    • endTime: When the evaluation completed (if it completed).
    • name: Name of the evaluation.
    • totalTestCases: Total test cases in the test set.
    • mcsConnectionId: The ID of the Copilot Studio connection for the user profile used in the evaluation run; null if no user profile was connected.
    • testCasesResults: The list of test cases in the evaluation run. Includes:
      • testCaseId: The ID of the test case.
      • state: The completion status of the test case.
      • metricsResults: The details and results for each test method used for the test case. Includes:
        • type: The test method.
        • result: The final result of the test for this test case. Includes:
          • data: The details of the result. Exact values depend on the test method. Learn more in the Power Platform API docs. For a general quality test, the response includes:
            • abstention: Whether the agent answered the query.
            • relevance: Whether the answer was relevant.
            • completeness: Whether the answer was complete.
        • status: The status of the test case.
        • errorReason: If there was an error, the cause of the error.
        • aiResultReason: The AI explanation of the test case result.
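
In a CI/CD pipeline, the usual last step is to walk testCasesResults and decide whether the run passed. A sketch against a hypothetical, truncated run-details payload shaped like the fields above (the pass criterion here, relevance and completeness both true, is one possible policy, not a documented rule):

```python
# Hypothetical Get agent test run details payload (truncated to the fields used).
run = {
    "id": "run-1",
    "state": "Completed",
    "totalTestCases": 2,
    "testCasesResults": [
        {"testCaseId": "tc-1", "state": "Completed",
         "metricsResults": [{"type": "GeneralQuality",
                             "result": {"data": {"abstention": False,
                                                 "relevance": True,
                                                 "completeness": True}}}]},
        {"testCaseId": "tc-2", "state": "Completed",
         "metricsResults": [{"type": "GeneralQuality",
                             "result": {"data": {"abstention": True,
                                                 "relevance": False,
                                                 "completeness": False}}}]},
    ],
}

# Count test cases whose general-quality result is both relevant and complete.
passed = sum(
    1 for tc in run["testCasesResults"]
    for m in tc["metricsResults"]
    if m["result"]["data"].get("relevance") and m["result"]["data"].get("completeness")
)
print(f"{passed}/{run['totalTestCases']} test cases passed")
```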

Get agent test runs

  • Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testruns?api-version=2024-10-01
  • Purpose: Retrieve an array of all previous evaluation runs for the agent.
  • Response: Each item in the array includes the same values as found in Get agent test run details.
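
When comparing a new run against earlier ones (for example, in regression testing), you can pick the most recent run by its startTime. A sketch over hypothetical run items reduced to the fields needed:

```python
from datetime import datetime

# Hypothetical items from Get agent test runs (only id, startTime, state shown).
runs = [
    {"id": "run-1", "startTime": "2024-11-01T10:00:00Z", "state": "Completed"},
    {"id": "run-2", "startTime": "2024-11-05T09:30:00Z", "state": "Completed"},
]

# Pick the most recent run by its ISO 8601 start time.
latest = max(runs, key=lambda r: datetime.fromisoformat(r["startTime"].replace("Z", "+00:00")))
print(latest["id"])
```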