FAQ for analytics

These frequently asked questions (FAQ) describe the AI impact of the analytics features in Copilot Studio.

How is generative AI used for analytics?

Copilot Studio uses AI to measure the quality of generative answer responses and to create clusters. These clusters provide insights into agent performance.

Generative answers use knowledge sources you choose to generate a response. The feature also collects any feedback you provide. Analytics use large language models (LLMs) to classify the chat messages between users and agents into levels that indicate the quality of generative answer responses. Copilot Studio compiles these indicators to give you a summary of an agent's overall performance.

Clustering uses LLMs to sort users' messages into groups based on shared subjects and provide each group with a descriptive name. Copilot Studio uses the names of these clusters to provide different types of insights you can use to improve your agent.

Quality of responses for generative answers

What is the quality of response intended use?

Use quality of response analytics to discover insights into agent usage and performance, and then create actions for agent improvement. Currently, you can use analytics to understand if the quality of an agent's generative answers meets your expectations.

In addition to overall quality, quality of response analytics identifies areas where an agent performs poorly or fails to meet your intended goals. You can identify areas where generative answers perform poorly and take steps to improve their quality.

When identifying poor performance, follow best practices that can help improve quality. For example, after identifying knowledge sources with poor performance, you can edit the knowledge source or split the knowledge source into multiple, more focused sources for increased quality.

What data is used to create analytics for quality of response?

Quality of response analytics are calculated using a sample of generative answer responses. The calculation requires the user query, the agent response, and the relevant knowledge sources that the generative model uses for the generative answer.

Quality of response analytics uses that information to evaluate if the generative answer quality is good, and if not, why the quality is poor. For example, quality of response can identify incomplete, irrelevant, or not fully grounded responses.

What are the limitations of quality of response analytics, and how can users minimize the impact of these limitations?

  • Quality of response analytics don't use all generative responses. Instead, analytics measures a sample of user-agent sessions. Agents with fewer than the minimum number of successful generative answers can't receive a quality of response analytical summary.

  • In some cases, analytics might not evaluate an individual response accurately. At an aggregated level, the results should be accurate for most cases.

  • Quality of response analytics don't provide a breakdown of the specific queries that led to low quality performance. They also don't provide a breakdown of common knowledge sources or topics that were used when low quality responses occur.

  • Analytics aren't calculated for answers that use generative knowledge.

  • Answer completeness is one of the metrics used to assess response quality. This metric measures how fully the response addresses the content in the retrieved document.

    If the system doesn't retrieve a relevant document with additional information for the question, it doesn't evaluate the completeness metric for that document.

What protections are in place for quality of response analytics within Copilot Studio for responsible AI?

Users of agents don't see analytics results; they're available to agent makers and admins only.

Makers and admins can use quality of response analytics only to see the percentage of good quality responses and the predefined reasons for poor performance.

We tested analytics for quality of responses thoroughly during development to ensure good performance. However, on rare occurrences, quality of response assessments might be inaccurate.

Sentiment analysis for conversational sessions

What is the intended use of sentiment analysis?

Use sentiment analysis to understand the level of user satisfaction in conversation sessions based on an AI analysis of user messages to the agent. You can understand the overall sentiment of the session (positive, negative, or neutral), investigate the reasons, and take measures to address it.

What data is used to define sentiment in a conversational session?

Copilot Studio calculates sentiment analysis based on user messages to the agent for a sample set of conversational sessions.

Sentiment analytics uses that information to evaluate whether the user satisfaction during the session is positive, negative, or neutral. For example, a user might use words and a tone that indicate frustration or dissatisfaction with the agent. In this case, the session is classified as having negative sentiment.

What are the limitations of sentiment analysis, and how can users mitigate for these limitations?

Sentiment analytics aren't calculated using all conversational sessions. Instead, analytics measures a sample of user-agent sessions. Sentiment analysis currently depends on generative answers: agents with fewer than the minimum number of daily successful generative answers can't receive a sentiment score.

To calculate sentiment for a session, there must be at least two user messages. Additionally, due to current technical constraints, sentiment analysis isn't performed on sessions that exceed a total of 26 messages (including both user and agent messages).
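The two eligibility constraints above can be expressed as a simple check. The transcript shape (a list of messages with a "role" field) is an assumption for illustration; only the thresholds come from the text.

```python
def is_eligible_for_sentiment(messages: list[dict]) -> bool:
    """Check whether a session meets the sentiment-analysis constraints:
    at least two user messages, and no more than 26 total messages
    (counting both user and agent messages)."""
    user_messages = sum(1 for m in messages if m["role"] == "user")
    return user_messages >= 2 and len(messages) <= 26

short_session = [{"role": "user"}, {"role": "agent"}]
ok_session = [{"role": "user"}, {"role": "agent"}, {"role": "user"}]
print(is_eligible_for_sentiment(short_session))  # False: only one user message
print(is_eligible_for_sentiment(ok_session))     # True
```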

Sentiment analysis doesn't provide a breakdown of the specific user messages that led to the sentiment score.

What protections are in place for sentiment analysis within Copilot Studio for responsible AI?

Users of agents don't see analytics results; they're available to agent makers and admins only.

You can only use sentiment analysis to see the breakdown of sentiment across all sessions.

We tested sentiment analysis thoroughly during development to ensure good performance. However, on rare occurrences, sentiment assessments might be inaccurate.

Themes of user questions

What is the intended use of Themes?

This feature automatically analyzes large sets of user queries and groups them into high-level topics called themes. Each theme represents a single high-level subject users asked about. Themes provide an unsupervised, data-driven view of user content. This view helps teams understand what users care about most without the manual step of reviewing thousands of queries.

What data is used to create clusters?

The Themes feature uses user queries that trigger generative answers. Themes analyzes all queries from the past seven days to generate new suggested themes.

Themes uses semantic similarity to group queries. A language model is then used to generate the title and description for each cluster. Feedback from makers (such as thumbs up/down) is also collected to improve clustering quality.
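The grouping step can be sketched as below. This is a simplified stand-in, not the Themes implementation: a real system compares embedding vectors for semantic similarity, while this sketch uses token overlap (Jaccard similarity) so it stays self-contained, and the threshold value is arbitrary. The LLM naming step is noted as a comment.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def cluster_queries(queries: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedily group queries whose token overlap with a cluster's
    first member exceeds the threshold; unmatched queries start a
    new cluster."""
    clusters: list[tuple[set[str], list[str]]] = []
    for q in queries:
        tokens = set(q.lower().split())
        for rep, members in clusters:
            if jaccard(tokens, rep) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((tokens, [q]))
    return [members for _, members in clusters]

queries = [
    "how do I reset my password",
    "reset my password please",
    "what are your opening hours",
]
for group in cluster_queries(queries):
    # In Themes, a language model would then generate a title and
    # description for each group.
    print(group)
```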

What are the limitations of clustering for Themes, and how can users mitigate these limitations?

Successful clustering into themes depends on query volume. If there are not enough queries or if the queries are too unrelated to one another, Copilot Studio might cluster queries into themes that are overly broad or overly narrow.

Themes can occasionally split similar topics or merge unrelated ones.

Shifting language in queries might affect consistency of clusters over time.

You can review themes regularly and provide feedback to improve naming quality.

What protections for Themes are in place within Copilot Studio in terms of responsible AI?

Themes are only visible to makers and admins. Content moderation is applied when generating names and descriptions to reduce the risk of harmful or inappropriate outputs.

Custom metrics analytics

What is the intended use of custom metrics?

Makers use custom metrics analytics to understand how much their conversational agents affect business outcomes. These metrics complement savings analytics. Examples of custom metrics include resolution rate, customer intent classification, and other domain‑specific outcomes.

Custom metrics can show where agents miss intended goals. Makers can define what to measure, test metrics against real session data, and refine definitions based on the results.

What data is used to calculate custom metrics?

Custom metrics are calculated using a sample of past agent sessions. The calculation uses the conversational messages exchanged during a session.

The AI model classifies session data based on your metric definition. Copilot Studio aggregates results across the sample to show overall metric performance for the selected time period.

What are the limitations of custom metrics and how can users minimize the impact of limitations?

Custom metrics aren't calculated using all agent sessions. Instead, they measure a sample of sessions from the selected time period. Because results are based on a sample, they should be treated as directional indicators rather than exact figures.
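Why sampling makes results directional rather than exact can be seen in a minimal sketch. The sample size, seed, and session representation are all illustrative; Copilot Studio's sampling scheme isn't described in this document.

```python
import random

def sample_sessions(sessions: list[str], sample_size: int, seed: int = 0) -> list[str]:
    """Draw a fixed-size random sample of sessions to analyze.

    Sampling keeps analysis cost bounded, but metrics computed over the
    sample are estimates of the full population, so they should be read
    as directional indicators rather than exact figures.
    """
    rng = random.Random(seed)
    return rng.sample(sessions, min(sample_size, len(sessions)))

all_sessions = [f"session-{i}" for i in range(1000)]
sampled = sample_sessions(all_sessions, 100)
print(len(sampled))  # 100 of 1000 sessions are analyzed
```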

When interpreting metrics, keep in mind that the calculation is based on the message transcript. Avoid drawing conclusions about behaviors that occur primarily outside messages, such as topics and tools.

The AI model might misclassify sessions. Aggregate results are generally accurate. Sessions that don't match a defined category are placed in the fallback (Other) category. If test results don't match expected outcomes, you can update the metric description and category definitions.
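The fallback behavior described above can be sketched as follows. In Copilot Studio an AI model does the matching against your category definitions; the keyword predicates here are hypothetical stand-ins that make the fallback to "Other" visible.

```python
from typing import Callable

def classify_session(transcript: str, categories: dict[str, Callable[[str], bool]]) -> str:
    """Assign a session to the first matching category, or to the
    fallback 'Other' category when no defined category matches."""
    for name, matches in categories.items():
        if matches(transcript):
            return name
    return "Other"

# Illustrative category definitions for a hypothetical resolution metric.
categories = {
    "Resolved": lambda t: "thanks, that fixed it" in t.lower(),
    "Escalated": lambda t: "talk to a human" in t.lower(),
}
print(classify_session("Thanks, that fixed it!", categories))  # Resolved
print(classify_session("What's the weather?", categories))     # Other
```

If test results don't match expectations, the analog here would be refining the predicates; in Copilot Studio, you update the metric description and category definitions instead.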

If an agent's instructions or configuration are significantly changed after a metric is defined, the metric might no longer accurately reflect the agent's updated behavior. You should review your custom metrics after making substantive changes to the agent.

What protections are in place for custom metrics within Copilot Studio for responsible AI?

Custom metrics results are available to agent makers and admins only. Users of the agent don't have access to analytics results.

Review and approve all custom metrics before saving. During metric definition, test metrics against sample session data and review individual results and model reasoning. If results don't meet expectations, you can update or discard the metric. Metrics are not applied without your explicit confirmation.

The AI-generated prompt used to classify sessions is visible to you in the UI, so you can understand how the model interprets your metric definition. You can edit or remove custom metrics at any time.

On rare occasions, individual session classifications might be inaccurate. Results should be interpreted in aggregate rather than at the individual session level.