Derive quality signals

Quality signals provide the vocabulary for diagnosing what works and what doesn't in your agent's responses. Instead of starting with a generic checklist, derive quality signals from patterns you observe during evaluation. This approach ensures your signals reflect what actually matters for your specific agent.

Why quality signals matter

With quality signals, you can diagnose failures faster ("failed on Personalization" is more actionable than "the answer was wrong"), track improvement by signal over time, and communicate clearly with stakeholders. When someone says "the agent isn't good enough," you can respond with specifics: "Policy accuracy is at 95%, but Personalization dropped to 75% after the last update."
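
Tracking by signal is mechanical once each graded test case records which signals it passed and failed. Here is a minimal Python sketch of computing per-signal pass rates; the result records and signal names are illustrative, not a prescribed schema:

```python
from collections import defaultdict

# Hypothetical graded results: each case lists the signals it passed and failed.
results = [
    {"case": "ESS-001", "passed": ["Policy accuracy", "Source attribution"], "failed": []},
    {"case": "ESS-003", "passed": ["Policy accuracy"], "failed": ["Personalization"]},
    {"case": "ESS-004", "passed": [], "failed": ["Personalization"]},
]

# Tally per-signal outcomes, then report a pass rate for each signal.
tallies = defaultdict(lambda: {"pass": 0, "fail": 0})
for result in results:
    for signal in result["passed"]:
        tallies[signal]["pass"] += 1
    for signal in result["failed"]:
        tallies[signal]["fail"] += 1

for signal, tally in sorted(tallies.items()):
    total = tally["pass"] + tally["fail"]
    print(f"{signal}: {tally['pass'] / total:.0%} pass rate ({total} cases)")
```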

Why not start with a generic quality checklist?

A list like "Accuracy, Completeness, Relevance, Tone, Safety" sounds reasonable, but it's too abstract to be actionable. What does "accuracy" mean for a legal research agent versus a creative writing assistant? The quality signals that matter—and how you measure them—depend entirely on what your agent does and whom it serves.

Instead of choosing quality signals upfront, let your evaluation results tell you what matters. When you run test cases against your agent (Stage 2 of the evaluation framework), patterns emerge from the successes and failures. Those patterns become your quality signals.

How quality signals emerge

As you iterate through baseline testing, you notice recurring themes in your results. Some test cases fail because the agent gives outdated information. Others fail because the agent ignores the user's context. Still others succeed specifically because the agent cites its sources or provides clear next steps. Each of these patterns points to a quality signal worth naming and tracking.

Employee Self-Service Agent: From patterns to signals

Here's how the Employee Self-Service Agent team derived quality signals from baseline results:

| Observation | Quality signal |
|---|---|
| ESS-001, ESS-002 passed: Correct policy info | Policy accuracy: Is the information correct? |
| ESS-001 passed: Cited the handbook | Source attribution: Does it cite the source? |
| ESS-003, ESS-004 failed: Ignored user context | Personalization: Does it use the employee's context? |
| ESS-005, ESS-006 passed; ESS-009 initially failed | Escalation appropriateness: Does it know when to route? |
| ESS-007 passed; ESS-008 failed | Privacy protection: Does it protect sensitive data? |
| ESS-001 passed: Told user how to check balance | Action enablement: Does it give next steps? |
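
One lightweight way to perform this derivation is to tag each baseline result with a short theme and promote recurring themes to candidate signals. A minimal sketch, using hypothetical case notes modeled on the table above:

```python
from collections import Counter

# Hypothetical baseline notes: (case ID, outcome, annotator's one-line theme).
baseline_notes = [
    ("ESS-001", "pass", "correct policy info"),
    ("ESS-002", "pass", "correct policy info"),
    ("ESS-003", "fail", "ignored user context"),
    ("ESS-004", "fail", "ignored user context"),
    ("ESS-008", "fail", "shared sensitive data"),
]

# Themes that recur across cases are candidates for named quality signals.
theme_counts = Counter(theme for _, _, theme in baseline_notes)
candidates = [theme for theme, count in theme_counts.items() if count > 1]
print(candidates)  # ['correct policy info', 'ignored user context']
```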

Quality signals with concrete examples

Once you name your quality signals, make them concrete by defining what passing and failing look like for each one.

| Quality signal | Pass looks like | Fail looks like |
|---|---|---|
| Policy accuracy | "15 days PTO" (correct) | "10 days PTO" (outdated) |
| Source attribution | "Per the Employee Handbook..." | No source mentioned |
| Personalization | UK holidays for a UK employee | US holidays for a UK employee |
| Escalation appropriateness | Routes Family and Medical Leave Act (FMLA) questions to HR | Tries to explain FMLA rules itself |
| Privacy protection | "I can't share salary info" | Shares salary details or hesitates |
| Action enablement | "Check your balance in Workday" | Answers but gives no next step |
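
Where a pass/fail definition is crisp enough, it can even be made executable. Below is a minimal sketch with illustrative heuristic checks and a made-up response string; nuanced signals such as personalization or tone usually still need a human or LLM judge:

```python
# Hypothetical heuristic checks, one per signal; each takes the agent's
# response text and returns True on pass. All strings are illustrative.
def policy_accuracy(response: str) -> bool:
    # Assumes the current handbook grants 15 days of PTO (illustrative fact).
    return "15 days" in response

def source_attribution(response: str) -> bool:
    return "Employee Handbook" in response

def action_enablement(response: str) -> bool:
    return "Workday" in response

# Grade one response against the signals that apply to its test case.
response = "Per the Employee Handbook, you have 15 days of PTO. Check your balance in Workday."
for name, check in [("Policy accuracy", policy_accuracy),
                    ("Source attribution", source_attribution),
                    ("Action enablement", action_enablement)]:
    print(f"{name}: {'pass' if check(response) else 'fail'}")
```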

These signals are specific to the Employee Self-Service Agent. A coding assistant would have entirely different signals, such as code correctness, security best practices, and explanation clarity. A customer support agent might track resolution rate and sentiment. Your signals should reflect your agent's unique purpose.

Next step

Learn how to build a repeatable, data-driven evaluation loop that improves your agent with every iteration.