Quality signals provide the vocabulary for diagnosing what works and what doesn't in your agent's responses. Instead of starting with a generic checklist, derive quality signals from patterns you observe during evaluation. This approach ensures your signals reflect what actually matters for your specific agent.
Why quality signals matter
With quality signals, you can diagnose failures faster ("failed on Personalization" is more actionable than "the answer was wrong"), track improvement by signal over time, and communicate clearly with stakeholders. When someone says "the agent isn't good enough," you can respond with specifics: "Policy accuracy is at 95%, but Personalization dropped to 75% after the last update."
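To see what tracking by signal could look like in practice, here is a minimal sketch that computes per-signal pass rates from labeled evaluation results. The result structure, signal names, and pass/fail labels are illustrative assumptions, not part of any specific evaluation tool:

```python
from collections import defaultdict

# Each evaluation result records which quality signals a test case
# passed or failed. This shape is an illustrative assumption.
results = [
    {"case": "ESS-001", "signals": {"policy_accuracy": True, "source_attribution": True}},
    {"case": "ESS-003", "signals": {"policy_accuracy": True, "personalization": False}},
    {"case": "ESS-004", "signals": {"personalization": False}},
]

def pass_rates(results):
    """Return the fraction of cases that passed each quality signal."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for result in results:
        for signal, ok in result["signals"].items():
            total[signal] += 1
            passed[signal] += ok
    return {signal: passed[signal] / total[signal] for signal in total}

for signal, rate in pass_rates(results).items():
    print(f"{signal}: {rate:.0%}")
```

Numbers like these are what let you say "Personalization dropped to 75%" instead of "quality got worse."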
Why not start with a generic quality checklist?
A list like "Accuracy, Completeness, Relevance, Tone, Safety" sounds reasonable, but it's too abstract to be actionable. What does "accuracy" mean for a legal research agent versus a creative writing assistant? The quality signals that matter—and how you measure them—depend entirely on what your agent does and whom it serves.
Instead of choosing quality signals upfront, let your evaluation results tell you what matters. When you run test cases against your agent (Stage 2 of the evaluation framework), patterns emerge from the successes and failures. Those patterns become your quality signals.
How quality signals emerge
As you iterate through baseline testing, you notice recurring themes in your results. Some test cases fail because the agent gives outdated information. Others fail because the agent ignores the user's context. Still others succeed specifically because the agent cites its sources or provides clear next steps. Each of these patterns points to a quality signal worth naming and tracking.
Employee Self-Service Agent: From patterns to signals
Here's how the Employee Self-Service Agent team derived quality signals from baseline results:
| Observation | Quality signal |
|---|---|
| ESS-001, ESS-002 passed: Correct policy info | Policy accuracy: Is the information correct? |
| ESS-001 passed: Cited the handbook | Source attribution: Does it cite the source? |
| ESS-003, ESS-004 failed: Ignored user context | Personalization: Does it use employee's context? |
| ESS-005, ESS-006 passed; ESS-009 initially failed | Escalation appropriateness: Does it know when to route? |
| ESS-007 passed; ESS-008 failed | Privacy protection: Does it protect sensitive data? |
| ESS-001 passed: Told user how to check balance | Action enablement: Does it give next steps? |
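One lightweight way to run this derivation yourself is to tag each baseline result with the patterns you observe, then count how often each pattern recurs; recurring patterns are your candidate signals. In the sketch below, the tags are invented to mirror the table above rather than produced by any real tool:

```python
from collections import Counter

# Baseline results tagged with observed patterns. The tags are
# illustrative assumptions drawn from the table above.
baseline = [
    ("ESS-001", "pass", ["correct_policy", "cited_source", "gave_next_step"]),
    ("ESS-002", "pass", ["correct_policy"]),
    ("ESS-003", "fail", ["ignored_user_context"]),
    ("ESS-004", "fail", ["ignored_user_context"]),
    ("ESS-008", "fail", ["leaked_sensitive_data"]),
]

# Patterns seen more than once are strong candidates for named signals.
pattern_counts = Counter(tag for _, _, tags in baseline for tag in tags)
candidates = [tag for tag, n in pattern_counts.items() if n > 1]
print("Candidate quality signals:", candidates)
```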
Quality signals with concrete examples
Once you name your quality signals, make them concrete by defining what a pass and a fail look like for each one.
| Quality signal | Pass looks like | Fail looks like |
|---|---|---|
| Policy accuracy | "15 days PTO" (correct) | "10 days PTO" (outdated) |
| Source attribution | "Per the Employee Handbook..." | No source mentioned |
| Personalization | UK holidays for UK employee | US holidays for UK employee |
| Escalation appropriateness | Routes Family and Medical Leave Act (FMLA) questions to HR | Tries to explain FMLA rules |
| Privacy protection | "I can't share salary info" | Shares salary or hesitates |
| Action enablement | "Check balance in Workday" | Answers but no next step |
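Once pass and fail are defined this concretely, some signals can even be checked automatically. The sketch below shows simple string-based checks for two of the signals; the function names and patterns are assumptions for illustration, and signals like personalization or escalation typically still need a human or model-based judge:

```python
import re

# Illustrative automated checks for two signals from the table above.
# These string checks are a deliberate simplification, not a
# production-grade evaluator.
def cites_source(answer: str) -> bool:
    """Pass if the answer names a source, e.g. 'Per the Employee Handbook...'."""
    return bool(re.search(r"\b(per|according to|see)\b.*\bhandbook\b",
                          answer, re.IGNORECASE))

def protects_privacy(answer: str) -> bool:
    """Fail if the answer appears to disclose salary information."""
    return not re.search(r"\bsalary\b.*\$?\d", answer, re.IGNORECASE)

print(cites_source("Per the Employee Handbook, you accrue 15 days of PTO."))  # True
print(protects_privacy("Your colleague's salary is $95,000."))                # False
```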
These signals are specific to the Employee Self-Service Agent. A coding assistant would have entirely different signals, such as code correctness, security best practices, and explanation clarity. A customer support agent might track resolution rate and sentiment. Your signals should reflect your agent's unique purpose.
Next step
Learn how to build a repeatable, data-driven evaluation loop that improves your agent with every iteration.