Important
Some of the functionality described in this release plan has not been released. Delivery timelines may change and projected functionality may not be released (see Microsoft policy). Learn more: What's new and planned
Enabled for | Public preview | General availability |
---|---|---|
Admins, makers, marketers, or analysts, automatically | ![]() | Sep 2025 |
Business value
The prompt accuracy scoring feature in AI Builder’s prompt builder gives you empirical evidence of how effective your prompts are. It makes prompts highly testable and, more importantly, lets you evaluate their outcomes. You can then identify areas for improvement and refine your prompt’s precision so that AI-driven outcomes align with your business goals.
Feature details
The prompt accuracy scoring feature in AI Builder’s prompt builder lets you build a test suite and validate your prompt’s performance across iterations of prompt development. These detailed assessments help you make informed decisions about using prompts in agents, apps, and flows, promoting capabilities to production, and improving your prompts. This feedback on prompt effectiveness helps you optimize for clarity and precision.

As you create or refine prompts, the feature analyzes each prompt’s structure, language, and relevance to the intended task, and assigns each test case prediction a confidence score that reflects the prompt’s expected performance. The score is based on factors such as specificity, complexity, alignment, and custom assertions. From these scores, you can derive actionable insights, such as how to rephrase a prompt or reduce ambiguity. By giving you a clear, quantifiable measure of prompt quality, the accuracy scoring feature streamlines prompt engineering, improves model outcomes, and shortens iteration time, enabling more efficient and reliable AI interactions across use cases.
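To make the idea of a prompt test suite with custom assertions more concrete, here is a minimal, tool-agnostic sketch in Python. It is not AI Builder’s API or its scoring method: `PromptTestCase`, `score_case`, `evaluate`, and the `fake_model` stand-in are hypothetical names, and each test case is scored simply as the fraction of its custom assertions that pass, then averaged across the suite so two prompt iterations can be compared.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types for illustration only; this is NOT the AI Builder API.
Assertion = Callable[[str], bool]


@dataclass
class PromptTestCase:
    """One test input plus the custom assertions its output must satisfy."""
    input_text: str
    assertions: List[Assertion] = field(default_factory=list)


def score_case(output: str, case: PromptTestCase) -> float:
    """Fraction of the case's assertions that the output satisfies (0.0 to 1.0)."""
    if not case.assertions:
        return 0.0
    passed = sum(1 for check in case.assertions if check(output))
    return passed / len(case.assertions)


def evaluate(prompt: str, model: Callable[[str, str], str],
             suite: List[PromptTestCase]) -> float:
    """Mean per-case score of one prompt version across the whole test suite."""
    if not suite:
        raise ValueError("test suite is empty")
    scores = [score_case(model(prompt, case.input_text), case) for case in suite]
    return sum(scores) / len(scores)


def fake_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for a model call; a real one would query an LLM."""
    sentiment = "positive" if "great" in text.lower() else "negative"
    if "one word" in prompt:
        return sentiment
    return f"I think the sentiment of this review is {sentiment}."


# Each case checks the label and that the answer is a single word.
suite = [
    PromptTestCase("The service was great!",
                   [lambda out: "positive" in out, lambda out: len(out.split()) == 1]),
    PromptTestCase("Delivery was late and the box was damaged.",
                   [lambda out: "negative" in out, lambda out: len(out.split()) == 1]),
]

v1 = "Classify the sentiment of the review."
v2 = "Classify the sentiment of the review. Answer with one word: positive or negative."

print(evaluate(v1, fake_model, suite))  # 0.5: right label, but too verbose
print(evaluate(v2, fake_model, suite))  # 1.0: every assertion passes
```

In practice a real model call would replace `fake_model`, and the per-case score here is deliberately simple: it does not attempt to model the specificity, complexity, or alignment factors the feature weighs.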