Exercise - Evaluate and compare AI agent versions
In this exercise, you evaluate two prompt versions of the Trail Guide Agent and create a Version Comparison Decision Document that justifies which version to promote to production, based on quality scores and cost analysis.
Throughout this exercise, you:
- Design evaluation experiments using Git branches to manage and compare prompt versions
- Manually evaluate AI agent responses against structured quality criteria (Intent Resolution, Relevance, Groundedness)
- Compare agent versions across test scenarios to identify quality differences and cost trade-offs
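The comparison step above can be sketched in code. The snippet below is a minimal, illustrative Python sketch (not part of the lab, and not an Azure API): it aggregates hypothetical 1–5 scores for the three quality criteria (Intent Resolution, Relevance, Groundedness) and token costs across test scenarios, giving one way to summarize the quality-versus-cost trade-off in a decision document. All run data, field names, and thresholds here are invented for illustration.

```python
def evaluate(runs):
    """Return (mean quality score, total cost in USD) across test scenarios.

    Each run is a dict with per-criterion scores (1-5) and a cost figure.
    """
    scores = [s for run in runs for s in run["scores"].values()]
    quality = sum(scores) / len(scores)
    cost = sum(run["cost_usd"] for run in runs)
    return quality, cost

# Hypothetical results for two prompt versions of the agent.
v1_runs = [
    {"scores": {"intent_resolution": 4, "relevance": 4, "groundedness": 3}, "cost_usd": 0.012},
    {"scores": {"intent_resolution": 3, "relevance": 4, "groundedness": 4}, "cost_usd": 0.011},
]
v2_runs = [
    {"scores": {"intent_resolution": 5, "relevance": 4, "groundedness": 5}, "cost_usd": 0.018},
    {"scores": {"intent_resolution": 4, "relevance": 5, "groundedness": 4}, "cost_usd": 0.017},
]

q1, c1 = evaluate(v1_runs)
q2, c2 = evaluate(v2_runs)
print(f"v1: quality={q1:.2f}, cost=${c1:.3f}")
print(f"v2: quality={q2:.2f}, cost=${c2:.3f}")
```

In a real decision document you would pair numbers like these with qualitative notes per scenario; a higher-quality version may still be rejected if the cost increase outweighs the quality gain for your use case.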
Note
To complete this lab, you need an Azure subscription in which you have administrative access.
Launch the exercise and follow the instructions.
