Exercise - Evaluate and compare AI agent versions

In this exercise, you evaluate two prompt versions of the Trail Guide Agent and create a Version Comparison Decision Document that justifies which version to promote to production based on quality scores and cost analysis.

Throughout this exercise, you:

  • Design evaluation experiments using Git branches to manage and compare prompt versions
  • Manually evaluate AI agent responses against structured quality criteria (Intent Resolution, Relevance, Groundedness)
  • Compare agent versions across test scenarios to identify quality differences and cost trade-offs
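The comparison step above can be sketched in code. The snippet below is a minimal illustration, not part of the lab: the score values, cost figures, and the promotion threshold are all hypothetical, chosen only to show how per-scenario quality scores and per-run cost might feed a promote/hold decision.

```python
# Hypothetical per-scenario scores (1-5) for each agent version.
# Criteria mirror the lab: Intent Resolution, Relevance, Groundedness.
scores = {
    "v1": {"intent_resolution": [4, 3, 4], "relevance": [4, 4, 3], "groundedness": [3, 4, 4]},
    "v2": {"intent_resolution": [5, 4, 5], "relevance": [5, 4, 4], "groundedness": [4, 5, 4]},
}
cost_per_run_usd = {"v1": 0.002, "v2": 0.005}  # illustrative token costs

def average_quality(version: str) -> float:
    """Mean score across all criteria and test scenarios for one version."""
    all_scores = [s for criterion in scores[version].values() for s in criterion]
    return sum(all_scores) / len(all_scores)

for v in scores:
    print(f"{v}: quality={average_quality(v):.2f}, cost=${cost_per_run_usd[v]:.3f}/run")

# Promote the newer version only if its quality gain justifies the extra cost.
# The threshold (0.5 quality points per extra cent) is an arbitrary example.
gain = average_quality("v2") - average_quality("v1")
extra_cost_cents = (cost_per_run_usd["v2"] - cost_per_run_usd["v1"]) * 100
promote = "v2" if gain / extra_cost_cents >= 0.5 else "v1"
print(f"Recommend promoting: {promote}")
```

In a real decision document the scores would come from your manual evaluations in the lab, and the cost figures from the model's token pricing; the point is simply that the promotion decision weighs both dimensions, not quality alone.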

Note

To complete this lab, you need an Azure subscription in which you have administrative access.

Launch the exercise and follow the instructions.
