Exercise - Evaluate and compare AI agent versions

In this exercise, you evaluate two prompt versions of the Trail Guide Agent and create a Version Comparison Decision Document that justifies which version to promote to production based on quality scores and cost analysis.

Throughout this exercise, you:

  • Design evaluation experiments using Git branches to manage and compare prompt versions
  • Manually evaluate AI agent responses against structured quality criteria (Intent Resolution, Relevance, Groundedness)
  • Compare agent versions across test scenarios to identify quality differences and cost trade-offs
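The comparison step above can be sketched in code. The snippet below is a minimal illustration, not part of the lab: the score values, cost figures, and the promotion threshold are all hypothetical, chosen only to show how per-scenario quality scores and per-run cost might feed a promote/hold decision.

```python
# Hypothetical per-scenario scores (1-5) for each agent version.
# Criteria mirror the lab: Intent Resolution, Relevance, Groundedness.
scores = {
    "v1": {"intent_resolution": [4, 3, 4], "relevance": [4, 4, 3], "groundedness": [3, 4, 4]},
    "v2": {"intent_resolution": [5, 4, 5], "relevance": [5, 4, 4], "groundedness": [4, 5, 4]},
}
cost_per_run_usd = {"v1": 0.002, "v2": 0.005}  # illustrative token costs

def average_quality(version: str) -> float:
    """Mean score across all criteria and test scenarios for one version."""
    all_scores = [s for criterion in scores[version].values() for s in criterion]
    return sum(all_scores) / len(all_scores)

for v in scores:
    print(f"{v}: quality={average_quality(v):.2f}, cost=${cost_per_run_usd[v]:.3f}/run")

# Promote the newer version only if its quality gain justifies the extra cost.
# The threshold (0.5 quality points per extra cent) is an arbitrary example.
gain = average_quality("v2") - average_quality("v1")
extra_cost_cents = (cost_per_run_usd["v2"] - cost_per_run_usd["v1"]) * 100
promote = "v2" if gain / extra_cost_cents >= 0.5 else "v1"
print(f"Recommend promoting: {promote}")
```

In a real decision document the scores would come from your manual evaluations in the lab, and the cost figures from the model's token pricing; the point is simply that the promotion decision weighs both dimensions, not quality alone.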

Note

To complete this lab, you need an Azure subscription in which you have administrative access.

Launch the exercise and follow the instructions.
