Exercise - Track evaluation results in Azure AI Foundry
The Azure AI Evaluation SDK not only lets you run evaluations programmatically; it can also log the results to your Azure AI project so you can track them over time. Once the results are in the project, you can examine them in depth: Azure AI Foundry lets you view individual results and compare them across multiple evaluation runs. By doing so, you can identify trends, discrepancies, and potential tradeoffs, gaining insight into how your AI system performs under various conditions.
Scenario
Contoso Gameworks is developing an AI-powered dialogue generator for video game characters, customizing interactions based on game scenarios. The generator should be evaluated for relevance (fitting dialogue to the game’s plot and character traits), fluency (smooth, immersive conversations), and risk and safety (ensuring no violent, offensive, or unfair language is introduced).
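These three dimensions map onto built-in evaluators in the Azure AI Evaluation SDK: RelevanceEvaluator and FluencyEvaluator for quality, and evaluators such as ViolenceEvaluator for risk and safety. As a minimal sketch of the quality side, assuming the azure-ai-evaluation package with placeholder endpoint, key, and deployment values (the dialogue strings are invented for illustration, and exact call signatures can vary by SDK version):

```python
from azure.ai.evaluation import RelevanceEvaluator, FluencyEvaluator

# Placeholder configuration for the judge model the quality evaluators use.
model_config = {
    "azure_endpoint": "https://<your-openai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment-name>",
}

relevance = RelevanceEvaluator(model_config)
fluency = FluencyEvaluator(model_config)

# Score one invented dialogue turn to see the shape of an evaluator result.
turn = {
    "query": "The player asks the blacksmith about the stolen amulet.",
    "response": "Aye, traveler. The amulet vanished from the keep two nights past.",
}
print(relevance(query=turn["query"], response=turn["response"]))
print(fluency(response=turn["response"]))
```

Each evaluator returns a dictionary of scores for the turn; the exact keys depend on the evaluator and SDK version.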
Instructions
In this exercise, you run an evaluation to assess a dataset of character dialogue produced by the generator. You also push the results to your Azure AI project so you can track them in Azure AI Foundry. Open the evaluate-track-results.ipynb file to get started.
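The notebook contains the authoritative steps. As a rough outline of what such a run can look like with the azure-ai-evaluation package, the sketch below scores a dataset on all three dimensions and passes azure_ai_project so the run is uploaded to the project. The dataset file name, evaluation name, and project identifiers are placeholders, and in recent SDK versions the returned dictionary includes aggregate metrics and a link to the run:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import (
    evaluate,
    RelevanceEvaluator,
    FluencyEvaluator,
    ViolenceEvaluator,
)

# Placeholder judge-model configuration for the quality evaluators.
model_config = {
    "azure_endpoint": "https://<your-openai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment-name>",
}

# Placeholder identifiers for the Azure AI project that receives the run.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

result = evaluate(
    data="dialogue_dataset.jsonl",  # illustrative name; JSONL of query/response pairs
    evaluators={
        "relevance": RelevanceEvaluator(model_config),
        "fluency": FluencyEvaluator(model_config),
        # Safety evaluators run against the project rather than a judge model.
        "violence": ViolenceEvaluator(
            credential=DefaultAzureCredential(),
            azure_ai_project=azure_ai_project,
        ),
    },
    # Passing the project details uploads the run so you can view and
    # compare it in Azure AI Foundry.
    azure_ai_project=azure_ai_project,
    evaluation_name="contoso-dialogue-eval",  # illustrative run name
)

print(result["metrics"])            # aggregate scores across the dataset
print(result.get("studio_url"))     # link to the run, populated when uploaded
```

Once the run is uploaded, open your project's Evaluation page in Azure AI Foundry to inspect per-row results and compare this run against others side by side.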