Mosaic AI Agent Evaluation tutorial notebook

The following notebook shows how to evaluate a gen AI app using Agent Evaluation's proprietary LLM judges, custom metrics, and labels from domain experts. It covers the following:

  • How to load production logs (traces) into an evaluation dataset.
  • How to run an evaluation and do root cause analysis.
  • How to create custom metrics to automatically detect quality issues.
  • How to send production logs to subject matter experts (SMEs) for labeling, and how to use their labels to evolve the evaluation dataset.
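
As a rough illustration of the workflow listed above, the following sketch uses plain Python only. It does not call the Agent Evaluation API, and every name in it (`TRACES`, `EvalRow`, `build_eval_dataset`, `is_non_answer`, `needs_expert_label`) and the `request`/`response` field layout are assumptions made up for this example. Real production traces have a richer schema, and the notebook itself is the reference for the actual calls.

```python
from dataclasses import dataclass

# Hypothetical trace records (assumed shape, not the real trace schema).
TRACES = [
    {"request": "How do I reset my password?",
     "response": "Go to Settings and choose Reset password."},
    {"request": "What is the refund policy?",
     "response": "Sorry, I don't know."},
    {"request": "How do I reset my password?",
     "response": "Go to Settings and choose Reset password."},
]


@dataclass(frozen=True)
class EvalRow:
    """One row of the evaluation dataset (hypothetical structure)."""
    request: str
    response: str


def build_eval_dataset(traces):
    """Turn trace records into evaluation rows, dropping exact duplicates."""
    seen = set()
    rows = []
    for trace in traces:
        row = EvalRow(trace["request"], trace["response"])
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


def is_non_answer(row):
    """Toy custom metric: flag responses that decline to answer."""
    text = row.response.lower()
    phrases = ("i don't know", "i do not know", "cannot help")
    return any(phrase in text for phrase in phrases)


def evaluate(rows):
    """Apply the custom metric to every row."""
    return [{"row": row, "non_answer": is_non_answer(row)} for row in rows]


def needs_expert_label(results):
    """Select flagged rows to send to domain experts for labeling."""
    return [result["row"] for result in results if result["non_answer"]]


dataset = build_eval_dataset(TRACES)
results = evaluate(dataset)
review_queue = needs_expert_label(results)
```

In this sketch, `review_queue` holds the rows a custom metric flagged, which is the kind of subset one might route to domain experts before folding their labels back into the evaluation dataset.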

To get your agent ready for pre-production, see the Mosaic AI agent demo notebook. For general information, see What is Mosaic AI Agent Evaluation?.

Agent Evaluation custom metrics, guidelines, and domain expert labels notebook
