Important
This page describes usage of Agent Evaluation version 0.22 with MLflow 2. Databricks recommends using MLflow 3, which is integrated with Agent Evaluation >1.0. In MLflow 3, Agent Evaluation APIs are now part of the mlflow package.
For information on this topic, see Evaluate & Monitor.
The following notebook shows how to evaluate a gen AI app using Agent Evaluation's proprietary LLM judges, custom metrics, and labels from domain experts. It covers the following:
- How to load production logs (traces) into an evaluation dataset.
- How to run an evaluation and do root cause analysis.
- How to create custom metrics to automatically detect quality issues.
- How to send production logs to subject matter experts (SMEs) for labeling, and use those labels to evolve the evaluation dataset.
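As a rough illustration of the workflow in the list above, the following standard-library Python sketch turns trace-like records into evaluation rows, applies a simple custom check, and selects rows to send for labeling. It is not the notebook's code and does not call Agent Evaluation. `TraceRecord`, `traces_to_eval_rows`, `response_too_short`, and `rows_needing_labels` are hypothetical names, and the sample data is invented. The field names `request_id`, `request`, `response`, and `expected_response` are assumed to match the Agent Evaluation input schema. In the MLflow 2 workflow the evaluation itself is typically run with `mlflow.evaluate(..., model_type="databricks-agent")`, which requires a Databricks workspace and is omitted here.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical, simplified stand-in for one production trace."""
    request_id: str
    request: str
    response: str


def traces_to_eval_rows(traces):
    """Convert traces to evaluation rows, keeping the first trace per distinct request."""
    seen = set()
    rows = []
    for trace in traces:
        key = trace.request.strip().lower()  # case-insensitive duplicate detection
        if key in seen:
            continue
        seen.add(key)
        rows.append({
            "request_id": trace.request_id,
            "request": trace.request,
            "response": trace.response,
            "expected_response": None,  # filled in later from expert labels
        })
    return rows


def response_too_short(row, min_words=5):
    """Hypothetical custom check: flag responses with fewer than min_words words."""
    return len(row["response"].split()) < min_words


def rows_needing_labels(rows):
    """Select unlabeled rows that the custom check flags, as candidates for expert review."""
    return [
        row for row in rows
        if row["expected_response"] is None and response_too_short(row)
    ]


# Invented sample traces for illustration only.
traces = [
    TraceRecord("t1", "How do I reset my password?",
                "Go to Settings, choose Security, then select Reset password."),
    TraceRecord("t2", "how do i reset my password?", "Use the settings page."),
    TraceRecord("t3", "What is the refund policy?", "Not sure."),
]

eval_rows = traces_to_eval_rows(traces)
flagged = rows_needing_labels(eval_rows)
print(len(eval_rows), [row["request_id"] for row in flagged])  # prints: 2 ['t3']
```

Here the duplicate request `t2` is dropped, and only `t3` is flagged for labeling because its response has fewer than five words. A real custom metric would more likely inspect the trace or judge outputs than count words.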
To get your agent ready for pre-production, see the Mosaic AI agent demo notebook. For general information, see Mosaic AI Agent Evaluation (MLflow 2).