Evaluate language models with Azure Databricks

Module
8 Units

Intermediate

Data Engineer

Azure Databricks

Learn to compare large language model (LLM) evaluation with traditional machine learning (ML) evaluation, understand how LLM evaluation relates to evaluating an entire AI system, and explore common LLM evaluation metrics and task-specific evaluations.

Learning objectives

In this module, you learn how to:

Compare LLM and traditional ML evaluations.
Describe the relationship between LLM evaluation and evaluation of entire AI systems.
Describe generic LLM evaluation metrics like accuracy, perplexity, and toxicity.
Describe LLM-as-a-judge for evaluation.

Prerequisites

Before starting this module, you should be familiar with Azure Databricks. If you aren't, consider completing the Explore Azure Databricks module first.

Introduction
Compare LLM and traditional ML evaluations
Evaluate LLMs and AI systems
Evaluate LLMs with standard metrics
Describe LLM-as-a-judge for evaluation
Exercise - Evaluate an Azure OpenAI model
Module assessment
Summary

Note: The author created this module with assistance from AI.