Automated evaluation of LLM apps with the azure-ai-generative SDK | Python Data Science Day

with Pamela Fox

Python Data Science Day

Automated evaluation of LLM apps with the azure-ai-generative SDK.

Chapters

00:00 - Automated evaluation of LLM apps with the azure-ai-generative SDK
00:55 - Types of LLM apps
01:09 - Prompt-only LLM app
03:14 - Retrieval Augmented Generation (RAG) LLM app
07:05 - RAG flow
08:10 - Are the answers high quality?
11:21 - LLM Ops for LLM Apps
12:46 - Experimenting with quality factors
14:55 - AI RAG Chat Evaluator https://aka.ms/rag/eval
16:07 - Ground truth data
18:32 - Evaluation
25:29 - Evaluation approach
25:57 - Improving ground truth data sets
26:17 - Next steps

Recommended resources

Python Data Science Day

Connect

Pamela Fox | Twitter/X: @pamelafox
