Episode

FastTrack for Azure Season 3 Ep11: Monitoring Azure OpenAI

with Victor Santana, Chris Ayers, Marc Mercier

In this session, we will explore the monitoring of Azure OpenAI. Starting with an overview of the big picture of the entire solution, we will then zoom in on Azure OpenAI services through the lens of the Well-Architected Framework (WAF). We will discuss key concepts such as Tokens, Rate Limiting, Quotas, and PTU, as well as Metrics & Alerts for Azure overall, with a focus on reliability and SRE. We will also delve into SLA and performance, including response times.

Specifically, for OpenAI, we will cover concepts like Token Usage, Quota, and Response Times. Because the focus is on monitoring for resiliency, performance, and response times, we will discuss Metrics, Dashboards, and Alarms. Finally, we will take a detailed look at diagnostic settings and log analytics, including the use of Kusto.

By the end of this session, you will have a comprehensive understanding of how to monitor Azure OpenAI and be equipped with the knowledge and tools to do so.

Learning objectives

Gain a comprehensive overview of Azure OpenAI monitoring within the Well-Architected Framework.
Understand key operational metrics like Tokens, Rate Limiting, Quotas, and PTU relevant to Azure OpenAI.
Learn about setting up Metrics, Alerts, and SLAs for effective monitoring and reliability.
Master the use of Dashboards and Alarms for monitoring resiliency, performance, and response times in Azure OpenAI.
Delve into advanced diagnostic settings and log analytics with Kusto for in-depth monitoring insights.

Chapters

00:00 - Introduction
01:30 - Learning objectives
02:01 - Agenda
03:19 - OpenAI Terms
09:38 - Tokens
11:09 - OpenAI API Quotas
12:41 - Rate Limiting
13:51 - Azure API Management (APIM)
15:23 - APIM Policies
18:11 - APIM Backends
19:00 - APIM Load Balancer & Circuit Breaker
19:46 - Smart Load Balancing for OpenAI Endpoints and APIM
20:47 - Monitoring Azure OpenAI
30:43 - Demo
58:44 - Langfuse on Azure
01:00:06 - Telemetry in Semantic Kernel SDK
01:02:23 - Model monitoring for generative AI applications
01:03:06 - Monitoring published APIs using APIM
01:03:20 - Importing Azure OpenAI APIs into APIM
01:04:23 - Monitoring AI Search

Recommended resources

Session documentation

Full series: Learn Live: FastTrack for Azure Season 3

Connect

Victor Santana | LinkedIn: /in/victorwelascosantana
Chris Ayers | Twitter: @Chris_L_Ayers | LinkedIn: /in/chris-l-ayers
Marc Mercier | LinkedIn: /in/marc-mercier

Intermediate

AI Engineer

Developer

Support Engineer

Azure Monitor
