Azure OpenAI Service Multitenant Load Balancing and TPM Handling
Azure OpenAI Service provides various isolation and tenancy models for different scenarios. Some models involve using a dedicated Azure OpenAI Service resource per tenant, while others rely on a multitenant application sharing one or more Azure OpenAI Service resources across multiple tenants. Sharing an Azure OpenAI Service instance across multiple tenants can potentially lead to a Noisy Neighbor problem, resulting in higher latency for certain tenants. To mitigate this, it is crucial to ensure that your application code is multitenancy-aware and incorporates appropriate measures. For instance, if you aim to charge customers based on their usage of a shared Azure OpenAI instance, your application should include logic to monitor and track the total number of tokens consumed by each tenant. This article is primarily focused on showcasing the capabilities of a multitenant service, specifically in terms of evenly distributing and load balancing requests across multiple Azure OpenAI Service instances, all while effectively managing and tracking tokens per minute (TPM) for multiple tenants. The article and companion sample provide implementation details for achieving the following:
- Distributing calls across multiple Azure OpenAI Service instances using a round-robin scheduling algorithm or mapping requests to specific instances based on tenant names.
- Instrumenting the application with custom Prometheus metrics, leveraging the prometheus-net .NET library, to measure per-tenant request tokens, completion tokens, and total tokens.
- Creating a C# ASP.NET service using .NET Standard that accepts REST and gRPC protocol calls.
- Making asynchronous calls to the Azure OpenAI Service REST API using the OpenAIClient.GetChatCompletionsAsync and OpenAIClient.GetChatCompletionsStreamingAsync methods from the Azure OpenAI client library for .NET.
- Utilizing the SharpToken C# library to calculate the number of prompt, completion, and total tokens. SharpToken is a C# port of the Python tiktoken library.
- Building and testing the application locally using Docker Compose to containerize the service alongside local instances of Prometheus and Grafana.
- Testing the REST and gRPC service interfaces with tools like Postman or a console application.
- Simulating requests from multiple tenants using a bash script.
- Deploying the service to Azure Kubernetes Service (AKS) using YAML manifests.
- Configuring the NGINX ingress controller to support both REST and gRPC protocols.
- Configuring the AKS workload to access Azure OpenAI Service resources via Microsoft Entra Workload ID.
- Deploying Prometheus and Grafana to your AKS cluster using the kube-prometheus-stack Helm chart, or using Azure Monitor managed service for Prometheus and Azure Managed Grafana to collect and visualize Prometheus metrics.
- Creating and deploying a Grafana dashboard to visualize TPM Prometheus metrics.
- Customizing scraping of Prometheus metrics in Azure Monitor managed service for Prometheus.
For more information related to the covered topics, refer to the following articles:
- Multitenancy and Azure OpenAI Service
- Manage Azure OpenAI Service quota
- Working with the GPT-35-Turbo and GPT-4 models
- Azure OpenAI Service quotas and limits
- Microsoft Entra Workload ID with Azure Kubernetes Service (AKS)
- Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus
- NGINX Ingress Controller and gRPC protocol support
- Testing gRPC services with Postman or gRPCurl in ASP.NET Core
- Postman gRPC Support
- SharpToken C# Library
Prerequisites
- An active Azure subscription. If you don't have one, create a free Azure account before you begin.
- Visual Studio or Visual Studio Code
- Docker Desktop
- Azure CLI version 2.50.0 or later installed. To install or upgrade, see Install Azure CLI.
- Bicep tools or Terraform by HashiCorp
- aks-preview Azure CLI extension of version 0.5.171 or later installed.
Architecture
The following diagram shows the architecture of the solution on Azure.
The deployment includes the following Azure resources:
- Virtual Network: A virtual network is created with seven subnets:
  - SystemSubnet: Used for the agent nodes of the system node pool.
  - UserSubnet: Used for the agent nodes of the user node pool.
  - PodSubnet: Used for dynamically allocating private IP addresses to pods.
  - ApiServerSubnet: Delegated subnet for API server VNET integration.
  - AzureBastionSubnet: Subnet for Azure Bastion Host.
  - VmSubnet: Subnet for the jump-box virtual machine and private endpoints.
- Azure OpenAI Service: Two or more Azure OpenAI Service resources.
- Managed Kubernetes Cluster: An Azure Kubernetes Service (AKS) cluster is created with the following node pools:
  - system node pool: Dedicated subnet for critical system pods and services.
  - user node pool: Dedicated subnet for user workloads and artifacts.
- Jump-Box Virtual Machine: A jump-box virtual machine can be created to manage the private AKS cluster. This is an optional component.
- Azure Bastion: An Azure Bastion is deployed in the AKS cluster's virtual network to provide SSH connectivity to agent nodes and virtual machines.
- Azure NAT Gateway: A bring-your-own (BYO) Azure NAT Gateway can be used to manage outbound connections initiated by AKS-hosted workloads. It is associated with the SystemSubnet, UserSubnet, and PodSubnet subnets.
- Storage Account: A storage account is used to store boot diagnostics logs for the service provider and service consumer virtual machines.
- Azure Container Registry: An Azure Container Registry (ACR) is created to store and manage container images for container deployments.
- Azure Key Vault: An Azure Key Vault is used to store secrets, certificates, and keys. It can be mounted as files by pods using the Azure Key Vault Provider for Secrets Store CSI Driver.
- Private Endpoints and Private DNS Zones: Azure Private Endpoints and Azure Private DNS Zones are created for Azure Container Registry, Azure Key Vault, Azure Storage Account, and API Server (for private AKS clusters).
- Network Security Groups: Azure Network Security Groups are used to filter inbound and outbound traffic for subnets hosting virtual machines and Azure Bastion Hosts.
- Azure Monitor Workspace: An Azure Monitor workspace is created to collect diagnostics logs and metrics from various Azure resources for monitoring and analytics purposes.
- Azure Managed Grafana: An Azure Managed Grafana instance is deployed to visualize Prometheus metrics generated by the AKS cluster.
- Azure Log Analytics Workspace: An Azure Log Analytics workspace is used to collect diagnostics logs and metrics from multiple Azure resources, including the AKS cluster, Application Gateway for Containers, Azure Key Vault, Azure Network Security Group, Azure Container Registry, Azure Storage Account, and the jump-box virtual machine.
- Deployment Scripts: A deployment script is used to run a Bash script to install packages to the AKS cluster via Helm.
To deploy the infrastructure required for hosting your application on Azure Kubernetes Service (AKS) along with one or more Azure OpenAI Service instances, you can refer to the following resources:
- Deploy and run an Azure OpenAI ChatGPT application on AKS via Terraform
- Deploy and run an Azure OpenAI ChatGPT application on AKS via Bicep
These resources provide detailed instructions on deploying and running your application, leveraging the power of AKS and Azure OpenAI Service. The following diagram provides a detailed view of the application's structure:
Here are some key observations regarding the sample application:
- The REST/gRPC service is deployed on Azure Kubernetes Service (AKS) and exposed through the NGINX ingress controller.
- To distribute calls efficiently, the service leverages a round-robin scheduling algorithm or maps requests to specific Azure OpenAI Service instances based on tenant names.
- The AKS workload utilizes Microsoft Entra Workload ID to securely access the Azure OpenAI Service instances.
- Secure and private access to the Azure OpenAI Service instances is enabled by the use of Azure Private Endpoints and Azure Private DNS Zones.
Azure OpenAI Quotas and Limits
Azure OpenAI Service provides a powerful feature called quotas that allows you to assign rate limits to your deployments.
Tokens per Minute (TPM)
These quotas are assigned to your subscription on a per-region, per-model basis, and are measured in Tokens-per-Minute (TPM). Quota acts as a global limit, and it determines the maximum number of inference tokens you can consume in a minute. The billing component of TPMs follows a pay-as-you-go model, where you are charged based on the consumption of each model. When you onboard a subscription to Azure OpenAI, you are initially provided with default quotas for the available models. As you create deployments, you can assign Tokens Per Minute (TPM) to each deployment, which reduces the available quota for that specific model. You can continue creating deployments and assigning TPM until you reach your quota limit. Once the limit is reached, you have a couple of options to create new deployments of the same model. Firstly, you can free up TPM by reducing the TPM assigned to other deployments of the same model. This will create available TPM that can be utilized for new deployments. Alternatively, you can request and receive approval for a model quota increase in the desired region. This will allow you to create new deployments without reducing TPM from existing deployments.
For instance, let's consider a scenario where a customer has a quota of 240,000 TPM for the gpt-4 model in the East US region. With this quota, the customer can create a single deployment of 240,000 TPM, two deployments of 120,000 TPM each, or any combination of deployments as long as the total TPM is less than or equal to 240,000 for that model in that region.
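The quota arithmetic above can be sketched in a few lines of C#. This is only an illustration of the rule that the TPM assigned to all deployments of a model in a region cannot exceed the quota; the QuotaPlanner class and its method names are hypothetical and are not part of the sample or of any Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that mirrors the per-region, per-model quota rule.
public static class QuotaPlanner
{
    // Returns the TPM still unassigned, given the quota and the TPM
    // already assigned to each existing deployment of the same model.
    public static int RemainingTpm(int quotaTpm, IEnumerable<int> deploymentTpm)
    {
        var assigned = deploymentTpm.Sum();
        if (assigned > quotaTpm)
        {
            throw new ArgumentException("Assigned TPM exceeds the quota.");
        }
        return quotaTpm - assigned;
    }

    // True when a new deployment of the requested size still fits in the quota.
    public static bool CanDeploy(int quotaTpm, IEnumerable<int> deploymentTpm, int requestedTpm) =>
        requestedTpm <= RemainingTpm(quotaTpm, deploymentTpm);
}
```

With a 240,000 TPM quota and two deployments of 120,000 TPM each, RemainingTpm returns 0, so a further deployment is possible only after reducing an existing assignment or obtaining a quota increase.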
Tokens per Minute (TPM) also serve as the default mechanism for billing the Azure OpenAI Service. For more information, see Azure OpenAI Service pricing.
The flexibility to distribute TPM globally within a subscription and region has led to the loosening of certain restrictions in Azure OpenAI Service:
- The maximum resources per region have been increased to 30.
- The limit on creating only one deployment of the same model in a resource has been removed, allowing multiple deployments of the same model.
It is essential to manage Tokens-per-Minute (TPM) and Requests-per-Minute (RPM) rate limits, especially in a multitenant application that serves calls from multiple customers. The quota system allows you to manage these limits effectively. TPM determines the rate at which tokens are consumed, while RPM defines the rate at which requests are made. For more information on Azure OpenAI quotas and limits, see Manage Azure OpenAI Service quota.
Requests Per Minute (RPM)
In addition to Tokens per Minute (TPM), there is a rate limit called Requests-Per-Minute (RPM) that is enforced. The value of RPM is set proportionally to the TPM assignment using the ratio of 6 RPM per 1,000 TPM. Unlike TPMs, RPMs are not directly related to billing. However, they are a component of rate limits. It's important to understand that while TPMs determine the billing, the rate limits are triggered on a per-second basis rather than per minute. The rate limits can be evaluated either as Tokens per Second (TPS) or RPM over a short period of time (1-10 seconds). So, if you exceed the total number of tokens per second for a specific model, a rate limit will apply. Similarly, if you exceed the RPM over a short time period, a rate limit will also be enforced, and the service returns errors with the 429 HTTP status code.
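As a rough illustration of the relationship described above, the following sketch derives RPM from a TPM assignment using the 6 RPM per 1,000 TPM ratio. The RateLimits class is hypothetical. The per-second and per-window figures assume that the per-minute limit is spread evenly over time, which is a simplification and not a documented description of how the service evaluates its limits.

```csharp
// Hypothetical helper for reasoning about TPM and RPM limits.
public static class RateLimits
{
    // 6 requests per minute for every 1,000 tokens per minute.
    public static long RpmFromTpm(long tpm) => tpm * 6 / 1000;

    // Average tokens per second if the TPM limit were spread evenly (assumption).
    public static double TokensPerSecond(long tpm) => tpm / 60.0;

    // Requests allowed in a short window if RPM were spread evenly (assumption).
    public static long RequestsPerWindow(long tpm, int windowSeconds) =>
        RpmFromTpm(tpm) * windowSeconds / 60;
}
```

For a deployment with 240,000 TPM, this yields 1,440 RPM, an average of 4,000 tokens per second, and 240 requests in a 10-second window. A burst that exceeds these short-window figures can be throttled even when the per-minute totals are still within the limit.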
Provisioned Throughput Units (PTUs)
Microsoft has recently introduced Provisioned Throughput Units (PTUs) as a new feature for Azure OpenAI Service. PTUs enable the use of reserved capacity for model processing, specifically for processing prompts and generating completions. Unlike TPMs, which are based on a pay-as-you-go model, PTUs are purchased as a monthly commitment with an optional auto-renewal. This option reserves Azure OpenAI capacity within an Azure subscription for a specific model and Azure region. For example, if you have 300 PTUs provisioned for GPT 3.5 Turbo, those PTUs can only be used for GPT 3.5 Turbo deployments within a specific Azure subscription. PTUs can be acquired separately for different models, with a minimum requirement specified in the provided table. It's important to note that while PTUs provide consistent latency and throughput, the actual throughput will depend on various factors such as the number and size of prompts and generation tokens, the number of simultaneous requests, and the specific model and its version.
Managing Azure OpenAI in a Software-as-a-Service (SaaS) Multitenant Application
In a multitenant application, it is crucial to manage one or more instances of Azure OpenAI Service appropriately. Different isolation and tenancy models can be adopted depending on your specific requirements. Here are a few common models:
- Azure OpenAI Service for Each Tenant in the Provider's Subscription: In this model, each tenant in the SaaS application has their dedicated instance of Azure OpenAI within the provider's subscription.
- Azure OpenAI Service for Each Tenant in the Tenant's Subscription: In this model, each tenant in the SaaS application has their dedicated instance of Azure OpenAI within their own subscription.
- Shared Azure OpenAI Service: Multiple tenants in the SaaS application share one or more instances of Azure OpenAI. This model provides cost efficiency but may result in performance challenges.
- Shared Azure OpenAI Service with the Same Model for Each Tenant: Each tenant has their own deployment within a shared Azure OpenAI Service instance. This model offers a balance between resource sharing and tenant-specific customization.
For more information, see Multitenancy and Azure OpenAI Service.
Challenges of Shared Azure OpenAI and Noisy Neighbor Problem
Using a shared Azure OpenAI Service instance among multiple tenants can lead to a Noisy Neighbor problem. This problem arises when the consumption of resources by one tenant affects the performance of other tenants. For example, if one tenant heavily utilizes the shared instance, it may result in higher latency for other tenants. To mitigate the Noisy Neighbor problem, it is essential to make your application code multitenancy-aware. Implement logic in your application to keep track of the total number of tokens consumed by each tenant when using a shared Azure OpenAI instance. This ensures fair resource allocation and allows you to charge customers based on their individual consumption. For more information on managing Azure OpenAI in a multitenant application and addressing the Noisy Neighbor problem, see Multitenancy and Azure OpenAI Service.
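The per-tenant tracking described above can be sketched with an in-memory, thread-safe accumulator. This is a minimal illustration under stated assumptions, not the implementation used by the sample, which publishes Prometheus metrics instead. The TenantTokenTracker class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory accumulator of token consumption per tenant.
public sealed class TenantTokenTracker
{
    // Tenant names are compared case-insensitively (an assumption of this sketch).
    private readonly ConcurrentDictionary<string, (long Prompt, long Completion)> _usage =
        new(StringComparer.OrdinalIgnoreCase);

    // Adds the tokens consumed by one request to the tenant's running totals.
    public void Record(string tenant, int promptTokens, int completionTokens) =>
        _usage.AddOrUpdate(
            tenant,
            _ => ((long)promptTokens, (long)completionTokens),
            (_, current) => (current.Prompt + promptTokens, current.Completion + completionTokens));

    // Returns prompt, completion, and total tokens; zeros for unknown tenants.
    public (long Prompt, long Completion, long Total) GetUsage(string tenant) =>
        _usage.TryGetValue(tenant, out var u)
            ? (u.Prompt, u.Completion, u.Prompt + u.Completion)
            : (0L, 0L, 0L);
}
```

A billing or throttling component could read these totals periodically. A production service would normally persist the values or export them as metrics rather than keep them only in process memory.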
Load Balancing Requests Across Multiple Azure OpenAI Service Instances
In a multi-tenant scenario, there are various ways to load balance requests across multiple Azure OpenAI Service instances. You can leverage a global load balancer such as Azure Front Door or a service proxy like Azure API Management to authenticate, distribute, throttle, and monitor calls to a pool of Azure OpenAI Service instances. For more information, see the following articles:
- Build an enterprise-ready Azure OpenAI solution with Azure API Management
- Load Balancing Azure OpenAI with Azure Front Door
- Utilize API Management to make Azure OpenAI load-balanced and redundant
Alternatively, you can implement custom logic in your multi-tenant application to load balance and monitor requests to multiple Azure OpenAI Service instances. This approach gives you more control and reduces the overall cost of ownership, as external services like Azure Front Door or Azure API Management are not required. This article focuses on this approach.
Prometheus Metric Types
Prometheus offers four core metric types that can be used to monitor and collect data:
- Counter: A counter is a cumulative metric that represents a monotonically increasing value, which can only grow or be reset to zero upon restart. Counters are best suited for tracking metrics such as requests served, tasks completed, or errors.
- Gauge: A gauge is a metric that represents a single numerical value that can fluctuate up and down. Gauges are commonly used to monitor values like temperatures, current memory usage, or the number of concurrent requests.
- Histogram: A histogram is used to sample and count observations, typically for metrics such as request durations or response sizes. It organizes the observations into configurable buckets and provides a sum of all observed values. Histograms are useful for calculating quantiles or Apdex scores.
- Summary: A summary is similar to a histogram but provides a different way of calculating quantiles and aggregations. Summaries are suitable for measuring percentiles and quantiles from observed values.
Prometheus client libraries support these metric types, providing APIs tailored to each type. However, it's worth noting that, currently, the Prometheus server does not differentiate between the types and treats all data as untyped time series. This may change in the future.
Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS)
Workloads deployed on an Azure Kubernetes Service (AKS) cluster require Microsoft Entra ID application credentials or managed identities to access Azure AD protected resources, such as Azure Key Vault and Microsoft Graph. Microsoft Entra Workload ID integrates with the capabilities native to Kubernetes to federate with external identity providers.
Microsoft Entra Workload ID uses Service Account Token Volume Projection to enable pods to use a Kubernetes service account. When enabled, the AKS OIDC Issuer issues a service account security token to a workload and OIDC federation enables the application to access Azure resources securely with Azure AD based on annotated service accounts.
Microsoft Entra Workload ID works well with the Azure Identity client libraries and the Microsoft Authentication Library (MSAL) collection if you use a registered application instead of a managed identity. Your workload can use any of these libraries to seamlessly authenticate and access Azure cloud resources.
For more information, see the following resources:
- Azure Workload Identity open-source project
- Use Microsoft Entra Workload ID on Azure Kubernetes Service (AKS)
- Deploy and configure workload identity on an Azure Kubernetes Service (AKS) cluster
- Modernize application authentication with workload identity sidecar
- Tutorial: Use a workload identity with an application on Azure Kubernetes Service (AKS)
- Workload identity federation
- Use Microsoft Entra Workload ID for Kubernetes with a User-Assigned Managed Identity
- Use Microsoft Entra Workload ID for Kubernetes with an Azure AD registered application
- Azure Managed Identities with Workload Identity Federation
- Microsoft Entra Workload ID federation with Kubernetes
Visual Studio Solution
The Visual Studio solution consists of the following projects:
- OpenAiRestApi: This project contains a multitenant ASP.NET API service that exposes a simple interface via REST and gRPC. It handles requests from multiple tenants.
- Client: This project is a console application that can be used to test the OpenAiRestApi service using REST or gRPC protocols, both locally and on Azure.
- Scripts: The scripts folder contains various scripts used to bootstrap the local environment or test the service.
- Docker Compose: This folder includes Docker Compose configuration files to run the OpenAiRestApi service, Prometheus, and Grafana containers and communicate between them when using Container Tools in Visual Studio. For more information, see Tutorial: Create a multi-container app with Docker Compose.
OpenAiRestApi Service
The OpenAiRestApi service is developed using C# and ASP.NET Core. It utilizes the following NuGet packages:
- Azure OpenAI client library for .NET: This library is used to invoke the Azure OpenAI Service REST API.
- Prometheus-net: This .NET library is used to create counter, gauge, and histogram Prometheus metrics. These metrics track request tokens, completion tokens, and total tokens consumed by each tenant request.
- Azure Identity client library for .NET: This library enables authentication against Azure OpenAI Service instances using Microsoft Entra Workload ID.
- Azure Key Vault Secrets configuration provider for Microsoft.Extensions.Configuration: This .NET library allows reading configuration values from secrets stored in Azure Key Vault.
- SharpToken: This .NET library is used to calculate the number of prompt, completion, and total tokens. SharpToken is a C# port of the Python tiktoken library.
- Swashbuckle.AspNetCore: This library provides Swagger tools for documenting APIs built on ASP.NET Core.
- Grpc.AspNetCore: Grpc.AspNetCore is a metapackage with references to:
  - Grpc.AspNetCore.Server: gRPC server library for .NET.
  - Grpc.Tools: Code-generation tooling package.
  - Google.Protobuf: Protobuf serialization library used by gRPC.
Program.cs
The following listing contains the code of the Program.cs file.
using Microsoft.Extensions.Options;
using OpenAiRestApi.Middleware;
using OpenAiRestApi.Options;
using OpenAiRestApi.Services;
using Azure.AI.OpenAI;
using Prometheus;
using Azure.Identity;
using Microsoft.OpenApi.Models;
using System.Reflection;

var builder = WebApplication.CreateBuilder(args);

// Add services to the container.
builder.Services.AddControllers();

// Add gRPC services to the container.
builder.Services.AddGrpc();

// Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
builder.Services.AddEndpointsApiExplorer();

// Add Swagger generator service that builds SwaggerDocument objects directly from your routes, controllers, and models.
builder.Services.AddSwaggerGen(options =>
{
    options.SwaggerDoc("v1", new OpenApiInfo
    {
        Version = "v1",
        Title = "OpenAI REST API",
        Description = "An ASP.NET Core Web API for managing calls to a range of Azure OpenAI Services.",
        TermsOfService = new Uri("https://www.apache.org/licenses/LICENSE-2.0.txt"),
        Contact = new OpenApiContact
        {
            Name = "Paolo Salvatori",
            Email = "paolos@microsoft.com",
            Url = new Uri("https://github.com/paolosalvatori")
        },
        License = new OpenApiLicense
        {
            Name = "Apache License - Version 2.0, January 2004",
            Url = new Uri("https://www.apache.org/licenses/LICENSE-2.0.html")
        }
    });

    // Include the XML comments generated at build time in the Swagger document.
    var xmlFilename = $"{Assembly.GetExecutingAssembly().GetName().Name}.xml";
    options.IncludeXmlComments(Path.Combine(AppContext.BaseDirectory, xmlFilename));
});

// Configure the PrometheusOptions from appsettings.json
builder.Services.Configure<PrometheusOptions>(builder.Configuration.GetSection("Prometheus"));

// Configure the AzureOpenAiOptions from appsettings.json
builder.Services.Configure<AzureOpenAIOptions>(builder.Configuration.GetSection("AzureOpenAI"));

// Configure the ChatCompletionsOptions from appsettings.json.
// The type must match the IOptions<ChatCompletionsOptions> resolved below.
builder.Services.Configure<ChatCompletionsOptions>(builder.Configuration.GetSection("ChatCompletionsOptions"));

// Configure the tenant Azure OpenAI Service mappings from appsettings.json
builder.Services.Configure<Dictionary<string, string>>(builder.Configuration.GetSection("TenantAzureOpenAIMappings"));

// Add the PrometheusOptions as a singleton
builder.Services.AddSingleton(provider =>
{
    var options = provider.GetRequiredService<IOptions<PrometheusOptions>>();
    return options.Value;
});

// Add the AzureOpenAiOptions as a singleton
builder.Services.AddSingleton(provider =>
{
    var options = provider.GetRequiredService<IOptions<AzureOpenAIOptions>>();
    return options.Value;
});

// Add the ChatCompletionsOptions as a singleton
builder.Services.AddSingleton(provider =>
{
    var options = provider.GetRequiredService<IOptions<ChatCompletionsOptions>>();
    return options.Value;
});

// Add the tenant Azure OpenAI Service mappings as a singleton
builder.Services.AddSingleton(provider =>
{
    var options = provider.GetRequiredService<IOptions<Dictionary<string, string>>>();
    return options.Value;
});

// Add the OpenAIGrpcService as a singleton
builder.Services.AddSingleton<OpenAIGrpcService>();

// Add the AzureOpenAIService as a singleton
builder.Services.AddSingleton<AzureOpenAIService>();

var app = builder.Build();

// Expose the OpenAIGrpcService as a service
app.MapGrpcService<OpenAIGrpcService>();

// In production, read additional configuration values from Azure Key Vault.
if (app.Environment.IsProduction())
{
    var keyVaultName = builder.Configuration["KeyVaultName"];
    if (!string.IsNullOrEmpty(keyVaultName))
    {
        builder.Configuration.AddAzureKeyVault(
            new Uri($"https://{keyVaultName}.vault.azure.net/"),
            new DefaultAzureCredential());
    }
}

// Log the configuration when in debug mode
if (string.Equals(builder.Configuration["Debug"], "true", StringComparison.OrdinalIgnoreCase))
{
    foreach (var key in builder.Configuration.AsEnumerable())
    {
        Console.WriteLine($"{key.Key} = {key.Value}");
    }
}

// Configure the HTTP request pipeline.
app.UseSwagger();
app.UseSwaggerUI();
app.UseHttpLogging();
app.UseAuthorization();
app.UseTenantMiddleware();
app.MapControllers();

// This call publishes Prometheus metrics on the /metrics URL.
app.MapMetrics();
app.UseRouting();
app.UseHttpMetrics();

// Run the application.
app.Run();
Here is a breakdown of what the code does:
- The necessary dependencies and services are added to the container using builder.Services.Add methods. These services include controllers, gRPC services, API explorer, and Swagger generator for documentation.
- Swagger generation is configured to generate a Swagger document using the AddSwaggerGen method. This document provides description, license, contact information, and terms of service for the API. For more information, see Get started with Swashbuckle and ASP.NET Core.
- Configuration options from the appsettings.json file are loaded and bound to corresponding classes using builder.Services.Configure<TOptions> methods.
- Singleton instances are added for the options used to read the Prometheus, AzureOpenAI, ChatCompletionsOptions, and TenantAzureOpenAIMappings sections in the appsettings.json configuration file.
- The OpenAIGrpcService and AzureOpenAIService classes are registered as singleton services.
- The application is built using the builder.Build() method call.
- The gRPC service is exposed using the app.MapGrpcService<OpenAIGrpcService>() method call.
- Middleware components like Swagger, HTTP logging, authorization, tenant middleware, and routing are configured using the app.UseXXX methods.
- Prometheus metrics are exposed on the /metrics URL using app.MapMetrics().
- The application starts running using app.Run().
This code sets up an ASP.NET Core Web API with gRPC support, Swagger documentation, Prometheus metrics, and various middleware components for logging, authorization, and routing. The following picture shows the Swagger UI exposed by the service:
The following screenshot shows the Prometheus metrics available for scraping at the /metrics path.
appsettings.json
The following appsettings.json file defines the configuration for the ASP.NET Core Web API service.
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "AllowedHosts": "*",
  "KeyVaultName": "TanKeyVault",
  "Kestrel": {
    "Endpoints": {
      "Http": {
        "Protocols": "Http1",
        "Url": "http://*:80"
      },
      "Grpc": {
        "Protocols": "Http2",
        "Url": "http://*:6000"
      }
    }
  }
}
It includes settings related to logging, allowed hosts, and Kestrel endpoints.
- Logging: Specifies the logging configuration, including the default log level and log level for Microsoft.AspNetCore components.
- AllowedHosts: Allows requests from any host by using the wildcard "*" value.
- KeyVaultName: This parameter is optional. It specifies the name of an Azure Key Vault resource whose secrets contain values for the PrometheusOptions, AzureOpenAIOptions, ChatCompletionOptions, and TenantAzureOpenAIMappings sections.
- Kestrel: Configures the Kestrel server, which is used to host the API.
  - Endpoints:
    - Http: Defines an HTTP endpoint on port 80 for the REST service.
    - Grpc: Defines an HTTP/2 endpoint on port 6000 for the gRPC service.
The appsettings.Development.json file contains additional settings specific to the development environment. It includes configuration options for Prometheus metrics, Azure OpenAI Service, and tenant mappings.
{
  "ChatCompletionsOptions": {
    "Temperature": 0.8,
    "MaxTokens": 16000
  },
  "Prometheus": {
    "Enabled": true,
    "Histograms": {
      "PromptTokens": {
        "Start": 10,
        "Width": 10,
        "Count": 10
      },
      "CompletionTokens": {
        "Start": 100,
        "Width": 100,
        "Count": 10
      },
      "TotalTokens": {
        "Start": 100,
        "Width": 100,
        "Count": 10
      }
    }
  },
  "AzureOpenAI": {
    "SystemPrompt": "The assistant is helpful, creative, clever, and very friendly.",
    "Services": {
      "AlphaOpenAI": {
        "Endpoint": "https://alphaopenai.openai.azure.com/",
        "ApiKey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "Type": "azure",
        "Version": "2023-08-01-preview",
        "Deployment": "gpt-35-turbo-16k",
        "Model": "gpt-35-turbo-16k",
        "MaxResponseTokens": 1000,
        "MaxRetries": 3
      },
      "BetaOpenAi": {
        "Endpoint": "https://betaopenai.openai.azure.com/",
        "ApiKey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "Type": "azure",
        "Version": "2023-08-01-preview",
        "Deployment": "gpt-35-turbo-16k",
        "Model": "gpt-35-turbo-16k",
        "MaxResponseTokens": 1000,
        "MaxRetries": 3
      }
    }
  },
  "TenantAzureOpenAIMappings": {
    "contoso": "AlphaOpenAI",
    "fabrikam": "BetaOpenAi"
  }
}
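The Start, Width, and Count values under Prometheus.Histograms in the listing above describe evenly spaced histogram buckets. The following sketch shows how such values translate into bucket upper bounds. It assumes linear buckets in the style of the prometheus-net Histogram.LinearBuckets helper; the HistogramBuckets class itself is hypothetical and only illustrates the arithmetic.

```csharp
using System;
using System.Linq;

// Hypothetical helper that expands Start/Width/Count into bucket upper bounds.
public static class HistogramBuckets
{
    // Returns 'count' upper bounds: start, start + width, start + 2 * width, ...
    public static double[] Linear(double start, double width, int count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count), "Count must be positive.");
        }
        return Enumerable.Range(0, count).Select(i => start + i * width).ToArray();
    }

    // Returns the smallest upper bound that contains the value,
    // or positive infinity when the value exceeds every configured bucket.
    public static double BucketFor(double[] upperBounds, double value)
    {
        foreach (var bound in upperBounds)
        {
            if (value <= bound)
            {
                return bound;
            }
        }
        return double.PositiveInfinity;
    }
}
```

With the TotalTokens settings (Start 100, Width 100, Count 10), the upper bounds run from 100 to 1,000 in steps of 100. A request that consumes 250 tokens falls in the bucket bounded by 300, and a request above 1,000 tokens lands in the implicit +Inf bucket, which suggests widening the range when tenants send large prompts.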
Here is a brief description of each section and parameter:
- ChatCompletionsOptions: Specifies the configuration for chat completions.
  - Temperature: Sets the temperature for chat completions.
  - MaxTokens: Specifies the maximum number of tokens for chat completions.
- Prometheus: Contains the configuration for Prometheus metrics.
  - Enabled: Specifies whether Prometheus metrics are enabled.
  - Histograms: Contains an element for each histogram metric.
    - Element: Defines configuration values for the specified histogram metric.
      - Start: Specifies the upper bound of the lowest bucket.
      - Width: Specifies the width of each bucket (distance between lower and upper bound).
      - Count: Specifies the number of buckets to create. Must be positive.
- AzureOpenAI: Contains the configuration for Azure OpenAI Service instances.
  - SystemPrompt: Specifies a default system prompt, if none is specified in the input parameters.
  - Services: Contains an element for each Azure OpenAI Service.
    - Element: Defines configuration values for the specified Azure OpenAI Service.
      - Endpoint: Specifies the endpoint URL.
      - ApiKey: Specifies the API key. This element is not necessary when the service is configured to use Microsoft Entra Workload ID.
      - Type: Specifies the access method: azure when using the API key, azuread when using Microsoft Entra Workload ID.
      - Version: Specifies the version of the Azure OpenAI Service REST API.
      - Deployment: Specifies the deployment name.
      - Model: Specifies the model name.
      - MaxResponseTokens: Specifies the maximum number of response tokens.
      - MaxRetries: Specifies the maximum number of retries when invoking the Azure OpenAI Service REST API.
- TenantAzureOpenAIMappings: Contains mappings of tenants to specific Azure OpenAI Service instances. These mappings dictate how requests from each tenant are handled by the service.
  - When a tenant is mapped to a specific Azure OpenAI Service instance, the service will consistently use that particular resource to handle all requests originating from that tenant.
  - In cases where there is no explicit mapping for a tenant, the service employs a round-robin scheduling algorithm. This algorithm cycles through the available Azure OpenAI Service instances in turn, ensuring a fair distribution of workload across the instances.
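The selection rule described above (an explicit tenant mapping first, round-robin otherwise) can be sketched as follows. This is a minimal illustration and not the code of the sample's AzureOpenAIService class; the InstanceSelector type and its members are hypothetical, and the instance names are taken from the configuration listing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical selector: explicit tenant mapping first, round-robin otherwise.
public sealed class InstanceSelector
{
    private readonly IReadOnlyList<string> _instances;
    private readonly IReadOnlyDictionary<string, string> _tenantMappings;
    private int _counter = -1;

    public InstanceSelector(IReadOnlyList<string> instances,
                            IReadOnlyDictionary<string, string> tenantMappings)
    {
        if (instances.Count == 0)
        {
            throw new ArgumentException("At least one instance is required.", nameof(instances));
        }
        _instances = instances;
        _tenantMappings = tenantMappings;
    }

    public string Select(string tenant)
    {
        // A mapped tenant always uses the same Azure OpenAI Service instance.
        if (_tenantMappings.TryGetValue(tenant, out var mapped))
        {
            return mapped;
        }

        // Interlocked.Increment makes the rotation safe under concurrent requests.
        // The cast to uint keeps the index non-negative if the counter overflows.
        var next = unchecked((uint)Interlocked.Increment(ref _counter));
        return _instances[(int)(next % (uint)_instances.Count)];
    }
}
```

With the AlphaOpenAI and BetaOpenAi instances and the contoso and fabrikam mappings from the configuration, requests from fabrikam always go to BetaOpenAi, while successive requests from unmapped tenants alternate between AlphaOpenAI and BetaOpenAi.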
TenantMiddleware
The TenantMiddleware class contains the definition of an ASP.NET Core middleware component designed to extract the tenant name from various sources (query string, custom HTTP header, JWT token) and add it as a parameter to the request query string.
using System.IdentityModel.Tokens.Jwt;
using Microsoft.Extensions.Primitives;
namespace OpenAiRestApi.Middleware;
public class TenantMiddleware
{
private readonly string _tenantParameterName;
private readonly string _tenantHeaderName;
private readonly string _tenantClaimName;
private readonly string _noTenantErrorMessage;
private readonly RequestDelegate _next;
public TenantMiddleware(RequestDelegate next,
string tenantParameterName = "tenant",
string tenantHeaderName = "X-Tenant",
string tenantClaimName = "tenant",
string noTenantErrorMessage = "No tenant found in the request.")
{
_next = next;
_tenantParameterName = tenantParameterName;
_tenantHeaderName = tenantHeaderName;
_tenantClaimName = tenantClaimName;
_noTenantErrorMessage = noTenantErrorMessage;
}
// InvokeAsync is called by the ASP.NET Core pipeline for each HTTP request
public async Task InvokeAsync(HttpContext httpContext)
{
try
{
if (!httpContext.Request.Path.ToString().StartsWith("/openai"))
{
await _next(httpContext);
return;
}
if (httpContext.Request.Query.ContainsKey(_tenantParameterName) &&
!string.IsNullOrEmpty(httpContext.Request.Query[_tenantParameterName]))
{
await _next(httpContext);
return;
}
string tenant;
if (!string.IsNullOrEmpty(tenant = GetTenantFromHeader(httpContext.Request)))
{
AddTenantParameter(httpContext, tenant);
await _next(httpContext);
return;
}
if (!string.IsNullOrEmpty(tenant = GetTenantFromToken(httpContext.Request)))
{
AddTenantParameter(httpContext, tenant);
await _next(httpContext);
return;
}
httpContext.Response.StatusCode = StatusCodes.Status400BadRequest;
await httpContext.Response.WriteAsync(_noTenantErrorMessage);
}
catch (Exception ex)
{
// Set the status code and write the error message to the response
httpContext.Response.StatusCode = StatusCodes.Status500InternalServerError;
await httpContext.Response.WriteAsync(ex.Message);
}
}
private void AddTenantParameter(HttpContext httpContext, string tenant)
{
httpContext.Request.Query = new QueryCollection(httpContext.Request.Query
.ToDictionary(x => x.Key, x => x.Value)
.Concat(new Dictionary<string, StringValues>() { { _tenantParameterName, new StringValues(tenant) } })
.ToDictionary(pair => pair.Key, pair => pair.Value));
}
private string GetTenantFromHeader(HttpRequest request)
{
// Retrieve the tenant parameter from the tenant header
string tenant = request.Headers[_tenantHeaderName].FirstOrDefault()!;
return tenant;
}
private string GetTenantFromToken(HttpRequest request)
{
// Retrieve the authorization header value
string authorizationHeader = request.Headers.Authorization.FirstOrDefault()!;
if (!string.IsNullOrEmpty(authorizationHeader) && authorizationHeader.StartsWith("Bearer "))
{
// Remove the "Bearer " prefix from the header value
string jwtToken = authorizationHeader.Substring("Bearer ".Length).Trim();
// Create an instance of JwtSecurityTokenHandler
var tokenHandler = new JwtSecurityTokenHandler();
// Read the token and parse it to a JwtSecurityToken object.
// Note: ReadJwtToken only decodes the token; it does not validate its signature.
var parsedToken = tokenHandler.ReadJwtToken(jwtToken);
// Return the value of the tenant claim from the token's payload
return parsedToken?.Claims?.FirstOrDefault(claim => claim.Type == _tenantClaimName)?.Value!;
}
return string.Empty;
}
}
public static class TenantMiddlewareExtensions
{
public static IApplicationBuilder UseTenantMiddleware(
this IApplicationBuilder builder)
{
return builder.UseMiddleware<TenantMiddleware>();
}
}
The InvokeAsync method is responsible for executing the middleware logic and is called for each HTTP request.
- It checks if the request path does not start with /openai. If so, it bypasses the middleware and calls the next middleware component in the pipeline.
- It checks if the tenant parameter is present in the request query string. If so, it bypasses the middleware and calls the next middleware component in the pipeline.
- It attempts to retrieve the tenant name from the x-tenant custom HTTP header. If the header value is found, it adds the tenant parameter to the request query string and calls the next middleware component.
- If the tenant name is not found in the header, it attempts to retrieve it from the tenant claim of the JWT token provided in the request's authorization header.
- If the tenant name is found in the token, it adds the tenant parameter to the request query string and calls the next middleware component.
- If the tenant name is not found in either the header or the token, it returns a 400 Bad Request response with an error message.
The TenantMiddlewareExtensions class defines an extension method called UseTenantMiddleware for the IApplicationBuilder interface. The UseTenantMiddleware method is used to add the TenantMiddleware to the middleware pipeline. It returns the updated IApplicationBuilder object.
For more information on ASP.NET middleware and how to use and configure it, refer to the official documentation.
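The order in which the middleware looks for the tenant name (query string first, then the custom header, then the JWT claim) can be summarized as a small pure function. The following is a minimal sketch of that precedence only, not the middleware itself: the TenantResolutionSketch class is hypothetical, the dictionaries stand in for the request query string, headers, and already-decoded token claims, and the parameter, header, and claim names are the defaults from the constructor above.

```csharp
using System;
using System.Collections.Generic;

var query = new Dictionary<string, string>();
// HTTP header names are case-insensitive, so the stand-in dictionary is too.
var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { ["X-Tenant"] = "contoso" };
var claims = new Dictionary<string, string> { ["tenant"] = "fabrikam" };

// The query string has no tenant, so the header takes precedence over the claim.
Console.WriteLine(TenantResolutionSketch.Resolve(query, headers, claims) ?? "(no tenant: 400 Bad Request)"); // prints: contoso

// Hypothetical helper that mirrors the lookup order used by the middleware.
public static class TenantResolutionSketch
{
    public static string? Resolve(
        IReadOnlyDictionary<string, string> query,
        IReadOnlyDictionary<string, string> headers,
        IReadOnlyDictionary<string, string> claims)
    {
        // 1. An explicit tenant query parameter is used as-is.
        if (query.TryGetValue("tenant", out var fromQuery) && !string.IsNullOrEmpty(fromQuery))
        {
            return fromQuery;
        }

        // 2. Otherwise fall back to the custom HTTP header.
        if (headers.TryGetValue("X-Tenant", out var fromHeader) && !string.IsNullOrEmpty(fromHeader))
        {
            return fromHeader;
        }

        // 3. Otherwise fall back to the tenant claim of the token.
        if (claims.TryGetValue("tenant", out var fromClaim) && !string.IsNullOrEmpty(fromClaim))
        {
            return fromClaim;
        }

        // 4. No tenant anywhere: the middleware answers with 400 Bad Request.
        return null;
    }
}
```

Keeping the precedence in one place like this makes it straightforward to unit test without building an HttpContext.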
PrometheusMetrics
The PrometheusMetrics class is a utility class that provides methods for recording and tracking metrics related to the Azure OpenAI Service. It utilizes the prometheus-net library to create gauge, counter, and histogram metrics.
using OpenAiRestApi.Options;
using Prometheus;
namespace OpenAiRestApi.Utils
{
public class PrometheusMetrics
{
// Gauge metrics
private readonly Gauge _promptTokenCount;
private readonly Gauge _completionTokenCount;
private readonly Gauge _totalTokenCount;
// Counter metrics
private readonly Counter _promptTokenTotal;
private readonly Counter _completionTokenTotal;
private readonly Counter _totalTokenTotal;
// Histogram metrics
private readonly Histogram _promptTokenHistogram;
private readonly Histogram _completionTokenHistogram;
private readonly Histogram _totalTokenHistogram;
#region Public Constructors
public PrometheusMetrics(PrometheusOptions prometheusOptions)
{
// Gauge metrics
_promptTokenCount = Metrics.CreateGauge(
"openai_prompt_tokens_processed",
"Number of prompt tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
_completionTokenCount = Metrics.CreateGauge(
"openai_completion_tokens_processed",
"Number of completion tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
_totalTokenCount = Metrics.CreateGauge(
"openai_total_tokens_processed",
"Number of total tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
// Counter metrics
_promptTokenTotal = Metrics.CreateCounter(
"openai_prompt_tokens_total",
"Total number of prompt tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
_completionTokenTotal = Metrics.CreateCounter(
"openai_completion_tokens_total",
"Total number of completion tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
_totalTokenTotal = Metrics.CreateCounter(
"openai_total_tokens_total",
"Total number of total tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" });
// Histogram metrics
_promptTokenHistogram = Metrics.CreateHistogram(
"openai_prompt_tokens",
"The distribution of prompt tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" },
new HistogramConfiguration
{
Buckets = Histogram.LinearBuckets(
start: prometheusOptions.Histograms["PromptTokens"].Start,
width: prometheusOptions.Histograms["PromptTokens"].Width,
count: prometheusOptions.Histograms["PromptTokens"].Count)
});
_completionTokenHistogram = Metrics.CreateHistogram(
"openai_completion_tokens",
"The distribution of completion tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" },
new HistogramConfiguration
{
Buckets = Histogram.LinearBuckets(
start: prometheusOptions.Histograms["CompletionTokens"].Start,
width: prometheusOptions.Histograms["CompletionTokens"].Width,
count: prometheusOptions.Histograms["CompletionTokens"].Count)
});
_totalTokenHistogram = Metrics.CreateHistogram(
"openai_total_tokens",
"The distribution of total tokens processed by the Azure OpenAI Service.",
labelNames: new[] { "openai_name", "tenant_name", "method_name" },
new HistogramConfiguration
{
Buckets = Histogram.LinearBuckets(
start: prometheusOptions.Histograms["TotalTokens"].Start,
width: prometheusOptions.Histograms["TotalTokens"].Width,
count: prometheusOptions.Histograms["TotalTokens"].Count)
});
}
#endregion
public void SetPromptTokenCount(string tenant, string openAIName, string methodName, double value) => _promptTokenCount.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Set(value);
public void SetCompletionTokenCount(string tenant, string openAIName, string methodName, double value) => _completionTokenCount.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Set(value);
public void SetTotalTokenCount(string tenant, string openAIName, string methodName, double value) => _totalTokenCount.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Set(value);
public void IncPromptTokenTotal(string tenant, string openAIName, string methodName, double value) => _promptTokenTotal.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Inc(value);
public void IncCompletionTokenTotal(string tenant, string openAIName, string methodName, double value) => _completionTokenTotal.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Inc(value);
public void IncTotalTokenTotal(string tenant, string openAIName, string methodName, double value) => _totalTokenTotal.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Inc(value);
public void ObservePromptTokenHistogram(string tenant, string openAIName, string methodName, double value) => _promptTokenHistogram.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Observe(value);
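public void ObserveCompletionTokenHistogram(string tenant, string openAIName, string methodName, double value) => _completionTokenHistogram.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Observe(value);
public void ObserveTotalTokenHistogram(string tenant, string openAIName, string methodName, double value) => _totalTokenHistogram.WithLabels(new[] { openAIName.ToLower(), tenant.ToLower(), methodName.ToLower() }).Observe(value);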
}
}
The class defines the following private fields and corresponding Prometheus metrics:
- _promptTokenCount: A gauge metric that tracks the number of prompt tokens processed by the Azure OpenAI Service.
- _completionTokenCount: A gauge metric that tracks the number of completion tokens processed by the Azure OpenAI Service.
- _totalTokenCount: A gauge metric that tracks the number of total tokens processed by the Azure OpenAI Service.
- _promptTokenTotal: A counter metric that keeps a running count of the total number of prompt tokens processed by the Azure OpenAI Service.
- _completionTokenTotal: A counter metric that keeps a running count of the total number of completion tokens processed by the Azure OpenAI Service.
- _totalTokenTotal: A counter metric that keeps a running count of the total number of tokens processed by the Azure OpenAI Service.
- _promptTokenHistogram: A histogram metric that records the distribution of prompt tokens processed by the Azure OpenAI Service.
- _completionTokenHistogram: A histogram metric that records the distribution of completion tokens processed by the Azure OpenAI Service.
- _totalTokenHistogram: A histogram metric that records the distribution of total tokens processed by the Azure OpenAI Service.
The class provides the following public methods:
- SetPromptTokenCount: Sets the value of the promptTokenCount gauge metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be set as parameters.
- SetCompletionTokenCount: Sets the value of the completionTokenCount gauge metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be set as parameters.
- SetTotalTokenCount: Sets the value of the totalTokenCount gauge metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be set as parameters.
- IncPromptTokenTotal: Increments the value of the promptTokenTotal counter metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be incremented as parameters.
- IncCompletionTokenTotal: Increments the value of the completionTokenTotal counter metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be incremented as parameters.
- IncTotalTokenTotal: Increments the value of the totalTokenTotal counter metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be incremented as parameters.
- ObservePromptTokenHistogram: Observes a value in the promptTokenHistogram histogram metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be observed as parameters.
- ObserveCompletionTokenHistogram: Observes a value in the completionTokenHistogram histogram metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be observed as parameters.
- ObserveTotalTokenHistogram: Observes a value in the totalTokenHistogram histogram metric. It takes the tenant name, Azure OpenAI Service name, method name, and the value to be observed as parameters.
AzureOpenAIService Class
The AzureOpenAIService class implements the IAzureOpenAIService custom interface. It provides functionality for invoking the chat completion API on a pool of Azure OpenAI Service resources, using the Azure OpenAI client library for .NET to call the Azure OpenAI Service REST API.
using Azure.AI.OpenAI;
using Azure.Core.Pipeline;
using OpenAiRestApi.Controllers;
using OpenAiRestApi.Utils;
using OpenAiRestApi.Options;
using Azure;
using Azure.Identity;
using System.Runtime.CompilerServices;
using SharpToken;
using OpenAiRestApi.Model;
namespace OpenAiRestApi.Services
{
public class AzureOpenAIService : IAzureOpenAIService
{
private readonly PrometheusMetrics _prometheusMetrics;
private readonly ILogger<OpenAIController> _logger;
private readonly PrometheusOptions _prometheusOptions;
private readonly AzureOpenAIOptions _azureOpenAiOptions;
private readonly Dictionary<string, string> _tenantAzureOpenAiMappings;
private readonly Dictionary<string, OpenAIClient> _openAIClients;
private readonly ChatCompletionsOptions _chatCompletionsOptions;
private int _roundRobinIndex = 0;
public AzureOpenAIService(ILogger<OpenAIController> logger,
PrometheusOptions prometheusOptions,
AzureOpenAIOptions azureOpenAiOptions,
Dictionary<string, string> tenantAzureOpenAiMappings,
ChatCompletionsOptions chatCompletionsOptions)
{
_prometheusMetrics = new PrometheusMetrics(prometheusOptions);
_logger = logger;
_prometheusOptions = prometheusOptions;
_azureOpenAiOptions = azureOpenAiOptions;
_tenantAzureOpenAiMappings = tenantAzureOpenAiMappings;
_openAIClients = new Dictionary<string, OpenAIClient>();
_chatCompletionsOptions = chatCompletionsOptions;
foreach (var service in _azureOpenAiOptions.Services.Keys)
{
OpenAIClientOptions options = new()
{
RetryPolicy = new RetryPolicy(maxRetries: Math.Max(0, _azureOpenAiOptions.Services[service].MaxRetries), new SequentialDelayStrategy()),
Diagnostics = { IsLoggingContentEnabled = true }
};
if (string.IsNullOrEmpty(_azureOpenAiOptions.Services[service].Endpoint))
{
_logger.LogError($"Azure OpenAI Service {service} endpoint is not configured.");
continue;
}
if (string.Compare(_azureOpenAiOptions.Services[service].Type, "azuread", true) == 0)
{
_openAIClients.Add(service, new OpenAIClient(new Uri(_azureOpenAiOptions.Services[service].Endpoint), new DefaultAzureCredential(), options));
_logger.LogInformation($"Azure OpenAI Service {service} is configured with Microsoft Entra ID authentication.");
}
else
{
if (string.IsNullOrEmpty(_azureOpenAiOptions.Services[service].ApiKey))
{
_logger.LogError($"Azure OpenAI Service {service} API key is not configured.");
continue;
}
_openAIClients.Add(service, new OpenAIClient(new Uri(_azureOpenAiOptions.Services[service].Endpoint), new AzureKeyCredential(_azureOpenAiOptions.Services[service].ApiKey), options));
_logger.LogInformation($"Azure OpenAI Service {service} is configured with API key authentication.");
}
}
}
public async Task<string> GetChatCompletionsAsync(string tenant, IEnumerable<Message> history, CancellationToken cancellationToken = default)
{
if (history?.Any() != true)
{
throw new ArgumentException("History cannot be null or empty.", nameof(history));
}
if (string.IsNullOrEmpty(tenant))
{
throw new ArgumentException("Tenant cannot be null or empty.", nameof(tenant));
}
var openAIName = GetOpenAIServiceName(tenant);
var openAIClient = _openAIClients[openAIName];
var openAIOptions = _azureOpenAiOptions.Services[openAIName];
_logger.LogInformation($"New request: method=[GetChatCompletionsAsync] tenant=[{tenant}] openai=[{openAIName}]");
var chatCompletionsOptions = new ChatCompletionsOptions
{
MaxTokens = _chatCompletionsOptions.MaxTokens.HasValue ? _chatCompletionsOptions.MaxTokens.Value : null,
Temperature = _chatCompletionsOptions.Temperature.HasValue ? _chatCompletionsOptions.Temperature.Value : null,
NucleusSamplingFactor = _chatCompletionsOptions.NucleusSamplingFactor.HasValue ? _chatCompletionsOptions.NucleusSamplingFactor.Value : null,
FrequencyPenalty = _chatCompletionsOptions.FrequencyPenalty.HasValue ? _chatCompletionsOptions.FrequencyPenalty.Value : null,
PresencePenalty = _chatCompletionsOptions.PresencePenalty.HasValue ? _chatCompletionsOptions.PresencePenalty.Value : null,
};
if (_chatCompletionsOptions.StopSequences is { Count: > 0 })
{
// Copy the configured stop sequences into the per-request options
foreach (var s in _chatCompletionsOptions.StopSequences) { chatCompletionsOptions.StopSequences.Add(s); }
}
if (history.Count() == 1 || history.FirstOrDefault()?.Role != ChatRole.System)
{
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.System, _azureOpenAiOptions.SystemPrompt));
}
foreach (var message in history)
{
chatCompletionsOptions.Messages.Add(new ChatMessage(message.Role, message.Content));
}
var tokenNumber = TruncateHistory(openAIOptions.Model,
_chatCompletionsOptions.MaxTokens.HasValue ? _chatCompletionsOptions.MaxTokens.Value : 4096,
openAIOptions.MaxResponseTokens,
chatCompletionsOptions.Messages);
var response = await openAIClient.GetChatCompletionsAsync(openAIOptions.Model, chatCompletionsOptions, cancellationToken).ConfigureAwait(false);
var result = response?.Value?.Choices?.FirstOrDefault()?.Message.Content ?? string.Empty;
if (_prometheusOptions.Enabled)
{
var promptTokens = response?.Value?.Usage?.PromptTokens != null ? (double)response?.Value?.Usage?.PromptTokens! : tokenNumber;
_prometheusMetrics.SetPromptTokenCount(tenant, openAIName, "chat", promptTokens);
_prometheusMetrics.IncPromptTokenTotal(tenant, openAIName, "chat", promptTokens);
_prometheusMetrics.ObservePromptTokenHistogram(tenant, openAIName, "chat", promptTokens);
var completionTokens = response?.Value?.Usage?.CompletionTokens != null ? (double)response?.Value?.Usage?.CompletionTokens! : GetTokenNumberFromString(openAIOptions.Model, result);
_prometheusMetrics.SetCompletionTokenCount(tenant, openAIName, "chat", completionTokens);
_prometheusMetrics.IncCompletionTokenTotal(tenant, openAIName, "chat", completionTokens);
_prometheusMetrics.ObserveCompletionTokenHistogram(tenant, openAIName, "chat", completionTokens);
var totalTokens = response?.Value?.Usage?.TotalTokens != null ? (double)response?.Value?.Usage?.TotalTokens! : promptTokens + completionTokens;
_prometheusMetrics.SetTotalTokenCount(tenant, openAIName, "chat", totalTokens);
_prometheusMetrics.IncTotalTokenTotal(tenant, openAIName, "chat", totalTokens);
_prometheusMetrics.ObserveTotalTokenHistogram(tenant, openAIName, "chat", totalTokens);
}
return result;
}
public async IAsyncEnumerable<string> GetChatCompletionsStreamingAsync(string tenant, IEnumerable<Message> history, [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
if (history?.Any() != true)
{
throw new ArgumentException("History cannot be null or empty.", nameof(history));
}
if (string.IsNullOrEmpty(tenant))
{
throw new ArgumentException("Tenant cannot be null or empty.", nameof(tenant));
}
var openAIName = GetOpenAIServiceName(tenant);
var openAIClient = _openAIClients[openAIName];
var openAIOptions = _azureOpenAiOptions.Services[openAIName];
_logger.LogInformation($"New request: method=[GetChatCompletionsStreamingAsync] tenant=[{tenant}] openai=[{openAIName}]");
var chatCompletionsOptions = new ChatCompletionsOptions
{
MaxTokens = _chatCompletionsOptions.MaxTokens.HasValue ? _chatCompletionsOptions.MaxTokens.Value : null,
Temperature = _chatCompletionsOptions.Temperature.HasValue ? _chatCompletionsOptions.Temperature.Value : null,
NucleusSamplingFactor = _chatCompletionsOptions.NucleusSamplingFactor.HasValue ? _chatCompletionsOptions.NucleusSamplingFactor.Value : null,
FrequencyPenalty = _chatCompletionsOptions.FrequencyPenalty.HasValue ? _chatCompletionsOptions.FrequencyPenalty.Value : null,
PresencePenalty = _chatCompletionsOptions.PresencePenalty.HasValue ? _chatCompletionsOptions.PresencePenalty.Value : null,
};
if (_chatCompletionsOptions.StopSequences is { Count: > 0 })
{
// Copy the configured stop sequences into the per-request options
foreach (var s in _chatCompletionsOptions.StopSequences) { chatCompletionsOptions.StopSequences.Add(s); }
}
if (history.Count() == 1 || history.FirstOrDefault()?.Role != ChatRole.System)
{
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.System, _azureOpenAiOptions.SystemPrompt));
}
foreach (var message in history)
{
chatCompletionsOptions.Messages.Add(new ChatMessage(message.Role, message.Content));
}
var promptTokens = TruncateHistory(openAIOptions.Model,
_chatCompletionsOptions.MaxTokens.HasValue ? _chatCompletionsOptions.MaxTokens.Value : 4096,
openAIOptions.MaxResponseTokens,
chatCompletionsOptions.Messages);
if (_prometheusOptions.Enabled)
{
_prometheusMetrics.SetPromptTokenCount(tenant, openAIName, "stream", promptTokens);
_prometheusMetrics.IncPromptTokenTotal(tenant, openAIName, "stream", promptTokens);
_prometheusMetrics.ObservePromptTokenHistogram(tenant, openAIName, "stream", promptTokens);
}
var response = await openAIClient.GetChatCompletionsStreamingAsync(openAIOptions.Model, chatCompletionsOptions, cancellationToken).ConfigureAwait(false);
var completionTokens = 0;
await foreach (var choice in response.Value.GetChoicesStreaming(cancellationToken))
{
await foreach (var message in choice.GetMessageStreaming(cancellationToken))
{
var result = message.Content ?? string.Empty;
if (_prometheusOptions.Enabled)
{
completionTokens += GetTokenNumberFromString(openAIOptions.Model, result);
}
yield return result;
}
}
if (_prometheusOptions.Enabled)
{
_prometheusMetrics.SetCompletionTokenCount(tenant, openAIName, "stream", completionTokens);
_prometheusMetrics.IncCompletionTokenTotal(tenant, openAIName, "stream", completionTokens);
_prometheusMetrics.ObserveCompletionTokenHistogram(tenant, openAIName, "stream", completionTokens);
var totalTokens = promptTokens + completionTokens;
_prometheusMetrics.SetTotalTokenCount(tenant, openAIName, "stream", totalTokens);
_prometheusMetrics.IncTotalTokenTotal(tenant, openAIName, "stream", totalTokens);
_prometheusMetrics.ObserveTotalTokenHistogram(tenant, openAIName, "stream", totalTokens);
}
}
private string GetOpenAIServiceName(string tenant)
{
string openAIServiceName;
if (_tenantAzureOpenAiMappings.ContainsKey(tenant) &&
_openAIClients.ContainsKey(_tenantAzureOpenAiMappings[tenant]))
{
openAIServiceName = _tenantAzureOpenAiMappings[tenant];
_logger.LogInformation($"Tenant {tenant} is mapped to {openAIServiceName} Azure OpenAI Service.");
return openAIServiceName;
}
else
{
openAIServiceName = _openAIClients.Keys.ElementAt(_roundRobinIndex);
_logger.LogInformation($"{openAIServiceName} Azure OpenAI Service was assigned to tenant {tenant} by round robin policy.");
_roundRobinIndex = (_roundRobinIndex + 1) % _openAIClients.Count;
return openAIServiceName;
}
}
private int GetTokenNumberFromString(string model, string message)
{
return GptEncoding.GetEncodingForModel(model).Encode(message).Count;
}
private int GetTokenNumberFromMessages(string model, IList<ChatMessage> messages)
{
var encoding = GptEncoding.GetEncodingForModel(model);
int numTokens = 0;
foreach (var message in messages)
{
numTokens += 4; // Every message follows <im_start>{role/name}\n{content}<im_end>\n
numTokens += encoding.Encode(message.Role.ToString()).Count;
numTokens += encoding.Encode(message.Content).Count;
}
numTokens += 2; // Every reply is primed with <im_start>assistant
return numTokens;
}
private int TruncateHistory(string model, int maxTokens, int maxResponseTokens, IList<ChatMessage> messages)
{
int historyTokenNumber = GetTokenNumberFromMessages(model, messages);
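// Drop the oldest non-system message until the history fits; stop when only the first message is left
while (historyTokenNumber + maxResponseTokens >= maxTokens && messages.Count > 1)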
{
messages.RemoveAt(1);
historyTokenNumber = GetTokenNumberFromMessages(model, messages);
}
return historyTokenNumber;
}
}
}
The class implements the following public methods defined in the IAzureOpenAIService interface:
- The GetChatCompletionsAsync method takes in a tenant, a list of chat history messages, and an optional cancellation token. It performs necessary validations on the input parameters, retrieves the appropriate Azure OpenAI Service based on the tenant name, and then generates chat completions using the OpenAIClient.GetChatCompletionsAsync method from the Azure OpenAI client library for .NET. It also records Prometheus metrics if enabled and returns the generated chat completion as a string. This method retrieves the number of prompt tokens, completion tokens, and total tokens from the Usage property of the ChatCompletions result, and falls back to counting them with SharpToken when a value is missing.
- The GetChatCompletionsStreamingAsync method is similar to the previous method but allows for streaming the chat completions. It uses the OpenAIClient.GetChatCompletionsStreamingAsync method from the Azure OpenAI client library for .NET to generate chat completions and yields each completion as a string in an asynchronous enumerable stream. This method uses the SharpToken C# library to calculate the number of prompt, completion, and total tokens. SharpToken is a C# port of the Python tiktoken library.
There are also private methods within the class. The GetOpenAIServiceName method retrieves the name of the Azure OpenAI Service to be used in the current call. It follows this logic:
- If there is a specific mapping for the current tenant, the Azure OpenAI Service specified in the mapping is utilized.
- In the absence of a mapping, a round-robin scheduling algorithm is employed to select an Azure OpenAI Service from the pool.
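The selection logic above can be sketched in isolation as follows. Because the service is used as a singleton, several requests may call it concurrently; the listing above advances a plain int field, whereas this sketch uses Interlocked.Increment so that each caller receives a distinct ticket. This is one possible variation under that assumption, not the sample's implementation. The ServiceSelectorSketch class, the service names, and the tenant names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var selector = new ServiceSelectorSketch(
    services: new[] { "openai-one", "openai-two" },
    tenantMappings: new Dictionary<string, string> { ["contoso"] = "openai-two" });

Console.WriteLine(selector.Select("contoso"));  // prints: openai-two (explicit mapping)
Console.WriteLine(selector.Select("fabrikam")); // prints: openai-one (round robin)
Console.WriteLine(selector.Select("fabrikam")); // prints: openai-two (round robin)
Console.WriteLine(selector.Select("fabrikam")); // prints: openai-one (round robin)

// Hypothetical selector: explicit tenant mapping first, round robin otherwise.
public sealed class ServiceSelectorSketch
{
    private readonly string[] _services;
    private readonly IReadOnlyDictionary<string, string> _tenantMappings;
    private int _counter = -1; // the first Interlocked.Increment yields 0

    public ServiceSelectorSketch(string[] services, IReadOnlyDictionary<string, string> tenantMappings)
    {
        if (services.Length == 0)
        {
            throw new ArgumentException("At least one service is required.", nameof(services));
        }

        _services = services;
        _tenantMappings = tenantMappings;
    }

    public string Select(string tenant)
    {
        // A mapped tenant always goes to its configured service, if that service exists.
        if (_tenantMappings.TryGetValue(tenant, out var mapped) && Array.IndexOf(_services, mapped) >= 0)
        {
            return mapped;
        }

        // Atomically take the next ticket; the uint cast keeps the index
        // non-negative after the counter wraps around.
        uint ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return _services[(int)(ticket % (uint)_services.Length)];
    }
}
```

Mapped requests do not advance the counter, so they do not skew the rotation seen by unmapped tenants.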
OpenAIController
The OpenAIController class is an ASP.NET Core controller that handles HTTP requests to the Azure OpenAI Service from multiple tenants.
using Microsoft.AspNetCore.Mvc;
using System.Runtime.CompilerServices;
using OpenAiRestApi.Model;
using OpenAiRestApi.Services;
namespace OpenAiRestApi.Controllers;
[ApiController]
[Route("openai")]
public class OpenAIController : ControllerBase
{
private readonly ILogger<OpenAIController> _logger;
private readonly AzureOpenAIService _azureOpenAIService;
public OpenAIController(ILogger<OpenAIController> logger, AzureOpenAIService azureOpenAIService)
{
_logger = logger;
_azureOpenAIService = azureOpenAIService;
}
/// <summary>
/// Returns a string that includes both the tenant and value provided as parameters.
/// </summary>
/// <param name="tenant">Specifies the tenant name</param>
/// <param name="value">Specifies a value</param>
/// <returns>A string that includes both the tenant and value provided as parameters</returns>
/// <response code="200">Success</response>
/// <response code="400">Bad Request</response>
[HttpGet]
[Route("echo")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status400BadRequest)]
public IActionResult Echo(string? tenant, int value)
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(tenant))
{
return BadRequest("Tenant cannot be null or empty.");
}
// Normalize the tenant name
tenant = tenant.ToLower();
// Log the request
_logger.LogInformation($"Echo called with tenant = {tenant} and value = {value}");
// Set the content type
Response.Headers.ContentType = "text/plain";
// Return the echo message
return Ok($"tenant: {tenant} value: {value}");
}
/// <summary>
/// Returns a completion prompt.
/// </summary>
/// <param name="tenant">Specifies the tenant name</param>
/// <param name="conversation">Specifies a collection of messages representing the history</param>
/// <returns>A completion prompt</returns>
/// <remarks>
/// Sample request:
///
/// POST /openai/chat
/// [
/// {
/// "role": "user",
/// "content": "Tell me about Milan"
/// }
/// ]
///
/// </remarks>
/// <response code="200">Success</response>
/// <response code="400">Bad Request</response>
[HttpPost]
[Route("chat")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status400BadRequest, Type = typeof(string))]
public async Task<IActionResult> GetChatCompletionsAsync(string tenant,[FromBody] IEnumerable<Message> conversation)
{
try
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(tenant))
{
return BadRequest("Tenant cannot be null or empty.");
}
// Log the request
_logger.LogInformation($"GetChatCompletionsAsync call by {tenant.ToLower()} tenant processing...");
var result = await _azureOpenAIService.GetChatCompletionsAsync(tenant, conversation);
// Log the response
_logger.LogInformation($"GetChatCompletionsAsync call by {tenant.ToLower()} tenant successfully completed.");
// Set the content type
Response.Headers.ContentType = "text/plain";
return Ok(result);
}
catch (Exception ex)
{
// Create the error message
var errorMessage = $"GetChatCompletionsAsync call by {tenant.ToLower()} tenant failed: {ex.Message}.";
// Log the error
_logger.LogError(errorMessage);
// Return the error
return BadRequest(errorMessage);
}
}
/// <summary>
/// Begins a chat completions request and gets an object that can stream response data as it becomes available.
/// </summary>
/// <param name="tenant">Specifies the tenant name</param>
/// <param name="conversation">Specifies a collection of messages representing the history</param>
/// <param name="cancellationToken">Cancellation token</param>
/// <returns>A streaming completion prompt</returns>
/// <remarks>
/// Sample request:
///
/// POST /openai/stream
/// [
/// {
/// "role": "user",
/// "content": "Tell me about Milan"
/// }
/// ]
///
/// </remarks>
/// <response code="200">Success</response>
/// <response code="400">Bad Request</response>
[HttpPost]
[Route("stream")]
public async IAsyncEnumerable<string> GetChatCompletionsStreamingAsync(string tenant, [FromBody] IEnumerable<Message> conversation, [EnumeratorCancellation]CancellationToken cancellationToken = default)
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(tenant))
{
throw new ArgumentNullException(nameof(tenant), "Tenant cannot be null or empty.");
}
// Log the request
_logger.LogInformation($"GetChatCompletionsStreamingAsync call by {tenant.ToLower()} tenant processing...");
// Return the result
await foreach (var completion in _azureOpenAIService.GetChatCompletionsStreamingAsync(tenant, conversation, cancellationToken))
{
yield return completion;
await Task.Delay(1);
}
}
}
The class exposes the following web methods:
- Echo: An HTTP GET request handler that returns a string containing both the tenant and value provided as parameters. This is a test method.
- GetChatCompletionsAsync: An HTTP POST request handler that calls the GetChatCompletionsAsync method of the AzureOpenAIService singleton and returns a chat completion based on the provided chat conversation.
- GetChatCompletionsStreamingAsync: An HTTP POST request handler that initiates a chat completions request and returns a streaming response containing chat completions as they become available. This method calls the GetChatCompletionsStreamingAsync method of the AzureOpenAIService singleton and is an asynchronous iterator that yields each completion chunk.
The public methods are annotated with attributes to specify the URL routes, expected HTTP response codes, and data types. These attributes include [HttpGet], [HttpPost], [Route], [ProducesResponseType], and [FromBody].
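As a rough illustration of how a client might call these endpoints, the following Bash sketch wraps each route in a curl command. It assumes that the service listens on http://localhost:8000 (the port published by the Docker Compose file in this article) and that the tenant travels in an X-Tenant header for POST calls, as the Bash test script in this article does. The function names and the BASE_URL variable are hypothetical, and chat_body applies no JSON escaping, so it only suits questions without quotes or backslashes.

```shell
#!/bin/bash
# Base URL of the REST endpoint (assumption: local container published on port 8000).
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Build a one-message conversation. No JSON escaping is applied (assumption: plain text input).
chat_body() {
  printf '[{"role":"user","content":"%s"}]' "$1"
}

# GET /openai/echo with the tenant and value passed as query parameters.
call_echo() {
  curl -s "$BASE_URL/openai/echo?tenant=$1&value=$2"
}

# POST /openai/chat and print the whole completion once it is available.
call_chat() {
  curl -s -X POST "$BASE_URL/openai/chat" \
    -H 'Content-Type: application/json' \
    -H "X-Tenant: $1" \
    -d "$(chat_body "$2")"
}

# POST /openai/stream; -N turns off curl's output buffering so chunks print as they arrive.
call_stream() {
  curl -s -N -X POST "$BASE_URL/openai/stream" \
    -H 'Content-Type: application/json' \
    -H "X-Tenant: $1" \
    -d "$(chat_body "$2")"
}
```

After sourcing the file, a call such as call_echo contoso 42 or call_stream contoso "Tell me about Milan" exercises the corresponding handler.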
OpenAIGrpcService Class
The OpenAIGrpcService class defines a gRPC service that leverages the functionality of the AzureOpenAIService singleton to handle Azure OpenAI calls from multiple tenants.
using Grpc.Core;
using OpenAiRestApi.Model;
namespace OpenAiRestApi.Services;
public class OpenAIGrpcService : OpenAIServiceGrpc.OpenAIServiceGrpcBase, IOpenAIGrpcService
{
private readonly ILogger<OpenAIGrpcService> _logger;
private readonly AzureOpenAIService _azureOpenAIService;
public OpenAIGrpcService(ILogger<OpenAIGrpcService> logger, AzureOpenAIService azureOpenAIService)
{
_logger = logger;
_azureOpenAIService = azureOpenAIService;
}
public override Task<EchoResponse> Echo(EchoRequest request, ServerCallContext context)
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(request.Tenant))
{
throw new RpcException(new Status(StatusCode.InvalidArgument, "Tenant cannot be null or empty."));
}
// Normalize the tenant name
var tenant = request.Tenant.ToLower();
// Log the request
_logger.LogInformation($"Echo called with tenant = {tenant} and value = {request.Value}");
// Set the content type
context.ResponseTrailers.Add("Content-Type", "text/plain");
// Create the response
var response = new EchoResponse
{
Message = $"tenant: {tenant} value: {request.Value}"
};
return Task.FromResult(response);
}
public override async Task<GetChatCompletionsResponse> GetChatCompletions(GetChatCompletionsRequest request, ServerCallContext context)
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(request.Tenant))
{
throw new RpcException(new Status(StatusCode.InvalidArgument, "Tenant cannot be null or empty."));
}
try
{
// Log the request
_logger.LogInformation($"GetChatCompletions call by {request.Tenant.ToLower()} tenant processing...");
// Convert the request to a list of messages
var messages = request.Conversation.Select(message => new Message
{
Role = message.Role,
Content = message.Content
}).ToList();
var result = await _azureOpenAIService.GetChatCompletionsAsync(request.Tenant, messages);
// Log the response
_logger.LogInformation($"GetChatCompletions call by {request.Tenant.ToLower()} tenant successfully completed.");
var response = new GetChatCompletionsResponse
{
Result = result
};
return response;
}
catch (Exception ex)
{
// Create the error message
var errorMessage = $"GetChatCompletions call by {request.Tenant.ToLower()} tenant failed: {ex.Message}.";
// Log the error
_logger.LogError(errorMessage);
throw new RpcException(new Status(StatusCode.Internal, errorMessage));
}
}
public override async Task GetChatCompletionsStreaming(GetChatCompletionsRequest request, IServerStreamWriter<GetChatCompletionsStreamingResponse> responseStream, ServerCallContext context)
{
// Validate the tenant parameter
if (string.IsNullOrEmpty(request.Tenant))
{
throw new RpcException(new Status(StatusCode.InvalidArgument, "Tenant cannot be null or empty."));
}
try
{
// Log the request
_logger.LogInformation($"GetChatCompletionsStreaming call by {request.Tenant.ToLower()} tenant processing...");
// Convert the request to a list of messages
var messages = request.Conversation.Select(message => new Message
{
Role = message.Role,
Content = message.Content
}).ToList();
await foreach (var item in _azureOpenAIService.GetChatCompletionsStreamingAsync(request.Tenant, messages, context.CancellationToken))
{
var response = new GetChatCompletionsStreamingResponse
{
Message = item
};
await responseStream.WriteAsync(response);
}
}
catch (Exception ex)
{
// Create the error message
var errorMessage = $"GetChatCompletionsStreaming call by {request.Tenant.ToLower()} tenant failed: {ex.Message}.";
// Log the error
_logger.LogError(errorMessage);
throw new RpcException(new Status(StatusCode.Internal, errorMessage));
}
}
}
The class implements the OpenAIServiceGrpc.OpenAIServiceGrpcBase base class, which is generated from the following OpenAI.proto file.
syntax = "proto3";
option csharp_namespace = "OpenAiRestApi.Services";
package OpenAiRestApi;
service OpenAIServiceGrpc {
rpc Echo(EchoRequest) returns (EchoResponse) {}
rpc GetChatCompletions(GetChatCompletionsRequest) returns (GetChatCompletionsResponse) {}
rpc GetChatCompletionsStreaming(GetChatCompletionsRequest) returns (stream GetChatCompletionsStreamingResponse) {}
}
message EchoRequest {
string tenant = 1;
int32 value = 2;
}
message EchoResponse {
string message = 1;
}
message GetChatCompletionsRequest {
string tenant = 1;
repeated GrpcMessage conversation = 2;
}
message GetChatCompletionsResponse {
string result = 1;
}
message GetChatCompletionsStreamingRequest {
string tenant = 1;
repeated GrpcMessage conversation = 2;
}
message GetChatCompletionsStreamingResponse {
string message = 1;
}
message GrpcMessage {
string role = 1;
string content = 2;
}
The class exposes the following public methods:
- Echo: Implements the gRPC Echo service method, which returns an EchoResponse containing a message that combines the tenant and value from the request. This is a test method.
- GetChatCompletions: Implements the gRPC GetChatCompletions service method, which returns a GetChatCompletionsResponse containing the result of the call to the GetChatCompletionsAsync method of the AzureOpenAIService singleton.
- GetChatCompletionsStreaming: Implements the gRPC GetChatCompletionsStreaming service method, which streams GetChatCompletionsStreamingResponse objects to the client using the responseStream object. It calls the GetChatCompletionsStreamingAsync method of the AzureOpenAIService singleton and iterates over the results to write each response to the stream.
Note that the AzureOpenAIService object _azureOpenAIService is injected into the OpenAIGrpcService class via the constructor. The methods in OpenAIGrpcService then call the respective methods of _azureOpenAIService to perform the actual logic and retrieve the data from the Azure OpenAI Service.
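One way to try these gRPC methods from a terminal is grpcurl, a third-party command-line client that is not part of this solution. The sketch below assumes that the OpenAI.proto file sits in a ./Protos folder (a hypothetical path) and that the gRPC endpoint is reachable without TLS on localhost:8002, the host port that the Docker Compose file in this article maps to container port 6000. The function names and the GRPC_ADDRESS and PROTO_DIR variables are illustrative, and no JSON escaping is applied to the question.

```shell
#!/bin/bash
# Address of the gRPC endpoint (assumption: local container, plaintext HTTP/2).
GRPC_ADDRESS="${GRPC_ADDRESS:-localhost:8002}"
# Folder that contains OpenAI.proto (hypothetical location).
PROTO_DIR="${PROTO_DIR:-./Protos}"

# Build a GetChatCompletionsRequest as JSON with a single user message.
chat_request() {
  printf '{"tenant":"%s","conversation":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Unary call to the Echo method: tenant is a string, value is an int32.
grpc_echo() {
  grpcurl -plaintext -import-path "$PROTO_DIR" -proto OpenAI.proto \
    -d "{\"tenant\":\"$1\",\"value\":$2}" \
    "$GRPC_ADDRESS" OpenAiRestApi.OpenAIServiceGrpc/Echo
}

# Unary call to GetChatCompletions.
grpc_chat() {
  grpcurl -plaintext -import-path "$PROTO_DIR" -proto OpenAI.proto \
    -d "$(chat_request "$1" "$2")" \
    "$GRPC_ADDRESS" OpenAiRestApi.OpenAIServiceGrpc/GetChatCompletions
}

# Server-streaming call; grpcurl prints each streamed message as it is received.
grpc_stream() {
  grpcurl -plaintext -import-path "$PROTO_DIR" -proto OpenAI.proto \
    -d "$(chat_request "$1" "$2")" \
    "$GRPC_ADDRESS" OpenAiRestApi.OpenAIServiceGrpc/GetChatCompletionsStreaming
}
```

The fully qualified service name combines the package and service declared in OpenAI.proto. When the endpoint is exposed over TLS, the -plaintext flag would be dropped.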
For more information on how to build a gRPC service in C#, see the following articles:
- Tutorial: Create a gRPC client and server in ASP.NET Core
- View or download the completed sample code for this tutorial (how to download).
- Overview for gRPC on .NET
- gRPC services with C#
- Migrate gRPC from C-core to gRPC for .NET
Client Application
The solution provides a C# console application that you can use as a client to test the service via both REST and gRPC protocols.
The console application allows the user to interactively select and execute different test methods, such as Echo, Chat, and Stream, using either REST or gRPC.
using System.Text.Json;
using System.Globalization;
using System.Runtime.CompilerServices;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using System.Net.Http.Headers;
using System.Text;
using OpenAiRestApi.Services;
using Grpc.Net.Client;
using Grpc.Core;
using Newtonsoft.Json.Linq;
using System.IO;
namespace OpenAiRestApi.Client;
class Program
{
...
public static async Task Main()
{
try
{
// Initialization
CreateConfiguration();
CreateLoggerFactory();
SelectEnvironment();
int i;
// Execute commands
while ((i = SelectTest()) != _testList.Count + 1)
{
try
{
WriteLine(Line);
await _testList[i - 1]!.ActionAsync();
WriteLine(Line);
}
catch (Exception ex)
{
PrintException(ex);
}
}
}
catch (Exception ex)
{
PrintException(ex);
WriteLine("Press any key to exit");
Console.ReadKey();
}
}
private static async Task RestEcho()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter an integer value
Console.Write("Enter an integer value: ");
var valueAsString = Console.ReadLine();
// Validate the value
int value;
if (!int.TryParse(valueAsString, out value))
{
Console.WriteLine("The value must be a valid integer.");
return;
}
// Create a client
using HttpClient httpClient = new HttpClient();
var uri = new Uri($"{_restServiceUrl}/openai/echo?tenant={tenant}&value={value}");
var requestMessage = new HttpRequestMessage(HttpMethod.Get, uri);
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
// Call the Echo method
var response = await httpClient.SendAsync(requestMessage);
if (response.IsSuccessStatusCode)
{
// Process the response
Console.Write("Result: ");
var stream = await response.Content.ReadAsStreamAsync();
using StreamReader reader = new StreamReader(stream);
while (!reader.EndOfStream)
{
var line = await reader.ReadLineAsync();
Console.WriteLine(line);
}
}
else
{
// Handle the error scenario
PrintMessage("Error", $"Request failed with status code {response.StatusCode}");
}
}
private static async Task RestChat()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter a question
Console.Write("Enter a question: ");
var question = Console.ReadLine();
// Validate the question
if (string.IsNullOrWhiteSpace(question))
{
Console.WriteLine("The question cannot be empty.");
return;
}
// Create a message
var message = new Message
{
Role = "user",
Content = question
};
// Serialize an array of messages
var json = JsonSerializer.Serialize(new Message[] { message });
// Create a client
using HttpClient httpClient = new HttpClient();
// Create content
var content = new StringContent(json, Encoding.UTF8, "application/json");
// Add the tenant header
content.Headers.Add("x-tenant", tenant);
// Call the Chat method
var response = await httpClient.PostAsync($"{_restServiceUrl}/openai/chat", content);
if (response.IsSuccessStatusCode)
{
// Process the response
var stream = await response.Content.ReadAsStreamAsync();
using StreamReader reader = new StreamReader(stream);
while (!reader.EndOfStream)
{
var line = await reader.ReadLineAsync();
Console.WriteLine(line);
}
}
else
{
// Handle the error scenario
PrintMessage("Error", $"Request failed with status code {response.StatusCode}");
}
}
static private async Task RestStream()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter a question
Console.Write("Enter a question: ");
var question = Console.ReadLine();
// Validate the question
if (string.IsNullOrWhiteSpace(question))
{
Console.WriteLine("The question cannot be empty.");
return;
}
// Create a message
var message = new Message
{
Role = "user",
Content = question
};
// Serialize an array of messages
var json = JsonSerializer.Serialize(new Message[] { message });
// Create content
var content = new StringContent(json, Encoding.UTF8, "application/json");
// Add the tenant header
content.Headers.Add("x-tenant", tenant);
// Create a client
using HttpClient httpClient = new();
// Set the request headers and the timeout
httpClient.DefaultRequestHeaders.Accept.Clear();
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
httpClient.Timeout = TimeSpan.FromMinutes(5);
// Create request message
using var request = new HttpRequestMessage(HttpMethod.Post, $"{_restServiceUrl}/openai/stream")
{
Content = content
};
// Send the request and start reading as soon as the response headers arrive
using HttpResponseMessage response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);
response.EnsureSuccessStatusCode();
using Stream responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
await foreach (var token in JsonSerializer.DeserializeAsyncEnumerable<string>(responseStream, new JsonSerializerOptions { PropertyNameCaseInsensitive = true }))
{
if (string.IsNullOrEmpty(token))
{
continue;
}
Console.Write(token);
// Simulate a delay
if (string.Compare(_environment, DevelopmentEnvironment, StringComparison.OrdinalIgnoreCase) == 0)
{
await Task.Delay(_delay);
}
}
Console.WriteLine();
}
private static async Task GrpcEcho()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter an integer value
Console.Write("Enter an integer value: ");
var valueAsString = Console.ReadLine();
// Validate the value
int value;
if (!int.TryParse(valueAsString, out value))
{
Console.WriteLine("The value must be a valid integer.");
return;
}
// Check if the gRPC service is running on HTTP
var uri = new Uri(_grpcServiceUrl);
if (uri.Scheme == "http")
{
// This switch must be set before creating the GrpcChannel/HttpClient.
AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
}
// Create a channel
var channel = GrpcChannel.ForAddress(_grpcServiceUrl);
// Create a client
var client = new OpenAIServiceGrpc.OpenAIServiceGrpcClient(channel);
// Create a request
var request = new EchoRequest
{
Tenant = tenant,
Value = value
};
// Call the Echo method
var response = await client.EchoAsync(request);
// Print the response
Console.WriteLine($"Result: {response.Message}");
// Shut down the gRPC channel without blocking the calling thread
await channel.ShutdownAsync();
}
private static async Task GrpcChat()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter a question
Console.Write("Enter a question: ");
var question = Console.ReadLine();
// Validate the question
if (string.IsNullOrWhiteSpace(question))
{
Console.WriteLine("The question cannot be empty.");
return;
}
// Create a request
var request = new GetChatCompletionsRequest()
{
Tenant = tenant
};
// Create a message
request.Conversation.Add(new GrpcMessage
{
Role = "user",
Content = question
});
// Check if the gRPC service is running on HTTP
var uri = new Uri(_grpcServiceUrl);
if (uri.Scheme == "http")
{
// This switch must be set before creating the GrpcChannel/HttpClient.
AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
}
// Create a channel
var channel = GrpcChannel.ForAddress(_grpcServiceUrl);
// Create a client
var client = new OpenAIServiceGrpc.OpenAIServiceGrpcClient(channel);
// Call the GetChatCompletions method
var response = await client.GetChatCompletionsAsync(request);
// Print the response
if (response != null)
{
Console.WriteLine(response.Result);
}
// Shut down the gRPC channel without blocking the calling thread
await channel.ShutdownAsync();
}
static private async Task GrpcStream()
{
// Enter a tenant
Console.Write("Enter a tenant: ");
var tenant = Console.ReadLine();
// Validate the tenant
if (string.IsNullOrWhiteSpace(tenant))
{
Console.WriteLine("The tenant cannot be empty.");
return;
}
// Enter a question
Console.Write("Enter a question: ");
var question = Console.ReadLine();
// Validate the question
if (string.IsNullOrWhiteSpace(question))
{
Console.WriteLine("The question cannot be empty.");
return;
}
// Create a request
var request = new GetChatCompletionsRequest()
{
Tenant = tenant
};
// Create a message
request.Conversation.Add(new GrpcMessage
{
Role = "user",
Content = question
});
// Check if the gRPC service is running on HTTP
var uri = new Uri(_grpcServiceUrl);
if (uri.Scheme == "http")
{
// This switch must be set before creating the GrpcChannel/HttpClient.
AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
}
// Create a channel
var channel = GrpcChannel.ForAddress(_grpcServiceUrl);
// Create a client
var client = new OpenAIServiceGrpc.OpenAIServiceGrpcClient(channel);
// Make the gRPC streaming call
using (var streamingCall = client.GetChatCompletionsStreaming(request))
{
// Read the streaming response
while (await streamingCall.ResponseStream.MoveNext())
{
var response = streamingCall.ResponseStream.Current;
// Process the response message
Console.Write(response.Message);
// Simulate a delay
if (string.Compare(_environment, DevelopmentEnvironment, StringComparison.OrdinalIgnoreCase) == 0)
{
await Task.Delay(_delay);
}
}
Console.WriteLine();
}
// Shut down the gRPC channel without blocking the calling thread
await channel.ShutdownAsync();
}
...
}
The application is structured as follows:
- The Main method serves as the entry point of the application. It initializes the configuration, creates the logger factory, and prompts the user to select an environment and test.
- The application supports multiple environments, each defined in the appsettings.json file. The environment configuration includes the URLs for the REST and gRPC services.
- The class provides methods responsible for testing the service via REST protocol. These methods perform HTTP requests and handle the responses accordingly.
- The class also contains methods responsible for testing the service via gRPC protocol. These methods create gRPC channels and clients, make the gRPC service calls, and process the responses.
- The application provides interactive prompts for the user to enter inputs such as tenant, value, question, etc.
- The application utilizes the Configuration and Logger frameworks to read configuration settings and log messages.
- The PrintException and related methods are used for printing exception messages with proper formatting.
- The application uses the HttpClient class for making REST requests and the GrpcChannel class for making gRPC requests.
Test the Application via Postman
You can also use Postman, a popular API testing tool, to test the service. Postman allows you to send requests to the service locally or in the cloud using both REST and gRPC protocols.
Here are some benefits of using Postman for testing:
- User-Friendly Interface: Postman provides an intuitive and user-friendly interface for composing and sending requests. You can easily set headers, query parameters, and request bodies.
- Test Collections: Postman allows you to organize your requests into collections, enabling you to group related requests together for easier management and execution.
- Pre-built Request Templates: Postman provides pre-built request templates for common actions like GET, POST, PUT, and DELETE. This can save you time when setting up your requests.
- Response Validation: Postman provides powerful response validation capabilities, allowing you to define assertions and conditions to ensure that the responses meet your expected criteria.
You can proceed as follows to test the service using Postman:
- Download and install Postman if you haven't already.
- Create a new request in Postman by specifying the request method, URL, headers, and parameters.
- For REST requests, use the REST endpoint of the service, for example http://localhost:8000/openai/chat when the service runs locally, and add the required headers and parameters.
- For gRPC requests, create a gRPC request in Postman and import the OpenAI.proto file so that Postman can load the service definition and its message types. Refer to the provided articles for detailed instructions on how to test gRPC services using Postman.
- Send the request and inspect the response. Use the response validation features in Postman to ensure the response meets your expectations.
By leveraging Postman's capabilities, you can streamline and automate your testing process, enabling you to efficiently test and interact with the Azure OpenAI service. For more information on how to use Postman to test a gRPC service, see the following articles:
Test the application using a Bash script
You can use the script/test.sh script to simulate multiple tenants and generate traffic against the REST API of the service.
#!/bin/bash
# Print the menu
echo "===================================="
echo "Choose a target environment (1-3): "
echo "===================================="
options=(
"Development"
"Production"
"Quit"
)
name=""
# Select an option
COLUMNS=0
select option in "${options[@]}"; do
case $option in
"Development")
url='http://localhost:8000/openai/chat'
break
;;
"Production")
url='https://alphaopenaihttp.babosbird.com/openai/chat'
break
;;
"Quit")
exit
;;
*) echo "invalid option $REPLY" ;;
esac
done
# Function to escape special characters in a JSON string
escape_json() {
local json_string="$1"
local escaped_string
# Perform the necessary escaping of special characters
escaped_string=$(echo "$json_string" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/\//\\\//g' -e 's/\n/\\n/g' -e 's/\r/\\r/g' -e 's/\t/\\t/g')
# Return the escaped JSON string
echo "$escaped_string"
}
# Array of tenants
tenants=("contoso" "fabrikam" "acme")
cities=("Pisa" "Lucca" "Florence" "Siena" "Carrara" "Livorno" "Prato" "Arezzo" "Massa" "Pistoia" "Grosseto" "Milano" "Monza" "Brescia" "Bergamo" "Mantova" "Parma")
questions=("tell me about the city" "could you please tell me more about the history of this city? Tell me about famous people who were born in the city." "I'd like to make a tour of the city, can you suggest any other nice town or historical place to visit nearby?")
# Configurable amount of calls to send
amount_of_calls=1000
# Loop through the amount of calls to send
for ((i = 0; i < $amount_of_calls; i++)); do
# Get a random tenant from the array
rand_tenant=${tenants[$((RANDOM % ${#tenants[@]}))]}
# Get a random city from the array
rand_city=${cities[$((RANDOM % ${#cities[@]}))]}
# Get a random question from the array
rand_question="Regarding $rand_city, ${questions[$((RANDOM % ${#questions[@]}))]}"
# Prepare the request body
request_body='[
{
"role": "system",
"content": "The assistant is helpful, creative, clever, and very friendly."
},
{
"role": "user",
"content": "'$rand_question'"
}
]'
# Send the POST request and capture the response status code
curl -X 'POST' \
-s \
-o /dev/null \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-H "X-Tenant: $rand_tenant" \
-w "tenant: $rand_tenant city: $rand_city question: $rand_question statusCode: %{http_code}\n" \
-d "$request_body" \
"$url"
done
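In the script as shown, the escape_json function is defined but not applied to the question, and because sed processes its input one line at a time, its newline substitution has nothing to match. A minimal alternative, assuming Bash and using only parameter expansion, could look like the sketch below. The names json_escape and build_body are hypothetical, and the escaping covers only backslashes, double quotes, newlines, carriage returns, and tabs, not every control character that JSON requires to be escaped.

```shell
#!/bin/bash
# Escape a string so it can be embedded between double quotes in a JSON document.
# Backslashes are handled first so that later replacements are not escaped twice.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}       # \  -> \\
  s=${s//\"/\\\"}       # "  -> \"
  s=${s//$'\n'/\\n}     # newline         -> \n
  s=${s//$'\r'/\\r}     # carriage return -> \r
  s=${s//$'\t'/\\t}     # tab             -> \t
  printf '%s' "$s"
}

# Build the request body from a system prompt and a user question.
build_body() {
  printf '[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Inside the loop, the body could then be produced with request_body=$(build_body "The assistant is helpful, creative, clever, and very friendly." "$rand_question"), which keeps the JSON valid even if a question contains double quotes.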
Docker Compose
The solution uses Docker Compose and Docker Desktop to start and test the service in a container, along with a local instance of Prometheus and Grafana for monitoring and visualization. In addition to the docker-compose.yml file used by the Visual Studio solution, you can find the following docker-compose.yml file in the scripts folder to start the service, Prometheus, and Grafana containers outside of the Visual Studio environment.
version: '3.4'
networks:
openai:
external: true
services:
openairestapi:
build:
context: ../openairestapi
dockerfile: Dockerfile
environment:
- ASPNETCORE_ENVIRONMENT=Development
ports:
- "8000:80"
- "8002:6000"
networks:
- openai
prometheus:
build:
context: ./prometheus
depends_on:
- openairestapi
ports:
- 9090:9090
networks:
- openai
grafana:
build:
context: ./grafana
depends_on:
- prometheus
ports:
- 3000:3000
networks:
- openai
With Docker Compose, you can define and manage multi-container applications as a single unit. It allows you to specify the service configurations, dependencies, and networking requirements within a YAML file. Docker Desktop provides the necessary runtime environment to run the containers on your development machine.
By using Docker Compose and Docker Desktop, you can easily start and test the service in an isolated environment with all the necessary dependencies. This approach ensures consistency and reproducibility across different environments. It also simplifies the setup process and avoids any conflicts or compatibility issues that may arise when running the service and its dependencies on the local machine. For more information on how to use Docker Compose to test a multi-container application in Visual Studio, see Tutorial: Create a multi-container app with Docker Compose.
Prometheus and Grafana are included in the Docker Compose configuration to provide monitoring and visualization capabilities. Prometheus is an open-source monitoring system that collects and stores metrics from various sources, while Grafana is a feature-rich analytics and visualization platform. Together, they allow you to monitor the performance and health of the service in real-time and create customizable dashboards and visualizations.
By leveraging Docker Compose, Docker Desktop, Prometheus, and Grafana, you can easily start and test the service, monitor its metrics, and analyze its performance and behavior. This comprehensive solution provides a streamlined and efficient testing and monitoring process for the service.
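The Compose file above declares the openai network as external, so Docker Compose expects it to exist before the containers start. The following Bash sketch shows one way to create the network on demand and bring the stack up. The function names are hypothetical, and the sketch assumes that the docker CLI with the Compose plugin is installed and that it runs from the folder containing the docker-compose.yml file.

```shell
#!/bin/bash
# Create the external Docker network only if it does not exist yet.
ensure_network() {
  if ! docker network inspect "$1" >/dev/null 2>&1; then
    docker network create "$1"
  fi
}

# Build the images and start the service, Prometheus, and Grafana in the background.
# The optional first argument is the path of the Compose file.
start_stack() {
  ensure_network openai
  docker compose -f "${1:-docker-compose.yml}" up --build -d
}
```

After calling start_stack, the port mappings in the Compose file expose the REST endpoint on http://localhost:8000, the gRPC endpoint on port 8002, Prometheus on http://localhost:9090, and Grafana on http://localhost:3000.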
Deploy the Application to Azure Kubernetes Service (AKS)
You can use the YAML manifests and scripts in the scripts\aks folder to deploy the application to your Azure Kubernetes Service (AKS) cluster and expose the service via REST and gRPC protocols. Let's see the YAML manifests and main scripts:
Define a Value for Variables
Before running any script, make sure to customize the value of the variables inside the 00-variables.sh file. This file is sourced by all the scripts and contains the following variables:
#!/bin/bash
# Azure Subscription and Tenant
subscriptionId=$(az account show --query id --output tsv)
subscriptionName=$(az account show --query name --output tsv)
tenantId=$(az account show --query tenantId --output tsv)
# Azure Kubernetes Service
aksClusterName="TanAks"
aksResourceGroupName="TanRG"
# Azure Key Vault
keyVaultName="TanKeyVault"
keyVaultSku="Standard"
# Azure Container Registry
acrName="TanAcr"
# Azure Resources
location="EastUS"
resourceGroupName="TanRG"
# Azure Managed Identity
managedIdentityName="TanWorkloadManagedIdentity"
# Test Service Principal
servicePrincipalName="OpenAiTestServicePrincipal"
# Container Images
containerImageName="openaiservice"
containerImageTag="v1"
image="${acrName,,}.azurecr.io/$containerImageName:$containerImageTag"
imagePullPolicy="IfNotPresent" # Always, Never, IfNotPresent
# Kubernetes Service account
namespace="openai"
serviceAccountName="openai-sa"
# Variables for the federated identity name
federatedIdentityName="OpenAiWorkloadFederatedIdentity"
# Azure OpenAI Service
openAiNames=("BetaOpenAI" "AlphaOpenAI")
openAiResourceGroupNames=("TanRG" "BaboRG")
# NGINX
nginxNamespace="ingress-basic"
nginxRepoName="ingress-nginx"
nginxRepoUrl="https://kubernetes.github.io/ingress-nginx"
nginxChartName="ingress-nginx"
nginxReleaseName="nginx-ingress"
nginxReplicaCount=3
# Azure DNS
dnsZoneName="babosbird.com"
dnsZoneResourceGroupName="dnsresourcegroup"
httpSubdomain="BetaOpenAIhttp"
grpcSubdomain="BetaOpenAIgrpc"
# Certificate Manager
certManagerNamespace="cert-manager"
certManagerRepoName="jetstack"
certManagerRepoUrl="https://charts.jetstack.io"
certManagerChartName="cert-manager"
certManagerReleaseName="cert-manager"
email="paolos@microsoft.com"
clusterIssuer="letsencrypt-nginx"
template="cluster-issuer.yml"
# Default Backend
defaultBackendTemplate="default-backend.yml"
# Templates
serviceTemplate="service.yml"
keyVaultDeploymentTemplate="deployment-keyvault.yml"
appSettingsDeploymentTemplate="deployment-appsettings.yml"
keyVaultConfigMapTemplate="configmap-keyvault.yml"
appSettingsConfigMapTemplate="configmap-appsettings.yml"
# ConfigMap values
configMapName="openai"
configurationType="keyvault" # appsettings, keyvault
aspNetCoreEnvironment="Production"
# Ingress
httpIngressName="openai-http"
httpIngressTemplate="ingress-http.yml"
httpSecretName="openai-http-tls"
httpHostName="${httpSubdomain,,}.${dnsZoneName,,}"
httpServiceName="openai"
httpServicePort="80"
grpcIngressName="openai-grpc"
grpcIngressTemplate="ingress-grpc.yml"
grpcSecretName="openai-grpc-tls"
grpcHostName="${grpcSubdomain,,}.${dnsZoneName,,}"
grpcServiceName="openai"
grpcServicePort="6000"
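Several values in this file, such as image, httpHostName, and grpcHostName, are derived with the ${variable,,} expansion, which converts a value to lowercase and requires Bash 4.0 or later. On systems that ship an older Bash, such as the Bash 3.2 bundled with macOS, a small helper based on tr gives the same result. The sketch below uses a hypothetical to_lower function and a subset of the variables above to show how the derived names are computed.

```shell
#!/bin/bash
# Portable lowercase conversion, usable where the ${variable,,} expansion is unavailable.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Subset of the values defined in 00-variables.sh.
acrName="TanAcr"
containerImageName="openaiservice"
containerImageTag="v1"
dnsZoneName="babosbird.com"
httpSubdomain="BetaOpenAIhttp"

# Derived values: registry and host names are lowercased before use.
image="$(to_lower "$acrName").azurecr.io/$containerImageName:$containerImageTag"
httpHostName="$(to_lower "$httpSubdomain").$(to_lower "$dnsZoneName")"

echo "image:        $image"
echo "httpHostName: $httpHostName"
```

With the sample values, the image resolves to tanacr.azurecr.io/openaiservice:v1 and the HTTP host name to betaopenaihttp.babosbird.com.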
Build Container Image
You can build the container image using the 01-build-docker-image.sh script in the scripts folder.
#!/bin/bash
#Variables
source ./00-variables.sh
cd ../../openairestapi
docker build -t $containerImageName:$containerImageTag -f Dockerfile .
Upload Container Image to Azure Container Registry (ACR)
You can push the Docker container image to Azure Container Registry (ACR) using the 02-push-docker-image.sh script in the scripts folder.
#!/bin/bash
# Variables
source ./00-variables.sh
# Login to ACR
az acr login --name ${acrName,,}
# Retrieve ACR login server. Each container image needs to be tagged with the loginServer name of the registry.
loginServer=$(az acr show --name ${acrName,,} --query loginServer --output tsv)
# Tag the local container image with the loginServer of ACR
docker tag $containerImageName:$containerImageTag $loginServer/$containerImageName:$containerImageTag
# Push $containerImageName container image to ACR
docker push $loginServer/$containerImageName:$containerImageTag
Enable the OpenID Connect (OIDC) Endpoint on Your AKS cluster
You can run the 03-enable-oidc.sh script to enable the OIDC issuer endpoint in your AKS cluster. For more information, see Create an OpenID Connect provider on Azure Kubernetes Service (AKS).
#!/bin/bash
# For more information, see https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#oidc-issuer-preview
# Variables
source ./00-variables.sh
# Check if the OIDC discovery endpoint has been already enabled
echo "Check if the OIDC discovery endpoint has been already enabled on the [$aksClusterName] AKS cluster..."
enabled=$(az aks show \
--name $aksClusterName \
--resource-group $resourceGroupName \
--query oidcIssuerProfile.enabled \
--output tsv \
--only-show-errors)
if [[ $enabled == 'true' ]]; then
echo "The OIDC discovery endpoint has been already enabled on the [$aksClusterName] AKS cluster"
else
echo "The OIDC discovery endpoint has not been enabled yet on the [$aksClusterName] AKS cluster"
echo "Enabling the OIDC discovery endpoint on the [$aksClusterName] AKS cluster"
az aks update \
--name $aksClusterName \
--resource-group $resourceGroupName \
--enable-oidc-issuer \
--only-show-errors
if [[ $? == 0 ]]; then
echo "The OIDC discovery endpoint has been successfully enabled on the [$aksClusterName] AKS cluster"
else
echo "Failed to enable the OIDC discovery endpoint on the [$aksClusterName] AKS cluster"
fi
fi
Enable Entra ID Workload Identity
You can run the 04-enable-workload-identity.sh script to enable Microsoft Entra Workload ID in your AKS cluster. For more information, see Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).
#!/bin/bash
# For more information, see https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#oidc-issuer-preview
# Variables
source ./00-variables.sh
# Install aks-preview Azure extension
echo "Checking if [aks-preview] extension is already installed..."
az extension show --name aks-preview &>/dev/null
if [[ $? == 0 ]]; then
echo "[aks-preview] extension is already installed"
# Update the extension to make sure you have the latest version installed
echo "Updating [aks-preview] extension..."
az extension update --name aks-preview &>/dev/null
else
echo "[aks-preview] extension is not installed. Installing..."
# Install aks-preview extension
az extension add --name aks-preview 1>/dev/null
if [[ $? == 0 ]]; then
echo "[aks-preview] extension successfully installed"
else
echo "Failed to install [aks-preview] extension"
exit 1
fi
fi
# Registering AKS feature extensions
aksExtensions=("EnableWorkloadIdentityPreview")
registeringExtensions=()
for aksExtension in ${aksExtensions[@]}; do
echo "Checking if [$aksExtension] extension is already registered..."
extension=$(az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/$aksExtension') && @.properties.state == 'Registered'].{Name:name}" --output tsv)
if [[ -z $extension ]]; then
echo "[$aksExtension] extension is not registered."
echo "Registering [$aksExtension] extension..."
az feature register --name $aksExtension --namespace Microsoft.ContainerService
registeringExtensions+=("$aksExtension")
ok=1
else
echo "[$aksExtension] extension is already registered."
fi
done
delay=1
for aksExtension in ${registeringExtensions[@]}; do
echo -n "Checking if [$aksExtension] extension is already registered..."
while true; do
extension=$(az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/$aksExtension') && @.properties.state == 'Registered'].{Name:name}" --output tsv)
if [[ -z $extension ]]; then
echo -n "."
sleep $delay
else
echo "."
break
fi
done
done
# Check if extensions have been successfully registered
if [[ $ok == 1 ]]; then
echo "Refreshing the registration of the Microsoft.ContainerService resource provider..."
az provider register --namespace Microsoft.ContainerService
echo "Microsoft.ContainerService resource provider registration successfully refreshed"
fi
# Check if the workload identity is already enabled
echo "Checking if the workload identity is already enabled on the [$aksClusterName] AKS cluster..."
enabled=$(az aks show \
--name $aksClusterName \
--resource-group $resourceGroupName \
--query securityProfile.workloadIdentity.enabled \
--output tsv \
--only-show-errors)
if [[ $enabled == 'true' ]]; then
echo "The workload identity is already enabled on the [$aksClusterName] AKS cluster"
else
echo "The workload identity is not yet enabled on the [$aksClusterName] AKS cluster"
echo "Enabling the workload identity on the [$aksClusterName] AKS cluster"
az aks update \
--name $aksClusterName \
--resource-group $resourceGroupName \
--enable-workload-identity \
--only-show-errors
if [[ $? == 0 ]]; then
echo "The workload identity has been successfully enabled on the [$aksClusterName] AKS cluster"
else
echo "Failed to enable the workload identity on the [$aksClusterName] AKS cluster"
fi
fi
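The registration loop in the script above polls until the feature reports the Registered state, with no upper bound on the wait. A bounded variant could look like the following minimal sketch. The wait_until and is_feature_registered helpers are hypothetical names that are not part of the sample, and the check uses az feature show instead of the az feature list query used in the script.

```bash
#!/bin/bash

# Hypothetical helper (not part of the sample): run a check command until it
# succeeds or the maximum number of attempts is reached.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local maxAttempts="$1"
  local delay="$2"
  shift 2
  local attempt
  for ((attempt = 1; attempt <= maxAttempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt
    if ((attempt < maxAttempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical check: succeeds when the given Microsoft.ContainerService
# feature is in the Registered state.
is_feature_registered() {
  local state
  state=$(az feature show \
    --namespace Microsoft.ContainerService \
    --name "$1" \
    --query properties.state \
    --output tsv 2>/dev/null)
  [[ $state == 'Registered' ]]
}

# Example (commented out because it calls Azure):
# wait_until 60 5 is_feature_registered "EnableWorkloadIdentityPreview" || exit 1
```

With this shape, the script fails with a non-zero exit code when the registration does not complete within the allotted attempts, instead of waiting indefinitely.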
Use Key Vault Secrets to Store Service Configuration
You can use the 06-create-key-vault-and-secrets.sh
script to store the service configuration in a series of secrets in your Azure Key Vault. If you plan to use this approach along with Microsoft Entra Workload ID, you need to make sure to create an access policy to grant List and Get permissions on secrets to the user-assigned managed identity used by the service on AKS.
#!/bin/bash
# Variables
source ./00-variables.sh
# Check if the resource group already exists
echo "Checking if [$resourceGroupName] resource group actually exists in the [$subscriptionName] subscription..."
az group show --name $resourceGroupName &> /dev/null
if [[ $? != 0 ]]; then
echo "No [$resourceGroupName] resource group actually exists in the [$subscriptionName] subscription"
echo "Creating [$resourceGroupName] resource group in the [$subscriptionName] subscription..."
# create the resource group
az group create --name $resourceGroupName --location $location 1> /dev/null
if [[ $? == 0 ]]; then
echo "[$resourceGroupName] resource group successfully created in the [$subscriptionName] subscription"
else
echo "Failed to create [$resourceGroupName] resource group in the [$subscriptionName] subscription"
exit
fi
else
echo "[$resourceGroupName] resource group already exists in the [$subscriptionName] subscription"
fi
# Check if the key vault already exists
echo "Checking if [$keyVaultName] key vault actually exists in the [$subscriptionName] subscription..."
az keyvault show --name $keyVaultName --resource-group $resourceGroupName &> /dev/null
if [[ $? != 0 ]]; then
echo "No [$keyVaultName] key vault actually exists in the [$subscriptionName] subscription"
echo "Creating [$keyVaultName] key vault in the [$subscriptionName] subscription..."
# create the key vault
az keyvault create \
--name $keyVaultName \
--resource-group $resourceGroupName \
--location $location \
--enabled-for-deployment \
--enabled-for-disk-encryption \
--enabled-for-template-deployment \
--sku $keyVaultSku 1> /dev/null
if [[ $? == 0 ]]; then
echo "[$keyVaultName] key vault successfully created in the [$subscriptionName] subscription"
else
echo "Failed to create [$keyVaultName] key vault in the [$subscriptionName] subscription"
exit
fi
else
echo "[$keyVaultName] key vault already exists in the [$subscriptionName] subscription"
fi
create_secret() {
local secretName="$1"
local secretValue="$2"
local keyVaultName="$3"
echo "Checking if [$secretName] secret actually exists in the [$keyVaultName] key vault..."
az keyvault secret show --name "$secretName" --vault-name "$keyVaultName" &> /dev/null
if [[ $? != 0 ]]; then
echo "No [$secretName] secret actually exists in the [$keyVaultName] key vault"
echo "Creating [$secretName] secret in the [$keyVaultName] key vault..."
# Create the secret
az keyvault secret set \
--name "$secretName" \
--vault-name "$keyVaultName" \
--value "$secretValue" 1> /dev/null
if [[ $? == 0 ]]; then
echo "[$secretName] secret successfully created in the [$keyVaultName] key vault"
else
echo "Failed to create [$secretName] secret in the [$keyVaultName] key vault"
exit 1
fi
else
echo "[$secretName] secret already exists in the [$keyVaultName] key vault"
fi
}
# Create secrets
create_secret "TenantAzureOpenAIMappings--fabrikam" "BetaOpenAI" "$keyVaultName"
create_secret "TenantAzureOpenAIMappings--contoso" "AlphaOpenAI" "$keyVaultName"
create_secret "Prometheus--Histograms--TotalTokens--Width" "100" "$keyVaultName"
create_secret "Prometheus--Histograms--TotalTokens--Start" "100" "$keyVaultName"
create_secret "Prometheus--Histograms--TotalTokens--Count" "10" "$keyVaultName"
create_secret "Prometheus--Histograms--PromptTokens--Width" "10" "$keyVaultName"
create_secret "Prometheus--Histograms--PromptTokens--Start" "10" "$keyVaultName"
create_secret "Prometheus--Histograms--PromptTokens--Count" "10" "$keyVaultName"
create_secret "Prometheus--Histograms--CompletionTokens--Width" "100" "$keyVaultName"
create_secret "Prometheus--Histograms--CompletionTokens--Start" "100" "$keyVaultName"
create_secret "Prometheus--Histograms--CompletionTokens--Count" "10" "$keyVaultName"
create_secret "Prometheus--Enabled" "True" "$keyVaultName"
create_secret "ChatCompletionsOptions--Temperature" "0.8" "$keyVaultName"
create_secret "ChatCompletionsOptions--MaxTokens" "16000" "$keyVaultName"
create_secret "AzureOpenAI--SystemPrompt" "The assistant is helpful, creative, clever, and very friendly." "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--Version" "2023-08-01-preview" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--Type" "azuread" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--Model" "gpt-35-turbo-16k" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--MaxRetries" "3" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--MaxResponseTokens" "1000" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--Endpoint" "https://BetaOpenAI.openai.azure.com/" "$keyVaultName"
create_secret "AzureOpenAI--Services--BetaOpenAI--Deployment" "gpt-35-turbo-16k" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--Version" "2023-08-01-preview" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--Type" "azuread" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--Model" "gpt-35-turbo-16k" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--MaxRetries" "3" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--MaxResponseTokens" "1000" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--Endpoint" "https://AlphaOpenAI.openai.azure.com/" "$keyVaultName"
create_secret "AzureOpenAI--Services--AlphaOpenAI--Deployment" "gpt-35-turbo-16k" "$keyVaultName"
If you don't want to use Azure Key Vault, you can also pass the configuration to the service in a configmap. See below in this article for more information.
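The secret names above use a double dash as a separator because Key Vault secret names cannot contain colons. Assuming the service loads these secrets through the ASP.NET Core Key Vault configuration provider, which by default maps each double dash to the colon used as the hierarchical delimiter in .NET configuration keys, the mapping can be sketched as follows. The to_config_key and to_secret_name functions are hypothetical helpers for illustration only.

```bash
#!/bin/bash

# Hypothetical helper: convert a Key Vault secret name into the
# colon-delimited configuration key that it is assumed to map to.
to_config_key() {
  local secretName="$1"
  printf '%s\n' "${secretName//--/:}"
}

# Hypothetical helper: convert a colon-delimited configuration key
# into a valid Key Vault secret name.
to_secret_name() {
  local configKey="$1"
  printf '%s\n' "${configKey//:/--}"
}

to_config_key "AzureOpenAI--Services--AlphaOpenAI--Endpoint"
# prints AzureOpenAI:Services:AlphaOpenAI:Endpoint
```

Under this assumption, the TenantAzureOpenAIMappings--contoso secret would surface in the application as the contoso entry of the TenantAzureOpenAIMappings configuration section.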
Create a User-Assigned Managed Identity
You can use the 07-create-workload-managed-identity.sh
script to create the user-assigned managed identity used by the service, grant it Get and List permissions on the Key Vault secrets, and assign it the Cognitive Services User
role on each Azure OpenAI Service resource.
#!/bin/bash
# Variables
source ./00-variables.sh
# Check if the user-assigned managed identity already exists
echo "Checking if [$managedIdentityName] user-assigned managed identity actually exists in the [$aksResourceGroupName] resource group..."
az identity show \
--name $managedIdentityName \
--resource-group $aksResourceGroupName &>/dev/null
if [[ $? != 0 ]]; then
echo "No [$managedIdentityName] user-assigned managed identity actually exists in the [$aksResourceGroupName] resource group"
echo "Creating [$managedIdentityName] user-assigned managed identity in the [$aksResourceGroupName] resource group..."
# Create the user-assigned managed identity
az identity create \
--name $managedIdentityName \
--resource-group $aksResourceGroupName \
--location $location \
--subscription $subscriptionId 1>/dev/null
if [[ $? == 0 ]]; then
echo "[$managedIdentityName] user-assigned managed identity successfully created in the [$aksResourceGroupName] resource group"
else
echo "Failed to create [$managedIdentityName] user-assigned managed identity in the [$aksResourceGroupName] resource group"
exit
fi
else
echo "[$managedIdentityName] user-assigned managed identity already exists in the [$aksResourceGroupName] resource group"
fi
# Retrieve the clientId of the user-assigned managed identity
echo "Retrieving clientId for [$managedIdentityName] managed identity..."
clientId=$(az identity show \
--name $managedIdentityName \
--resource-group $aksResourceGroupName \
--query clientId \
--output tsv)
if [[ -n $clientId ]]; then
echo "[$clientId] clientId for the [$managedIdentityName] managed identity successfully retrieved"
else
echo "Failed to retrieve clientId for the [$managedIdentityName] managed identity"
exit
fi
# Retrieve the principalId of the user-assigned managed identity
echo "Retrieving principalId for [$managedIdentityName] managed identity..."
principalId=$(az identity show \
--name $managedIdentityName \
--resource-group $aksResourceGroupName \
--query principalId \
--output tsv)
if [[ -n $principalId ]]; then
echo "[$principalId] principalId for the [$managedIdentityName] managed identity successfully retrieved"
else
echo "Failed to retrieve principalId for the [$managedIdentityName] managed identity"
exit
fi
# Grant get and list permissions on key vault secrets to the managed identity
echo "Granting Get and List permissions on secrets in [$keyVaultName] key vault to [$managedIdentityName] managed identity..."
az keyvault set-policy \
--name $keyVaultName \
--object-id $principalId \
--secret-permissions get list 1>/dev/null
if [[ $? == 0 ]]; then
echo "Get and List permissions on secrets in [$keyVaultName] key vault successfully granted to [$managedIdentityName] managed identity"
else
echo "Failed to grant Get and List permissions on secrets in [$keyVaultName] key vault to [$managedIdentityName] managed identity"
exit 1
fi
for ((i = 0; i < ${#openAiNames[@]}; i++)); do
openAiName=${openAiNames[$i]}
openAiResourceGroupName=${openAiResourceGroupNames[$i]}
# Get the resource id of the Azure OpenAI resource
openAiId=$(az cognitiveservices account show \
--name $openAiName \
--resource-group $openAiResourceGroupName \
--query id \
--output tsv)
if [[ -n $openAiId ]]; then
echo "Resource id for the [$openAiName] Azure OpenAI resource successfully retrieved"
else
echo "Failed to retrieve the resource id for the [$openAiName] Azure OpenAI resource"
exit 1
fi
# Assign the Cognitive Services User role on the Azure OpenAI resource to the managed identity
role="Cognitive Services User"
echo "Checking if the [$managedIdentityName] managed identity has been assigned to [$role] role with [$openAiName] Azure OpenAI resource as a scope..."
current=$(az role assignment list \
--assignee $principalId \
--scope $openAiId \
--query "[?roleDefinitionName=='$role'].roleDefinitionName" \
--output tsv 2>/dev/null)
if [[ $current == $role ]]; then
echo "[$managedIdentityName] managed identity is already assigned to the ["$current"] role with [$openAiName] Azure OpenAI resource as a scope"
else
echo "[$managedIdentityName] managed identity is not assigned to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
echo "Assigning the [$role] role to the [$managedIdentityName] managed identity with [$openAiName] Azure OpenAI resource as a scope..."
az role assignment create \
--assignee $principalId \
--role "$role" \
--scope $openAiId 1>/dev/null
if [[ $? == 0 ]]; then
echo "[$managedIdentityName] managed identity successfully assigned to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
else
echo "Failed to assign the [$managedIdentityName] managed identity to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
exit
fi
fi
done
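The scripts in this article test $? after each Azure CLI call. Because $? always holds the exit status of the most recently executed command, it has to be read immediately after the command of interest. One way to make this harder to get wrong is to wrap each call in a small helper, as in the following minimal sketch. The run_step function is a hypothetical helper that is not part of the sample.

```bash
#!/bin/bash

# Hypothetical helper (not part of the sample): run a command with its
# standard output suppressed, report the outcome, and return the exit
# status of the command itself.
# Usage: run_step <description> <command> [args...]
run_step() {
  local description="$1"
  shift
  if "$@" 1>/dev/null; then
    echo "$description succeeded"
    return 0
  else
    # Inside the else branch, $? still holds the status of the tested command
    local status=$?
    echo "$description failed with exit code $status" >&2
    return "$status"
  fi
}

# Example (commented out because it calls Azure):
# run_step "Role assignment" az role assignment create \
#   --assignee "$principalId" --role "$role" --scope "$openAiId" || exit 1
```

Chaining the helper with || exit 1 also makes the script terminate with a non-zero exit code, which a bare exit after an echo statement does not do.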
Create and Federate Service Account
You can use the 08-create-service-account.sh
script to create the namespace and service account for the application on AKS and federate the service account with the user-assigned managed identity created in the previous step.
#!/bin/bash
# Variables for the user-assigned managed identity
source ./00-variables.sh
# Check if the namespace already exists
result=$(kubectl get namespace -o 'jsonpath={.items[?(@.metadata.name=="'$namespace'")].metadata.name'})
if [[ -n $result ]]; then
echo "[$namespace] namespace already exists"
else
# Create the namespace for your ingress resources
echo "[$namespace] namespace does not exist"
echo "Creating [$namespace] namespace..."
kubectl create namespace $namespace
fi
# Check if the service account already exists
result=$(kubectl get sa -n $namespace -o 'jsonpath={.items[?(@.metadata.name=="'$serviceAccountName'")].metadata.name'})
if [[ -n $result ]]; then
echo "[$serviceAccountName] service account already exists"
else
# Retrieve the clientId of the user-assigned managed identity
echo "Retrieving clientId for [$managedIdentityName] managed identity..."
managedIdentityClientId=$(az identity show \
--name $managedIdentityName \
--resource-group $resourceGroupName \
--query clientId \
--output tsv)
if [[ -n $managedIdentityClientId ]]; then
echo "[$managedIdentityClientId] clientId for the [$managedIdentityName] managed identity successfully retrieved"
else
echo "Failed to retrieve clientId for the [$managedIdentityName] managed identity"
exit
fi
# Create the service account
echo "[$serviceAccountName] service account does not exist"
echo "Creating [$serviceAccountName] service account..."
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
azure.workload.identity/client-id: $managedIdentityClientId
azure.workload.identity/tenant-id: $tenantId
labels:
azure.workload.identity/use: "true"
name: $serviceAccountName
namespace: $namespace
EOF
fi
# Show service account YAML manifest
echo "Service Account YAML manifest"
echo "-----------------------------"
kubectl get sa $serviceAccountName -n $namespace -o yaml
# Check if the federated identity credential already exists
echo "Checking if [$federatedIdentityName] federated identity credential actually exists in the [$resourceGroupName] resource group..."
az identity federated-credential show \
--name $federatedIdentityName \
--resource-group $resourceGroupName \
--identity-name $managedIdentityName &>/dev/null
if [[ $? != 0 ]]; then
echo "No [$federatedIdentityName] federated identity credential actually exists in the [$resourceGroupName] resource group"
# Get the OIDC Issuer URL
aksOidcIssuerUrl="$(az aks show \
--only-show-errors \
--name $aksClusterName \
--resource-group $resourceGroupName \
--query oidcIssuerProfile.issuerUrl \
--output tsv)"
# Show OIDC Issuer URL
if [[ -n $aksOidcIssuerUrl ]]; then
echo "The OIDC Issuer URL of the $aksClusterName cluster is $aksOidcIssuerUrl"
fi
echo "Creating [$federatedIdentityName] federated identity credential in the [$resourceGroupName] resource group..."
# Establish the federated identity credential between the managed identity, the service account issuer, and the subject.
az identity federated-credential create \
--name $federatedIdentityName \
--identity-name $managedIdentityName \
--resource-group $resourceGroupName \
--issuer $aksOidcIssuerUrl \
--subject system:serviceaccount:$namespace:$serviceAccountName
if [[ $? == 0 ]]; then
echo "[$federatedIdentityName] federated identity credential successfully created in the [$resourceGroupName] resource group"
else
echo "Failed to create [$federatedIdentityName] federated identity credential in the [$resourceGroupName] resource group"
exit
fi
else
echo "[$federatedIdentityName] federated identity credential already exists in the [$resourceGroupName] resource group"
fi
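The federated identity credential matches tokens by issuer and subject, and the subject for a Kubernetes service account has the fixed form system:serviceaccount:<namespace>:<service-account-name>. The following minimal sketch builds and validates that value before it is passed to az identity federated-credential create. The build_federated_subject function and the sample names are hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: build the subject claim expected for a Kubernetes
# service account token. Fails when either argument is empty, which would
# otherwise produce a subject that never matches.
build_federated_subject() {
  local namespace="$1"
  local serviceAccountName="$2"
  if [[ -z $namespace || -z $serviceAccountName ]]; then
    echo "namespace and service account name are required" >&2
    return 1
  fi
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$serviceAccountName"
}

# The names below are placeholders, not values from the sample
build_federated_subject "openai" "openai-sa"
# prints system:serviceaccount:openai:openai-sa
```

The namespace and name in the subject must match the service account created by the script exactly, otherwise the token exchange with Microsoft Entra ID is rejected.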
Deploy the NGINX Ingress Controller
To deploy the NGINX ingress controller using a Helm chart, you can utilize the 09-create-nginx-ingress-controller.sh
script. This script automates the process of deploying the NGINX ingress controller, which is a popular solution for managing inbound traffic to your Kubernetes cluster. In this sample, the NGINX Ingress Controller is used as a service proxy to expose our service via REST and gRPC.
#!/bin/bash
# Variables
source ./00-variables.sh
# Use Helm to deploy an NGINX ingress controller
result=$(helm list -n $nginxNamespace | grep $nginxReleaseName | awk '{print $1}')
if [[ -n $result ]]; then
echo "[$nginxReleaseName] ingress controller already exists in the [$nginxNamespace] namespace"
else
# Check if the ingress-nginx repository is not already added
result=$(helm repo list | grep $nginxRepoName | awk '{print $1}')
if [[ -n $result ]]; then
echo "[$nginxRepoName] Helm repo already exists"
else
# Add the ingress-nginx repository
echo "Adding [$nginxRepoName] Helm repo..."
helm repo add $nginxRepoName $nginxRepoUrl
fi
# Update your local Helm chart repository cache
echo 'Updating Helm repos...'
helm repo update
# Deploy NGINX ingress controller
echo "Deploying [$nginxReleaseName] NGINX ingress controller to the [$nginxNamespace] namespace..."
helm install $nginxReleaseName $nginxRepoName/$nginxChartName \
--create-namespace \
--namespace $nginxNamespace \
--set controller.metrics.enabled=true \
--set controller.metrics.serviceMonitor.enabled=true \
--set controller.metrics.serviceMonitor.additionalLabels.release="prometheus" \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set controller.replicaCount=$replicaCount \
--set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
fi
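Several --set arguments in the script above contain keys such as kubernetes.io/os. Helm treats an unescaped dot as a path separator, so dots that belong to a key name are escaped with a backslash. The following minimal sketch shows one way to produce the escaped form. The escape_helm_key function is a hypothetical helper that is not part of the sample.

```bash
#!/bin/bash

# Hypothetical helper: escape the dots in a label or annotation key so that
# Helm's --set parser treats the key as a single path element.
escape_helm_key() {
  printf '%s\n' "$1" | sed 's/\./\\./g'
}

escape_helm_key "kubernetes.io/os"
# prints kubernetes\.io/os

# Example (commented out because it requires Helm and a cluster):
# helm install ... --set controller.nodeSelector."$(escape_helm_key 'kubernetes.io/os')"=linux
```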
Create DNS Records
If you use Azure DNS to manage your public DNS hostname, you can run the 11-configure-dns.sh
script to create an A record in your Azure DNS zone for each subdomain that exposes the application. Azure DNS is a hosting service for DNS domains that lets you manage the records for your domain names with high availability and fast DNS responses. The script retrieves the public IP address of the NGINX ingress controller and then, for both the HTTP and gRPC subdomains, creates the A record or replaces an existing record that points to a different address.
#!/bin/bash
# Variables
source ./00-variables.sh
subdomains=($httpSubdomain $grpcSubdomain)
# Install jq if not installed
path=$(which jq)
if [[ -z $path ]]; then
echo 'Installing jq...'
apt install -y jq
fi
# Retrieve the public IP address of the NGINX ingress controller
echo "Retrieving the external IP address of the [$nginxReleaseName] NGINX ingress controller..."
publicIpAddress=$(kubectl get service -o json -n $nginxNamespace |
jq -r '.items[] |
select(.spec.type == "LoadBalancer" and .metadata.name == "'$nginxReleaseName'-ingress-nginx-controller") |
.status.loadBalancer.ingress[0].ip')
if [[ -n $publicIpAddress && $publicIpAddress != 'null' ]]; then
echo "[$publicIpAddress] external IP address of the [$nginxReleaseName] NGINX ingress controller successfully retrieved"
else
echo "Failed to retrieve the external IP address of the [$nginxReleaseName] NGINX ingress controller"
exit
fi
for subdomain in ${subdomains[@]}; do
# Check if an A record for the subdomain exists in the DNS Zone
echo "Retrieving the A record for the [$subdomain] subdomain from the [$dnsZoneName] DNS zone..."
ipv4Address=$(az network dns record-set a list \
--zone-name $dnsZoneName \
--resource-group $dnsZoneResourceGroupName \
--query "[?name=='$subdomain'].ARecords[].ipv4Address" \
--output tsv \
--only-show-errors)
if [[ -n $ipv4Address ]]; then
echo "An A record already exists in [$dnsZoneName] DNS zone for the [$subdomain] subdomain with [$ipv4Address] IP address"
if [[ $ipv4Address == $publicIpAddress ]]; then
echo "The [$ipv4Address] ip address of the existing A record is equal to the ip address of the ingress"
echo "No additional step is required"
continue
else
echo "The [$ipv4Address] ip address of the existing A record is different than the ip address of the ingress"
fi
# Retrieving name of the record set relative to the zone
echo "Retrieving the name of the record set relative to the [$dnsZoneName] zone..."
recordSetName=$(az network dns record-set a list \
--zone-name $dnsZoneName \
--resource-group $dnsZoneResourceGroupName \
--query "[?name=='$subdomain'].name" \
--output tsv \
--only-show-errors 2>/dev/null)
if [[ -n $recordSetName ]]; then
echo "[$recordSetName] record set name successfully retrieved"
else
echo "Failed to retrieve the name of the record set relative to the [$dnsZoneName] zone"
exit
fi
# Remove the A record
echo "Removing the A record from the record set relative to the [$dnsZoneName] zone..."
az network dns record-set a remove-record \
--ipv4-address $ipv4Address \
--record-set-name $recordSetName \
--zone-name $dnsZoneName \
--resource-group $dnsZoneResourceGroupName \
--only-show-errors 1>/dev/null
if [[ $? == 0 ]]; then
echo "[$ipv4Address] ip address successfully removed from the [$recordSetName] record set"
else
echo "Failed to remove the [$ipv4Address] ip address from the [$recordSetName] record set"
exit
fi
fi
# Create the A record
echo "Creating an A record in [$dnsZoneName] DNS zone for the [$subdomain] subdomain with [$publicIpAddress] IP address..."
az network dns record-set a add-record \
--zone-name $dnsZoneName \
--resource-group $dnsZoneResourceGroupName \
--record-set-name $subdomain \
--ipv4-address $publicIpAddress \
--only-show-errors 1>/dev/null
if [[ $? == 0 ]]; then
echo "A record for the [$subdomain] subdomain with [$publicIpAddress] IP address successfully created in [$dnsZoneName] DNS zone"
else
echo "Failed to create an A record for the [$subdomain] subdomain with [$publicIpAddress] IP address in [$dnsZoneName] DNS zone"
fi
done
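For each subdomain, the script above takes one of three paths: create the A record, keep it, or replace it. The decision can be isolated in a small function, as in the following minimal sketch. The decide_dns_action function is a hypothetical helper that is not part of the sample. It also rejects the literal string null, which jq -r prints when the load balancer has not been assigned an IP address yet.

```bash
#!/bin/bash

# Hypothetical helper: decide what to do with the A record of a subdomain.
# Arguments: <current_ip_in_dns_zone> <public_ip_of_ingress_controller>
# Prints one of: create, keep, replace
decide_dns_action() {
  local currentIp="$1"
  local desiredIp="$2"
  # An empty value or the string "null" means that no ingress IP is available
  if [[ -z $desiredIp || $desiredIp == 'null' ]]; then
    echo "The IP address of the ingress controller is missing" >&2
    return 1
  fi
  if [[ -z $currentIp ]]; then
    echo "create"
  elif [[ $currentIp == "$desiredIp" ]]; then
    echo "keep"
  else
    echo "replace"
  fi
}

# The addresses below are placeholders
decide_dns_action "" "20.1.2.3"          # prints create
decide_dns_action "20.1.2.3" "20.1.2.3"  # prints keep
decide_dns_action "10.0.0.1" "20.1.2.3"  # prints replace
```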
Install the Certificate Manager
To expose the service endpoints securely, you can use the 11-install-cert-manager.sh
script. The script installs the Certificate Manager using a Helm chart and creates the cluster issuer used to request Let's Encrypt certificates. The Certificate Manager, also known as cert-manager, is an open-source project that integrates with Kubernetes to automate the management and provisioning of SSL/TLS certificates. By installing cert-manager, you can easily request, renew, and rotate certificates for your service endpoints. Let's Encrypt is a free, automated, and open certificate authority that provides SSL/TLS certificates. It allows you to secure your service endpoints with trusted certificates without any cost or manual intervention.
#!/bin/bash
# Variables
source ./00-variables.sh
# Install cert-manager Helm chart
result=$(helm list -n $certManagerNamespace | grep $certManagerReleaseName | awk '{print $1}')
if [[ -n $result ]]; then
echo "[$certManagerReleaseName] cert-manager already exists in the $certManagerNamespace namespace"
else
# Check if the jetstack repository is not already added
result=$(helm repo list | grep $certManagerRepoName | awk '{print $1}')
if [[ -n $result ]]; then
echo "[$certManagerRepoName] Helm repo already exists"
else
# Add the jetstack Helm repository
echo "Adding [$certManagerRepoName] Helm repo..."
helm repo add $certManagerRepoName $certManagerRepoUrl
fi
# Update your local Helm chart repository cache
echo 'Updating Helm repos...'
helm repo update
# Install the cert-manager Helm chart
echo "Deploying [$certManagerReleaseName] cert-manager to the $certManagerNamespace namespace..."
helm install $certManagerReleaseName $certManagerRepoName/$certManagerChartName \
--create-namespace \
--namespace $certManagerNamespace \
--set installCRDs=true \
--set nodeSelector."kubernetes\.io/os"=linux
fi
# Check if the cluster issuer already exists
result=$(kubectl get ClusterIssuer -o json | jq -r '.items[].metadata.name | select(. == "'$clusterIssuer'")')
if [[ -n $result ]]; then
echo "[$clusterIssuer] cluster issuer already exists"
exit
else
# Create the cluster issuer
echo "[$clusterIssuer] cluster issuer does not exist"
echo "Creating [$clusterIssuer] cluster issuer..."
cat $template | yq "(.spec.acme.email)|="\""$email"\" | kubectl apply -f -
fi
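The file referenced by the $template variable is not shown in this section. As an assumption, a typical cert-manager ClusterIssuer for Let's Encrypt with the HTTP-01 solver and the NGINX ingress class looks roughly like the manifest rendered by the following minimal sketch. The render_cluster_issuer function, the issuer name, and the email address are hypothetical, and the sketch substitutes the email with a heredoc instead of the yq expression used by the script.

```bash
#!/bin/bash

# Hypothetical helper: print a ClusterIssuer manifest for the Let's Encrypt
# production ACME endpoint. The manifest layout is an assumption and may
# differ from the template used by the sample.
# Usage: render_cluster_issuer <issuer_name> <email>
render_cluster_issuer() {
  local name="$1"
  local email="$2"
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: $name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $email
    privateKeySecretRef:
      name: $name
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
}

# Example (commented out because it requires a cluster):
# render_cluster_issuer "letsencrypt-nginx" "admin@example.com" | kubectl apply -f -
```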
Create Test Service Principal
You can use the 14-create-test-service-principal.sh
script to create a service principal in your Microsoft Entra ID tenant to use when debugging the service locally instead of the API keys of your Azure OpenAI instances. The script assigns the Cognitive Services User
role to the service principal and creates an access policy that lets the service principal list and get secrets from the Azure Key Vault resource storing the workload configuration.
#!/bin/bash
# Variables
source ./00-variables.sh
# Check if service principal exists
echo "Checking if the service principal [$servicePrincipalName] already exists..."
appId=$(az ad sp list \
--display-name $servicePrincipalName \
--query [].appId \
--output tsv)
if [[ -n $appId ]]; then
echo "Service principal [$servicePrincipalName] already exists."
else
# Create service principal
az ad sp create-for-rbac \
--name "$servicePrincipalName" \
--role reader \
--years 5 \
--scopes /subscriptions/$subscriptionId
if [[ $? -eq 0 ]]; then
echo "Service principal [$servicePrincipalName] successfully created."
else
echo "Failed to create service principal [$servicePrincipalName]."
exit 1
fi
# Retrieve service principal appId
echo "Retrieving appId for [$servicePrincipalName] service principal..."
appId=$(az ad sp list \
--display-name $servicePrincipalName \
--query [].appId \
--output tsv)
if [[ -n $appId ]]; then
echo "[$appId] appId for the [$servicePrincipalName] service principal successfully retrieved"
else
echo "Failed to retrieve appId for the [$servicePrincipalName] service principal"
exit 1
fi
fi
# Grant get and list permissions on key vault secrets to the service principal
echo "Granting Get and List permissions on secrets in [$keyVaultName] key vault to [$servicePrincipalName] service principal..."
az keyvault set-policy \
--name $keyVaultName \
--spn $appId \
--secret-permissions get list 1>/dev/null
if [[ $? == 0 ]]; then
echo "Get and List permissions on secrets in [$keyVaultName] key vault successfully granted to [$servicePrincipalName] service principal"
else
echo "Failed to grant Get and List permissions on secrets in [$keyVaultName] key vault to [$servicePrincipalName] service principal"
exit 1
fi
for ((i = 0; i < ${#openAiNames[@]}; i++)); do
openAiName=${openAiNames[$i]}
openAiResourceGroupName=${openAiResourceGroupNames[$i]}
# Get the resource id of the Azure OpenAI resource
openAiId=$(az cognitiveservices account show \
--name $openAiName \
--resource-group $openAiResourceGroupName \
--query id \
--output tsv)
if [[ -n $openAiId ]]; then
echo "Resource id for the [$openAiName] Azure OpenAI resource successfully retrieved"
else
echo "Failed to retrieve the resource id for the [$openAiName] Azure OpenAI resource"
exit 1
fi
# Assign the Cognitive Services User role on the Azure OpenAI resource to the service principal
role="Cognitive Services User"
echo "Checking if the [$servicePrincipalName] service principal has been assigned to [$role] role with [$openAiName] Azure OpenAI resource as a scope..."
current=$(az role assignment list \
--assignee $appId \
--scope $openAiId \
--query "[?roleDefinitionName=='$role'].roleDefinitionName" \
--output tsv 2>/dev/null)
if [[ $current == $role ]]; then
echo "[$servicePrincipalName] service principal is already assigned to the ["$current"] role with [$openAiName] Azure OpenAI resource as a scope"
else
echo "[$servicePrincipalName] service principal is not assigned to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
echo "Assigning the [$role] role to the [$servicePrincipalName] service principal with [$openAiName] Azure OpenAI resource as a scope..."
az role assignment create \
--assignee $appId \
--role "$role" \
--scope $openAiId 1>/dev/null
if [[ $? == 0 ]]; then
echo "[$servicePrincipalName] service principal successfully assigned to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
else
echo "Failed to assign the [$servicePrincipalName] service principal to the [$role] role with [$openAiName] Azure OpenAI resource as a scope"
exit 1
fi
fi
done
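When debugging locally with this service principal, one common approach is to expose its credentials through the environment variables read by the Azure Identity library's EnvironmentCredential: AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET. The following is a minimal sketch that assumes the service authenticates through DefaultAzureCredential or EnvironmentCredential; the function name is hypothetical and not part of the sample.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the variables read by the Azure
# Identity EnvironmentCredential that are missing from the current environment.
missing_sp_variables() {
  local missing=()
  local name
  for name in AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET; do
    # ${!name} is indirect expansion: the value of the variable named by $name
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  # Prints an empty line when every variable is set
  echo "${missing[*]}"
}
```

The appId, password, and tenant values printed by az ad sp create-for-rbac map to AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID respectively. Calling missing_sp_variables before starting the service gives a quick indication of whether any of them still needs to be exported.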
Deploy the Service to AKS
You can use the 13-deploy-workload-using-kubectl.sh
script to deploy the application to your AKS cluster. Please note that you can pass the configuration to the workload via secrets in an Azure Key Vault resource or via a configmap. Depending on your choice, the script will use a different set of YAML manifests.
#!/bin/bash
# Variables
source ./00-variables.sh
# Attach ACR to AKS cluster
if [[ $attachAcr == true ]]; then
echo "Attaching ACR $acrName to AKS cluster $aksClusterName..."
az aks update \
--name $aksClusterName \
--resource-group $aksResourceGroupName \
--attach-acr $acrName
fi
# Create the namespace if it doesn't already exist in the cluster
result=$(kubectl get namespace -o jsonpath="{.items[?(@.metadata.name=='$namespace')].metadata.name}")
if [[ -n $result ]]; then
echo "[$namespace] namespace already exists in the cluster"
else
echo "[$namespace] namespace does not exist in the cluster"
echo "creating [$namespace] namespace in the cluster..."
kubectl create namespace $namespace
fi
if [[ $configurationType == "keyvault" ]]; then
echo "Selected configuration type is keyvault"
# Create configmap
cat $keyVaultConfigMapTemplate |
yq "(.metadata.name)|="\""$configMapName"\" |
yq "(.data.aspNetCoreEnvironment)|="\""$aspNetCoreEnvironment"\" |
yq "(.data.keyVaultName)|="\""$keyVaultName"\" |
kubectl apply -n $namespace -f -
# Create deployment
cat $keyVaultDeploymentTemplate |
yq "(.spec.template.spec.containers[0].image)|="\""$image"\" |
yq "(.spec.template.spec.containers[0].imagePullPolicy)|="\""$imagePullPolicy"\" |
yq "(.spec.template.spec.serviceAccountName)|="\""$serviceAccountName"\" |
kubectl apply -n $namespace -f -
else
echo "Selected configuration type is appsettings"
# Create configmap
cat $appSettingsConfigMapTemplate |
yq "(.metadata.name)|="\""$configMapName"\" |
yq "(.data.aspNetCoreEnvironment)|="\""$aspNetCoreEnvironment"\" |
kubectl apply -n $namespace -f -
# Create deployment
cat $appSettingsDeploymentTemplate |
yq "(.spec.template.spec.containers[0].image)|="\""$image"\" |
yq "(.spec.template.spec.containers[0].imagePullPolicy)|="\""$imagePullPolicy"\" |
yq "(.spec.template.spec.serviceAccountName)|="\""$serviceAccountName"\" |
yq "(.spec.template.spec.volumes[0].configMap.name)|="\""$configMapName"\" |
kubectl apply -n $namespace -f -
fi
# Create service
kubectl apply -f $serviceTemplate -n $namespace
# Create HTTP ingress
cat $httpIngressTemplate |
yq "(.metadata.name)|="\""$httpIngressName"\" |
yq "(.spec.tls[0].hosts[0])|="\""$httpHostName"\" |
yq "(.spec.tls[0].secretName)|="\""$httpSecretName"\" |
yq "(.spec.rules[0].host)|="\""$httpHostName"\" |
yq "(.spec.rules[0].http.paths[0].backend.service.name)|="\""$httpServiceName"\" |
yq "(.spec.rules[0].http.paths[0].backend.service.port.number)|=$httpServicePort" |
kubectl apply -n $namespace -f -
# Create gRPC ingress
cat $grpcIngressTemplate |
yq "(.metadata.name)|="\""$grpcIngressName"\" |
yq "(.spec.tls[0].hosts[0])|="\""$grpcHostName"\" |
yq "(.spec.tls[0].secretName)|="\""$grpcSecretName"\" |
yq "(.spec.rules[0].host)|="\""$grpcHostName"\" |
yq "(.spec.rules[0].http.paths[0].backend.service.name)|="\""$grpcServiceName"\" |
yq "(.spec.rules[0].http.paths[0].backend.service.port.number)|=$grpcServicePort" |
kubectl apply -n $namespace -f -
The script uses YAML manifests to create the following Kubernetes objects:
- A deployment and related pods hosting the service.
- A configmap containing the service configuration.
- A service used to expose the workload functionality.
- Two ingress objects to expose the service, respectively, via REST and gRPC.
The scripts used to deploy the YAML templates use the yq tool to customize the manifests with the values of the variables defined in the 00-variables.sh file. yq is a lightweight and portable command-line YAML, JSON, and XML processor that uses jq-like syntax and also works with properties, CSV, and TSV files. It doesn't yet support everything jq does, but it supports the most common operations and functions, and more are being added continuously.
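The quoting used in the yq calls of the deployment script can be hard to read. The sketch below shows what a construct such as "(.metadata.name)|="\""$configMapName"\" expands to by building the same expression in a small helper. The helper name yq_set_expr is hypothetical and only serves as an illustration.

```shell
#!/bin/bash
# Hypothetical helper: build the yq expression that assigns a string value to a
# path. The value is wrapped in double quotes, which is what the
# "(.path)|="\""$value"\" construct in the scripts expands to.
yq_set_expr() {
  local path="$1"
  local value="$2"
  printf '(%s)|="%s"' "$path" "$value"
}

# Print the expression produced for a sample path and value
yq_set_expr '.metadata.name' 'openai-config'
echo
```

The output of the helper can be passed to yq as its expression, for example yq "$(yq_set_expr .metadata.name "$configMapName")". A value that itself contains double quotes would break the generated expression; in that case the strenv() operator of yq v4, which reads the value from an environment variable, may be a more robust alternative.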
In particular, the ingress-grpc.yml YAML manifest contains the definition of the ingress object used to expose the service via gRPC. For more information about exposing a gRPC service via the NGINX Ingress Controller, see gRPC under the NGINX Ingress Controller documentation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openai-grpc
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    cert-manager.io/cluster-issuer: letsencrypt-nginx
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - openaigrpc.babosbird.com
    secretName: grpc-tls-secret
  rules:
  - host: openaigrpc.babosbird.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: openai-grpc
            port:
              number: 6000
The following YAML manifest is used to deploy the workload.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openai
  labels:
    app: openai
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openai
      azure.workload.identity/use: "true"
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "80"
      labels:
        app: openai
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: openai-sa
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: openai
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: openai
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
      - name: openai
        image: paolosalvatori.azurecr.io/openai:v1
        imagePullPolicy: Always
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
          name: http
        - containerPort: 6000
          name: grpc
        livenessProbe:
          httpGet:
            path: /openai/echo?tenant=contoso&value=100
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /openai/echo?tenant=contoso&value=100
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        startupProbe:
          httpGet:
            path: /openai/echo?tenant=contoso&value=100
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        env:
        - name: ASPNETCORE_ENVIRONMENT
          valueFrom:
            configMapKeyRef:
              name: openai
              key: aspNetCoreEnvironment
        - name: KeyVaultName
          valueFrom:
            configMapKeyRef:
              name: openai
              key: keyVaultName
Here are some notable observations about the manifest:
- The prometheus.io/scrape annotation set to true and the prometheus.io/port annotation set to 80 indicate that Prometheus should scrape metrics from the /metrics path of the HTTP endpoint exposed by pods on port 80. This allows Prometheus to collect the workload's metrics without any per-pod configuration.
- The azure.workload.identity/use label set to true triggers the azure-workload-identity mutating admission webhook on the AKS cluster to inject Azure-specific environment variables and the projected service account token volume into the pods. This simplifies the use of Azure identities within the workload.
- The openai-sa service account is federated with the user-assigned managed identity used by the workload. This federation enables the workload to access Azure OpenAI instances securely without the need for an explicit OpenAI key, leveraging the capabilities provided by Azure managed identities.
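To check that the webhook actually mutated a pod, you can look for the environment variables it injects: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, and AZURE_AUTHORITY_HOST. The following is a minimal sketch of such a check; the helper name is hypothetical and it only inspects text passed on standard input.

```shell
#!/bin/bash
# Hypothetical helper: read "NAME=value" lines (for example, the output of `env`
# captured from a pod) on stdin and print the expected variables that are absent.
missing_workload_identity_vars() {
  local expected=(AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST)
  local input name
  local missing=()
  input=$(cat)
  for name in "${expected[@]}"; do
    # Match the variable name at the start of a line, followed by "="
    if ! grep -q "^${name}=" <<<"$input"; then
      missing+=("$name")
    fi
  done
  # Prints an empty line when all expected variables are present
  echo "${missing[*]}"
}
```

A possible usage, assuming the deployment is named openai as in the manifest above, is kubectl exec -n $namespace deploy/openai -- env | missing_workload_identity_vars. An empty result suggests the label and the service account were picked up by the webhook.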
Prometheus and Grafana
To monitor the metrics generated by the service to measure the prompt and completion tokens consumed by individual tenants, you have the following deployment options for Prometheus and Grafana.
Deploy Prometheus and Grafana to AKS Cluster
You can deploy Prometheus and Grafana directly to your Azure Kubernetes Service (AKS) cluster using the kube-prometheus-stack Helm chart, which installs and configures both Prometheus and Grafana within your AKS cluster. For more information on the kube-prometheus-stack Helm chart, refer to the artifacthub.io page. You can use the 16-install-kube-prometheus-stack.sh script to deploy the kube-prometheus-stack Helm chart.
#!/bin/bash
# Variables
namespace="prometheus"
release="prometheus"
# Upgrade Helm chart
helm upgrade $release prometheus-community/kube-prometheus-stack \
--namespace $namespace \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--values kube-prometheus-stack-custom-values.yml
# Get values
helm get values $release --namespace $namespace
The kube-prometheus-stack-custom-values.yml file contains the definition of additional scrape jobs. In particular, the kubernetes-pods job scrapes metrics from pods with the prometheus.io/scrape annotation set to true.
# prom-custom-values.yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      # annotation 'prometheus.io/scrape' must be set to 'true'
      - action: keep
        regex: true
        source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      # service cannot be in kube-system or prom namespaces
      - action: drop
        regex: (kube-system|prom)
        source_labels: [__meta_kubernetes_namespace]
      # service port name must end with word 'metrics'
      - action: keep
        regex: .*metrics
        source_labels: [__meta_kubernetes_service_port_name]
      # allow override of http scheme
      - action: replace
        regex: (https?)
        source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        target_label: __scheme__
      # allow override of default /metrics path
      - action: replace
        regex: (.+)
        source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        target_label: __metrics_path__
      # allow override of default port
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        target_label: __address__
      - {action: labelmap, regex: __meta_kubernetes_service_label_(.+)}
      - action: replace
        source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - action: replace
        source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Scrape only pods with the annotation: prometheus.io/scrape = true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # If prometheus.io/path is specified, scrape this path instead of /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # If prometheus.io/port is specified, scrape this port instead of the default
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # If prometheus.io/scheme is specified, scrape with this scheme instead of http
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        regex: (http|https)
        target_label: __scheme__
      # Include the pod namespace as a label for each metric
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      # Include the pod name as a label for each metric
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      # [Optional] Include all pod labels as labels for each metric
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
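The relabel rule that rewrites __address__ in the scrape configuration above is the least obvious one: Prometheus joins the two source labels with a semicolon, matches the result against the regular expression, and writes $1:$2 back to __address__. The sketch below emulates that behavior with sed so you can see the effect on sample values. The function name and the sample addresses are hypothetical, and the expression is translated because sed -E has no non-capturing groups.

```shell
#!/bin/bash
# Emulate the __address__ relabel rule: the source labels are joined with ";",
# matched against ([^:]+)(?::\d+)?;(\d+), and replaced with $1:$2.
# In this sed translation the optional port is group 2 and the annotation
# port is group 3.
rewrite_address() {
  local address="$1"
  local annotation_port="$2"
  local rewritten
  # -n plus the p flag prints only when the pattern matches, mirroring how a
  # replace rule leaves the label unchanged when the regex does not match
  rewritten=$(printf '%s;%s\n' "$address" "$annotation_port" |
    sed -nE 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/p')
  echo "${rewritten:-$address}"
}

rewrite_address '10.244.1.7:6000' '80'   # prints 10.244.1.7:80
rewrite_address '10.244.1.7' '80'        # prints 10.244.1.7:80
rewrite_address '10.244.1.7:6000' ''     # prints 10.244.1.7:6000 (no annotation)
```

For the deployment shown earlier, whose pods declare ports 80 and 6000 and carry the prometheus.io/port annotation set to 80, this is the step that makes every discovered target point at port 80, where the /metrics endpoint is exposed.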
Use Azure Monitor and Managed Grafana
Alternatively, you can configure your service to collect metrics and send them to the Azure Monitor managed service for Prometheus. Azure Monitor supports Managed Prometheus, which can be used to gather metrics from your service. You can then leverage Azure Managed Grafana to visualize these metrics in a user-friendly and customizable dashboard. This option provides the advantage of utilizing Azure's managed services for monitoring and visualization.
Azure Monitor managed service for Prometheus is a fully managed, highly scalable, and reliable monitoring service available in Azure. It offers a turnkey solution for collecting, querying, and alerting on metrics from AKS clusters. With Azure Managed Prometheus, you no longer need to deploy and manage Prometheus and Grafana within your clusters using a Helm chart. Instead, you can focus on extracting meaningful insights from the collected metrics. You can use a single Azure Monitor workspace to collect Prometheus metrics from a group of AKS clusters and use a single Azure Managed Grafana instance as a single pane of glass to visualize and aggregate the Prometheus metrics collected in the Azure Monitor workspace from one or multiple AKS clusters.
Azure Managed Grafana is a managed service that provides a comprehensive data visualization platform built on top of the Grafana software by Grafana Labs. It's built as a fully managed Azure service operated and supported by Microsoft. Grafana helps you bring together metrics, logs and traces into a single user interface. With its extensive support for data sources and graphing capabilities, you can view and analyze your application and infrastructure telemetry data in real-time.
Azure Managed Grafana is optimized for the Azure environment. It works seamlessly with many Azure services and provides the following integration features:
- Built-in support for Azure Managed Prometheus and Azure Data Explorer.
- User authentication and access control using Microsoft Entra identities.
- Direct import of existing charts from the Azure portal.
In particular, by integrating with Azure Monitor managed service for Prometheus, Azure Managed Grafana allows you to create rich and customizable dashboards to visualize the Prometheus metrics collected in an Azure Monitor workspace from one or more AKS clusters. Azure Managed Grafana enables you to gain deep visibility into your AKS clusters, troubleshoot issues, and make informed decisions based on real-time data. You can also set up Azure Monitor alerts and use them with Azure Managed Grafana.
If you choose this option, you need to run the 15-deploy-ama-configmap.sh script to configure the Azure Monitor agent running on AKS to scrape metrics from pods with the prometheus.io/scrape annotation set to true.
#!/bin/bash
# Deploy ConfigMap
kubectl apply -n kube-system -f ama-metrics-prometheus-config.yml
The ama-metrics-prometheus-config.yml file contains the definition of additional scrape jobs. In particular, the kubernetes-pods job scrapes metrics from pods with the prometheus.io/scrape annotation set to true.
kind: ConfigMap
apiVersion: v1
data:
  prometheus-config: |-
    global:
      scrape_interval: 10s
    scrape_configs:
    - job_name: serviceMonitor/ingress-basic/nginx-ingress-ingress-nginx-controller/0
      honor_timestamps: true
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      follow_redirects: true
      enable_http2: true
      relabel_configs:
      - source_labels: [job]
        separator: ;
        regex: (.*)
        target_label: __tmp_prometheus_job_name
        replacement: $1
        action: replace
      - source_labels:
          [
            __meta_kubernetes_service_label_app_kubernetes_io_component,
            __meta_kubernetes_service_labelpresent_app_kubernetes_io_component,
          ]
        separator: ;
        regex: (controller);true
        replacement: $1
        action: keep
      - source_labels:
          [
            __meta_kubernetes_service_label_app_kubernetes_io_instance,
            __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance,
          ]
        separator: ;
        regex: (nginx-ingress);true
        replacement: $1
        action: keep
      - source_labels:
          [
            __meta_kubernetes_service_label_app_kubernetes_io_name,
            __meta_kubernetes_service_labelpresent_app_kubernetes_io_name,
          ]
        separator: ;
        regex: (ingress-nginx);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: metrics
        replacement: $1
        action: keep
      - source_labels:
          [
            __meta_kubernetes_endpoint_address_target_kind,
            __meta_kubernetes_endpoint_address_target_name,
          ]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels:
          [
            __meta_kubernetes_endpoint_address_target_kind,
            __meta_kubernetes_endpoint_address_target_name,
          ]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_container_name]
        separator: ;
        regex: (.*)
        target_label: container
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_phase]
        separator: ;
        regex: (Failed|Succeeded)
        replacement: $1
        action: drop
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: metrics
        action: replace
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        modulus: 1
        target_label: __tmp_hash
        replacement: $1
        action: hashmod
      - source_labels: [__tmp_hash]
        separator: ;
        regex: "0"
        replacement: $1
        action: keep
      kubernetes_sd_configs:
      - role: endpoints
        kubeconfig_file: ""
        follow_redirects: true
        enable_http2: true
        namespaces:
          own_namespace: false
          names:
          - ingress-basic
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Scrape only pods with the annotation: prometheus.io/scrape = true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # If prometheus.io/path is specified, scrape this path instead of /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # If prometheus.io/port is specified, scrape this port instead of the default
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # If prometheus.io/scheme is specified, scrape with this scheme instead of http
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        regex: (http|https)
        target_label: __scheme__
      # Include the pod namespace as a label for each metric
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      # Include the pod name as a label for each metric
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      # [Optional] Include all pod labels as labels for each metric
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
metadata:
  name: ama-metrics-prometheus-config
  namespace: kube-system
Grafana Dashboard
Within the provided sample, you can find a dashboard, shown in the following picture, for both deployment options in the scripts/aks
folder.
- azure-managed-grafana-openai-dashboard.json.json: Use this dashboard if you choose to leverage Azure Monitor managed service for Prometheus and Azure Managed Grafana to collect and visualize Prometheus metrics.
- kube-prometheus-stack-openai-grafana-dashboard.json: Use this dashboard if you deploy Prometheus and Grafana to your AKS cluster using kube-prometheus-stack Helm chart.
These dashboards ensure that you have a predefined configuration for monitoring the metrics related to the per-tenant prompt and completion token consumption. You can choose the appropriate dashboard based on your selected deployment method and use it to visualize and analyze the metrics generated by your service.
Both deployment options offer flexibility and scalability in monitoring your service's metrics. Whether you prefer deploying Prometheus and Grafana directly to your AKS cluster or utilizing Azure Monitor with Managed Prometheus and Grafana, you can leverage the provided sample dashboards to get started quickly and efficiently.
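Before building or importing a dashboard, it can help to spot-check the raw values exposed by a single pod. The following is a minimal sketch that sums the samples of one metric per tenant from text in the Prometheus exposition format. The metric name you pass in and the tenant label are assumptions made for illustration; check the /metrics output of the service for the actual names. The sketch also assumes that sample lines carry no trailing timestamp.

```shell
#!/bin/bash
# Hypothetical helper: sum the samples of one metric per tenant from text in the
# Prometheus exposition format read on stdin. The metric name passed as $1 and
# the "tenant" label are assumptions, not names confirmed by the sample.
sum_by_tenant() {
  local metric="$1"
  awk -v metric="$metric" '
    # Keep only sample lines for the requested metric that carry labels
    index($0, metric "{") == 1 {
      if (match($0, /tenant="[^"]*"/)) {
        # Skip the 8 characters of tenant=" and drop the closing quote
        tenant = substr($0, RSTART + 8, RLENGTH - 9)
        # The sample value is the last whitespace-separated field
        totals[tenant] += $NF
      }
    }
    END { for (t in totals) printf "%s %d\n", t, totals[t] }
  ' | sort
}
```

One way to feed it is to forward a local port to the workload with kubectl port-forward and pipe the output of curl against the /metrics path into sum_by_tenant. Keep in mind that each pod exposes its own counters, so the totals across replicas are what the Prometheus queries behind the dashboards aggregate.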
Conclusion
Distributing requests across multiple instances of Azure OpenAI in a multitenant AI application is crucial for several reasons. Firstly, it ensures better performance and scalability by balancing the workload across multiple instances, preventing one tenant from monopolizing resources. Additionally, it enhances fault tolerance and availability, as multiple instances can handle requests even if one instance fails.
Measuring the prompt and completion tokens consumed by individual tenants is equally important for proper cost allocation. By tracking the tokens consumed by each tenant, it becomes possible to accurately assign costs and perform chargebacks based on their usage. This allows for fair billing and cost management within the multitenant environment.
While this solution provides an example of creating and measuring per-tenant metrics, it's important to note that it does not implement any mechanism to throttle calls based on per-tenant token consumption. Throttling mechanisms can help prevent excessive token consumption by individual tenants, ensuring fair usage and preventing resource exhaustion. Implementing such mechanisms should be considered as an additional step to further optimize resource allocation and maintain application stability.
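As an illustration of what such a mechanism could look like, the following sketch implements a fixed-window tokens-per-minute limiter. It is not part of the sample: the function name, the in-memory state, and the limit values are all hypothetical, and it requires bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Sketch of a fixed-window tokens-per-minute limiter. Requires bash 4+ for
# associative arrays. All names and limits are hypothetical.
declare -A window_start   # tenant -> epoch second when its current window began
declare -A window_tokens  # tenant -> tokens consumed in the current window

# try_consume TENANT TOKENS LIMIT [NOW]
# Returns 0 and records the usage if the tenant stays within LIMIT tokens for
# the current 60-second window; returns 1 (throttle) otherwise.
try_consume() {
  local tenant="$1" tokens="$2" limit="$3"
  local now="${4:-$(date +%s)}"
  local start="${window_start[$tenant]:-}"
  # Start a new window when none exists or the previous one has expired
  if [[ -z "$start" ]] || (( now - start >= 60 )); then
    window_start[$tenant]=$now
    window_tokens[$tenant]=0
  fi
  local used="${window_tokens[$tenant]}"
  if (( used + tokens > limit )); then
    return 1
  fi
  window_tokens[$tenant]=$(( used + tokens ))
  return 0
}
```

A caller would invoke try_consume with the tenant name, the token count of the request, and the tenant's TPM limit, and reject the request (for example, with an HTTP 429 response) when the function returns 1. Because the deployment runs multiple replicas, a real implementation would likely need to keep these counters in a shared store rather than in the memory of a single process.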
Overall, distributing requests and measuring token consumption on a per-tenant basis are integral aspects of building a successful multitenant AI application. These practices not only enable efficient resource utilization but also promote fair cost allocation and help maintain a stable and reliable application environment.