Azure Inference client library for .NET - version 1.0.0-beta.1
The client library (in preview) does inference, including chat completions, for AI models deployed by Azure AI Studio and Azure Machine Learning Studio. It supports Serverless API endpoints and Managed Compute endpoints (formerly known as Managed Online Endpoints). The client library makes service calls using REST API version 2024-05-01-preview, as documented in Azure AI Model Inference API. For more information see Overview: Deploy models, flows, and web apps with Azure AI Studio.
Use the model inference client library to:
- Authenticate against the service
- Get information about the model
- Do chat completions
With some minor adjustments, this client library can also be configured to do inference for Azure OpenAI endpoints. See samples with azure_openai in their name, in the samples folder.
Product documentation | Samples | API reference documentation | Package (NuGet) | SDK source code
Getting started
Prerequisites
- An Azure subscription.
- An AI Model from the catalog deployed through Azure AI Studio.
- To construct the client, you will need to pass in the endpoint URL. The endpoint URL has the form https://your-host-name.your-azure-region.inference.ai.azure.com, where your-host-name is your unique model deployment host name and your-azure-region is the Azure region where the model is deployed (e.g. eastus2).
- Depending on your model deployment and authentication preference, you either need a key to authenticate against the service, or Entra ID credentials. The key is a 32-character string.
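As a rough illustration of how the two placeholders in the endpoint URL combine, the sketch below composes the URL from a host name and a region. The helper name BuildEndpoint and the sample values are hypothetical and not part of the client library.

```csharp
using System;

// Hypothetical helper (not part of Azure.AI.Inference): composes the endpoint
// URL from the model deployment host name and the Azure region.
static Uri BuildEndpoint(string hostName, string region)
{
    if (string.IsNullOrWhiteSpace(hostName))
        throw new ArgumentException("Host name is required.", nameof(hostName));
    if (string.IsNullOrWhiteSpace(region))
        throw new ArgumentException("Region is required.", nameof(region));

    return new Uri($"https://{hostName}.{region}.inference.ai.azure.com");
}

// Placeholder values; substitute your own deployment host name and region.
Uri endpoint = BuildEndpoint("your-host-name", "eastus2");
Console.WriteLine(endpoint.Host);
```

In practice the full URL is usually copied from the deployment page and stored in an environment variable, so a helper like this is mainly useful for checking that a configured value has the expected shape.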
Install the package
Install the client library for .NET with NuGet:
dotnet add package Azure.AI.Inference --prerelease
Authenticate the client
The package makes use of common Azure credential providers. To use credential providers provided with the Azure SDK, please install the Azure.Identity package:
dotnet add package Azure.Identity
Key concepts
Create and authenticate a client directly, using key
The package includes ChatCompletionsClient. It is created by providing your endpoint and credential information to the object:
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
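The snippet above passes the result of GetEnvironmentVariable straight into the Uri and AzureKeyCredential constructors, which fail with a generic null-argument error when a variable is unset. A minimal sketch of a friendlier check follows; the helper GetRequiredEnv is hypothetical and not part of the library.

```csharp
using System;

// Hypothetical helper: returns the variable's value, or fails with a message
// that names the missing variable.
static string GetRequiredEnv(string name)
{
    string? value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException($"Environment variable '{name}' is not set.");
    return value;
}

try
{
    var endpoint = new Uri(GetRequiredEnv("AZURE_AI_CHAT_ENDPOINT"));
    Console.WriteLine($"Using endpoint host: {endpoint.Host}");
}
catch (Exception e) when (e is InvalidOperationException or UriFormatException)
{
    // Either the variable is missing or its value is not a valid absolute URI.
    Console.WriteLine(e.Message);
}
```

The same helper can be used for AZURE_AI_CHAT_KEY before constructing the AzureKeyCredential.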
Get AI model information
All clients provide a GetModelInfo method to retrieve AI model information. This makes a REST call to the /info route on the provided endpoint, as documented in the REST API reference.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
Response<ModelInfo> modelInfo = client.GetModelInfo();
Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
AI model information is cached in the client, and further calls to GetModelInfo will access the cached value and will not result in a REST API call.
Chat Completions
The ChatCompletionsClient has a method named Complete. The method makes a REST API call to the /chat/completions route on the provided endpoint, as documented in the REST API reference.
See simple chat completion examples below. More can be found in the samples folder.
Sending proprietary model parameters
The REST API defines common model parameters for chat completions. If the model you are targeting has additional parameters you would like to set, the client library allows you to do so easily. See Chat completions with additional model-specific parameters.
Inference using Azure OpenAI endpoints
The request and response payloads of the Azure AI Model Inference API are mostly compatible with OpenAI REST APIs for chat completions. Therefore, with some minor adjustments, this client library can be configured to do inference using Azure OpenAI endpoints. See samples with azure_openai in their name, in the samples folder, and the comments there.
Thread safety
We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
Additional concepts
Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime
Examples
In the following sections you will find simple examples of:
- Chat completions
- Streaming chat completions
- Chat completions with additional model-specific parameters
The examples create a client as mentioned in Create and authenticate a client directly, using key. Only mandatory input settings are shown for simplicity.
See the Samples folder for full working samples for synchronous and asynchronous handling.
Chat completions example
This example demonstrates how to generate a single chat completion, with key authentication.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many feet are in a mile?"),
    },
};
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Choices[0].Message.Content);
The following types of messages are supported: SystemMessage, UserMessage, AssistantMessage, ToolMessage. See also samples:
- Sample5_ChatCompletionsWithImageUrl.md for usage of UserMessage that includes sending an image URL.
- Sample7_ChatCompletionsWithTools.md for usage of ToolMessage.
Alternatively, you can build the request options from a BinaryData object created from a JSON string, instead of using the strongly typed classes like ChatRequestSystemMessage and ChatRequestUserMessage:
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
// Describe the messages as a JSON string instead of the strongly typed message classes.
string jsonMessages = "{\"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"How many feet are in a mile?\"}]}";
BinaryData messages = BinaryData.FromString(jsonMessages);
// ModelReaderWriter is defined in the System.ClientModel.Primitives namespace.
ChatCompletionsOptions requestOptions = ModelReaderWriter.Read<ChatCompletionsOptions>(messages);
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Choices[0].Message.Content);
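Hand-escaping the JSON string as above is easy to get wrong. As a sketch of one alternative, the same payload shape can be produced with System.Text.Json from an anonymous object. The variable names below are illustrative, and the resulting string is assumed to be passed to BinaryData.FromString exactly as in the example above.

```csharp
using System;
using System.Text.Json;

// Anonymous object mirroring the JSON shape used above: a "messages" array
// whose items carry "role" and "content".
var payload = new
{
    messages = new[]
    {
        new { role = "system", content = "You are a helpful assistant." },
        new { role = "user", content = "How many feet are in a mile?" },
    },
};

// The serializer takes care of quoting and escaping.
string jsonMessages = JsonSerializer.Serialize(payload);
Console.WriteLine(jsonMessages);
```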
To generate completions for additional messages, simply call client.Complete multiple times using the same client.
Streaming chat completions example
This example demonstrates how to generate a single chat completion with a streaming response, with key authentication.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many feet are in a mile?"),
    },
};
StreamingResponse<StreamingChatCompletionsUpdate> response = await client.CompleteStreamingAsync(requestOptions);
StringBuilder contentBuilder = new();
await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
{
    if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
    {
        contentBuilder.Append(chatUpdate.ContentUpdate);
    }
}
System.Console.WriteLine(contentBuilder.ToString());
In the above await foreach loop, the updates are progressively added to a string builder as they are streamed in, and then printed out once complete. The updates could be printed as they come in as well.
To generate completions for additional messages, simply call client.CompleteStreamingAsync multiple times using the same client.
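To illustrate printing updates as they come in, here is a minimal sketch that runs without a service connection. FakeContentUpdates is a hypothetical stand-in for the streamed response. With the real client you would iterate the StreamingResponse from the example above and write chatUpdate.ContentUpdate instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for the streamed updates, so the sketch runs offline.
static async IAsyncEnumerable<string> FakeContentUpdates()
{
    foreach (string chunk in new[] { "There are ", "5,280 feet ", "in a mile." })
    {
        await Task.Delay(10); // simulate latency between chunks
        yield return chunk;
    }
}

// Writes each non-empty update as soon as it arrives, instead of buffering
// everything in a StringBuilder first.
static async Task WriteAsTheyArriveAsync(IAsyncEnumerable<string> updates, TextWriter output)
{
    await foreach (string update in updates)
    {
        if (!string.IsNullOrEmpty(update))
        {
            output.Write(update);
            output.Flush();
        }
    }
    output.WriteLine();
}

await WriteAsTheyArriveAsync(FakeContentUpdates(), Console.Out);
```

Writing each chunk immediately gives the user visible progress, at the cost of not having the complete text in one place unless you also accumulate it.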
Chat completions with additional model-specific parameters
In this example, extra JSON elements are inserted at the root of the request body by setting AdditionalProperties when calling the Complete method. These are intended for AI models that require extra parameters beyond what is defined in the REST API.
Note that by default, the service will reject any request payload that includes unknown parameters (ones that are not defined in the REST API Request Body table). To change the default service behaviour, when the Complete method includes AdditionalProperties, the client library automatically adds the HTTP request header "unknown_params": "pass-through".
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new ChatCompletionsClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many feet are in a mile?"),
    },
    AdditionalProperties = { { "foo", BinaryData.FromString("\"bar\"") } }, // Optional, add additional properties to the request to pass to the model
};
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Choices[0].Message.Content);
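In the example above the value for "foo" is written as "\"bar\"", which suggests that each additional property value is supplied as raw JSON text. Under that assumption, the sketch below shows how System.Text.Json can produce that text for a few value types. The parameter names and values are hypothetical, and each resulting string would be passed to BinaryData.FromString as in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical parameter names and values, for illustration only.
var extraParameters = new Dictionary<string, object>
{
    ["foo"] = "bar",
    ["some_number"] = 0.5,
    ["some_flag"] = true,
};

foreach (var (name, value) in extraParameters)
{
    // The serializer adds the surrounding quotes and escaping that a JSON
    // string needs; numbers and booleans are emitted as bare literals.
    string json = JsonSerializer.Serialize(value);
    Console.WriteLine($"{name} -> {json}");
}
```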
Troubleshooting
Exceptions
The Complete and GetModelInfo methods throw a RequestFailedException for a non-success HTTP status code response from the service. The exception's Status property holds the HTTP response status code. The exception's Message property contains a detailed message that may be helpful in diagnosing the issue:
try
{
    client.Complete(requestOptions);
}
catch (RequestFailedException e)
{
    Console.WriteLine($"Exception status code: {e.Status}");
    Console.WriteLine($"Exception message: {e.Message}");
    // Assert.IsTrue is a test-framework assertion; omit this line outside of a test project.
    Assert.IsTrue(e.Message.Contains("Extra inputs are not permitted"));
}
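Since the Status property carries the HTTP status code, a caller can branch on it to give more targeted feedback. The following is a minimal sketch under that assumption. The DescribeFailure helper and its hint texts are hypothetical, and the exception is constructed locally (using the RequestFailedException constructor from Azure.Core that takes a status and a message) so the sketch runs without calling the service.

```csharp
using System;
using Azure;

// Hypothetical helper: maps a few common HTTP status codes to a short hint.
static string DescribeFailure(RequestFailedException e) => e.Status switch
{
    401 or 403 => "Authentication failed: check the key or Entra ID credentials.",
    404 => "Not found: check the endpoint URL.",
    429 => "Too many requests: wait before retrying.",
    _ => $"Request failed with status {e.Status}: {e.Message}",
};

// Locally constructed exception standing in for one thrown by client.Complete.
var simulated = new RequestFailedException(404, "Resource not found");
Console.WriteLine(DescribeFailure(simulated));
```

In real code the helper would be called from the catch block shown above, with the exception thrown by the client.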
Reporting issues
To report issues with the client library, or request additional features, please open a GitHub issue.
Next steps
Have a look at the Samples folder, containing fully runnable C# code for doing inference using synchronous and asynchronous methods.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.