

Azure AI Model Inference (Preview)

Model inference API for models deployed in Azure AI and Azure ML with serverless and self-hosted endpoints.

This connector is available in the following products and regions:

| Service | Class | Regions |
| --- | --- | --- |
| Logic Apps | Standard | All Logic Apps regions except: Azure Government regions; Azure China regions; US Department of Defense (DoD) |
Contact
Name: Microsoft
URL: https://support.microsoft.com
Connector Metadata
Publisher: Microsoft Copilot Studio
Privacy policy: https://privacy.microsoft.com/privacystatement
Website: https://learn.microsoft.com/en-us/azure/ai-studio/reference/reference-model-inference-api
Categories: AI

The Azure AI Inference connector lets you connect to your own model deployed in Azure AI Studio.

Prerequisites

  • A model deployed in Azure AI Studio

Get your credentials

To authenticate your API requests, you will need the endpoint and API key of your model.

In Azure AI Studio, navigate to your resource and open Deployments. Under Endpoint, the endpoint is the 'Target URI' value and the key is under 'Key'.
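With the Target URI and key in hand, a direct REST call can be assembled. The sketch below uses placeholder endpoint, key, and api-version values; the `api-key` header is the Azure OpenAI convention, and serverless endpoints typically expect `Authorization: Bearer <key>` instead, so treat the header name as an assumption:

```python
# Minimal sketch: build the chat-completions URL and auth headers for a
# deployed model. Endpoint, key, and api-version are placeholders.

def build_request(endpoint: str, api_key: str, api_version: str) -> tuple:
    """Return (url, headers) for a chat completions call to this endpoint."""
    url = f"{endpoint.rstrip('/')}/chat/completions?api-version={api_version}"
    headers = {
        "api-key": api_key,  # serverless endpoints may use "Authorization": f"Bearer {api_key}"
        "Content-Type": "application/json",
    }
    return url, headers

url, headers = build_request(
    "https://resource.openai.azure.com", "<YOUR-KEY>", "2024-05-01-preview"
)
```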

Supported operations

The Azure AI Inference connector supports the following operations:

1. `GetModelInfo` - Returns information about the model deployed under the endpoint.

   Required parameters:

   * `api-version` - The version of the Inference API

2. `GetChatCompletions` - Creates a model response for the given chat conversation.

   Required parameters:

   * `api-version` - The version of the Inference API
   * `messages` - The chat conversation to be completed
   * `model` - The deployment name of the model; required only for OpenAI models

Default values of optional parameters:

* `frequency_penalty` - 0
* `presence_penalty` - 0
* `temperature` - 0.7
* `top_p` - 1
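The defaults above can be made concrete in a small request-body builder. This is a sketch, not connector internals; the deployment name `my-deployment` is hypothetical:

```python
# Sketch of a GetChatCompletions request body using the connector's
# documented default values for the optional sampling parameters.

DEFAULTS = {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.7,
    "top_p": 1,
}

def chat_request_body(messages, model=None, **overrides):
    body = {"messages": messages, **DEFAULTS, **overrides}
    if model is not None:
        body["model"] = model  # deployment name; required only for OpenAI models
    return body

body = chat_request_body(
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello!"}],
    model="my-deployment",  # hypothetical deployment name
)
```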

Creating a connection

The connector supports the following authentication types:

| Name | Parameters | Applicable | Shareable |
| --- | --- | --- | --- |
| Default | Parameters for creating connection | All regions | Not shareable |

Default

Applicable: All regions

Parameters for creating connection.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| Azure model endpoint URL | string | Enter the URL of your deployed model endpoint. For example: https://resource.openai.azure.com | True |
| API key | securestring | Authorization for this API | True |

Throttling Limits

| Name | Calls | Renewal Period |
| --- | --- | --- |
| API calls per connection | 100 | 60 seconds |

Actions

| Action | Description |
| --- | --- |
| Creates a model response for the given chat conversation | Creates a model response for the given chat conversation. |
| Returns the information about the model deployed under the endpoint | Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoints. It will not work for GitHub Models endpoints or Azure OpenAI endpoints. |

Creates a model response for the given chat conversation

Creates a model response for the given chat conversation.

Parameters

| Name | Key | Required | Type | Description |
| --- | --- | --- | --- | --- |
| content | content | True | string | The contents of the system message. |
| role | role | True | string | The role of the message's author, in this case system. |
| name | name | | string | An optional name for the participant. Provides the model information to differentiate between participants of the same role. |
| frequency_penalty | frequency_penalty | | float | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. |
| stream | stream | | boolean | A value indicating whether chat completions should be streamed for this request. |
| presence_penalty | presence_penalty | | float | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. |
| temperature | temperature | | float | The sampling temperature that controls the apparent creativity of generated completions. Higher values make output more random, while lower values make results more focused and deterministic. It is not recommended to modify temperature and top_p in the same request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| top_p | top_p | | float | An alternative to sampling with temperature, called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. For example, a value of 0.15 causes only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p in the same request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| max_tokens | max_tokens | | integer | The maximum number of tokens to generate. |
| type | type | | string | Must be one of text or json_object. |
| stop | stop | | array of string | A collection of textual sequences that will end completions generation. |
| type | type | True | string | The type of the tool. Currently, only function is supported. |
| description | description | | string | A description of what the function does. The model will use this description when selecting the function and interpreting its parameters. |
| name | name | True | string | The name of the function to be called. |
| parameters | parameters | | object | The parameters the function accepts, described as a JSON Schema object. |
| seed | seed | | integer | If specified, the system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. |
| model | model | | string | ID of the specific AI model to use, if more than one model is available on the endpoint. |
| api-version | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| extra-parameters | extra-parameters | | string | Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. This sets the HTTP request header extra-parameters. error - the service errors if it detects extra parameters in the request payload (the service default). drop - the service ignores (drops) extra parameters in the request payload and passes only the known parameters to the back-end AI model. pass-through - the service passes extra parameters to the back-end AI model. |
| azureml-model-deployment | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
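The `extra-parameters` and `azureml-model-deployment` values are sent as HTTP request headers. A minimal sketch of assembling them, assuming the `api-key` auth convention and a hypothetical deployment name:

```python
# Sketch: assemble the optional routing/behavior headers described above.
# The "api-key" header name and "blue-deployment" value are assumptions.

def request_headers(api_key, extra_parameters=None, deployment=None):
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    if extra_parameters is not None:
        # Accepted values per the table: error (service default) | drop | pass-through
        if extra_parameters not in ("error", "drop", "pass-through"):
            raise ValueError("extra-parameters must be error, drop, or pass-through")
        headers["extra-parameters"] = extra_parameters
    if deployment is not None:
        headers["azureml-model-deployment"] = deployment
    return headers

headers = request_headers("<YOUR-KEY>", extra_parameters="drop",
                          deployment="blue-deployment")
```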

Returns

Represents a chat completion response returned by the model, based on the provided input.

Returns the information about the model deployed under the endpoint

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoint. It will not work for GitHub Models endpoint or Azure OpenAI endpoint.

Parameters

| Name | Key | Required | Type | Description |
| --- | --- | --- | --- | --- |
| api-version | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| azureml-model-deployment | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
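GetModelInfo maps to a simple GET against the endpoint's /info route. A sketch of the URL construction, using a hypothetical serverless endpoint name and a placeholder api-version:

```python
# Sketch: build the GET /info URL described above. The endpoint host and
# api-version below are illustrative placeholders, not real resources.

def model_info_url(endpoint: str, api_version: str) -> str:
    return f"{endpoint.rstrip('/')}/info?api-version={api_version}"

url = model_info_url(
    "https://my-serverless-endpoint.eastus2.inference.ai.azure.com",
    "2024-05-01-preview",
)
# A GET to this URL (with auth headers) returns a ModelInfo body such as
# {"model_name": ..., "model_type": ..., "model_provider_name": ...}
```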

Returns

Represents some basic information about the AI model.

Body: ModelInfo

Definitions

ModelInfo

Represents some basic information about the AI model.

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| model_name | model_name | string | The name of the AI model. |
| model_type | model_type | string | The type of the AI model. A unique identifier for the profile. |
| model_provider_name | model_provider_name | string | The model provider name. |
| capabilities | capabilities | | |

ChatCompletionMessageToolCalls

The tool calls generated by the model, such as function calls.

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| Items | | ChatCompletionMessageToolCall | |

ChatCompletionMessageToolCall

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| id | id | string | The ID of the tool call. |
| type | type | string | The type of the tool. Currently, only function is supported. |
| name | function.name | string | The name of the function to call. |
| arguments | function.arguments | string | The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function. |
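Since `function.arguments` may be invalid JSON or include hallucinated parameters, it pays to guard before dispatching. A minimal validation sketch (the tool call and `allowed_params` set are illustrative):

```python
import json

# Sketch: validate a ChatCompletionMessageToolCall before invoking the tool.
# Parses function.arguments and rejects parameters outside the schema.

def parse_tool_call(tool_call: dict, allowed_params: set):
    if tool_call.get("type") != "function":
        raise ValueError("only 'function' tool calls are supported")
    # May raise json.JSONDecodeError: the model does not always emit valid JSON.
    args = json.loads(tool_call["function"]["arguments"])
    unknown = set(args) - allowed_params
    if unknown:
        raise ValueError(f"unexpected parameters from model: {unknown}")
    return tool_call["function"]["name"], args

# Hypothetical tool call shaped like the rows above.
name, args = parse_tool_call(
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    allowed_params={"city", "unit"},
)
```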

ChatCompletionResponseMessage

A chat completion message generated by the model.

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| content | content | string | The contents of the message. |
| tool_calls | tool_calls | ChatCompletionMessageToolCalls | The tool calls generated by the model, such as function calls. |
| role | role | string | The role of the author of this message. |

CreateChatCompletionResponse

Represents a chat completion response returned by the model, based on the provided input.

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| id | id | string | A unique identifier associated with this chat completions response. |
| choices | choices | array of object | A list of chat completion choices. Can be more than one if n is greater than 1. |
| finish_reason | choices.finish_reason | string | The reason the model stopped generating tokens: stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters, or tool_calls if the model called a tool. |
| content_filter_result | choices.content_filter_result | | |
| index | choices.index | integer | The ordered index associated with this chat completions choice. |
| message | choices.message | ChatCompletionResponseMessage | A chat completion message generated by the model. |
| created | created | integer | The first timestamp associated with generation activity for this completions response, represented as seconds since the start of the Unix epoch (00:00 on 1 January 1970). |
| model | model | string | The model used for the chat completion. |
| object | object | string | The object type, which is always chat.completion. |
| usage | usage | CompletionUsage | Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers. |

CompletionUsage

Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

| Name | Path | Type | Description |
| --- | --- | --- | --- |
| completion_tokens | completion_tokens | integer | The number of tokens generated across all completions emissions. |
| prompt_tokens | prompt_tokens | integer | The number of tokens in the provided prompts for the completions request. |
| total_tokens | total_tokens | integer | The total number of tokens processed for the completions request and response. |
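The usage fields above reconcile as prompt_tokens + completion_tokens = total_tokens. A sketch of reading them from a parsed response; the response dict is a hand-written example, not real output:

```python
# Sketch: read CompletionUsage fields from a parsed chat completion response.
# The numbers below are made up to illustrate the field relationship.

response = {
    "usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}
}

usage = response["usage"]
cost_relevant = usage["prompt_tokens"] + usage["completion_tokens"]
assert cost_relevant == usage["total_tokens"]  # the totals should reconcile
```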