Azure AI Model Inference (Preview)
Model inference API for models deployed in Azure AI and Azure ML with serverless and self-hosted endpoints.
This connector is available in the following products and regions:
| Service | Class | Regions |
|---|---|---|
| Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions - US Department of Defense (DoD) |
| Contact | |
|---|---|
| Name | Microsoft |
| URL | https://support.microsoft.com |
| Connector Metadata | |
|---|---|
| Publisher | Microsoft Copilot Studio |
| Privacy policy | https://privacy.microsoft.com/privacystatement |
| Website | https://learn.microsoft.com/en-us/azure/ai-studio/reference/reference-model-inference-api |
| Categories | AI |
The Azure AI Inference connector lets you connect to your own model deployed in Azure AI Studio.
Prerequisites
- A model deployed in Azure AI Studio
Get your credentials
To authenticate your API requests, you need the endpoint URL and API key of your model deployment.
Navigate to your resource in Azure AI Studio and open 'Deployments'. Under 'Endpoint', the endpoint is the 'Target URI' value and the key is under 'Key'.
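Outside of the connector, the same Target URI and Key can be used to call the model directly over HTTPS. A minimal sketch using only the standard library (the endpoint URL, key, and `api-version` value are hypothetical placeholders; serverless endpoints accept an `Authorization: Bearer <key>` header, while Azure OpenAI endpoints expect an `api-key` header instead):

```python
import json
from urllib import request


def build_chat_request(endpoint: str, api_key: str, messages: list,
                       api_version: str = "2024-05-01-preview"):
    """Assemble (but do not send) a chat-completions request.

    `endpoint` is the 'Target URI' and `api_key` the 'Key' copied from the
    deployment page; the api-version default here is an assumption.
    """
    url = f"{endpoint.rstrip('/')}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        # Serverless endpoints; Azure OpenAI endpoints use an 'api-key' header.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({"messages": messages}).encode("utf-8")
    return request.Request(url, data=body, headers=headers, method="POST")


# Build a request against a placeholder endpoint as a smoke test.
req = build_chat_request("https://example.models.ai.azure.com", "MY_KEY",
                         [{"role": "user", "content": "Hello"}])
```

Sending the request is then a matter of `urllib.request.urlopen(req)` (or any HTTP client) with error handling around it.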
Supported operations
The Azure AI Inference connector supports the following operations:
- GetModelInfo - Returns the information about the model deployed under the endpoint
Required parameters:
* `api-version` - The version of the Inference API
- GetChatCompletions - Creates a model response for the given chat conversation
Required parameters:
* `api-version` - The version of the Inference API
* `messages` - The chat conversation to be completed
* `model` - The deployment name of the model. Required only for OpenAI models
Default values of optional parameters:
* `frequency_penalty` - 0
* `presence_penalty` - 0
* `temperature` - 0.7
* `top_p` - 1
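The documented defaults for the optional parameters can be reproduced client-side when assembling a request body. A sketch (the helper name is hypothetical; only parameters the caller omits fall back to the defaults listed above):

```python
# Defaults documented for GetChatCompletions' optional parameters.
CHAT_DEFAULTS = {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.7,
    "top_p": 1,
}

# Other optional parameters the operation accepts, with no documented default.
OTHER_OPTIONAL = {"model", "max_tokens", "stop", "stream", "seed"}


def chat_payload(messages, **overrides):
    """Build a GetChatCompletions body, applying the documented defaults."""
    unknown = set(overrides) - set(CHAT_DEFAULTS) - OTHER_OPTIONAL
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"messages": messages, **CHAT_DEFAULTS, **overrides}


payload = chat_payload([{"role": "user", "content": "Hi"}], temperature=0.2)
```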
Creating a connection
The connector supports the following authentication types:
| Name | Description | Applicable | Shareable |
|---|---|---|---|
| Default | Parameters for creating connection. | All regions | Not shareable |
Default
Applicable: All regions
Parameters for creating connection.
This connection is not shareable. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.
| Name | Type | Description | Required |
|---|---|---|---|
| Azure model endpoint URL | string | Enter the URL of your deployed model endpoint. For example: https://resource.openai.azure.com | True |
| API key | securestring | Authorization for this API | True |
Throttling Limits
| Name | Calls | Renewal Period |
|---|---|---|
| API calls per connection | 100 | 60 seconds |
Actions
| Action | Description |
|---|---|
| Creates a model response for the given chat conversation | Creates a model response for the given chat conversation. |
| Returns the information about the model deployed under the endpoint | Returns information about the AI model. The method makes a REST API call to the `/info` route on the given endpoint. |
Creates a model response for the given chat conversation
Creates a model response for the given chat conversation.
Parameters
| Name | Key | Required | Type | Description |
|---|---|---|---|---|
| content | content | True | string | The contents of the system message. |
| role | role | True | string | The role of the messages author, in this case `system`. |
| name | name | | string | An optional name for the participant. Provides the model information to differentiate between participants of the same role. |
| frequency_penalty | frequency_penalty | | float | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. |
| stream | stream | | boolean | A value indicating whether chat completions should be streamed for this request. |
| presence_penalty | presence_penalty | | float | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. |
| temperature | temperature | | float | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| top_p | top_p | | float | An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| max_tokens | max_tokens | | integer | The maximum number of tokens to generate. |
| type | type | | string | Must be one of `text` or `json_object`. |
| stop | stop | | array of string | A collection of textual sequences that will end completions generation. |
| type | type | True | string | The type of the tool. Currently, only `function` is supported. |
| description | description | | string | A description of what the function does. The model will use this description when selecting the function and interpreting its parameters. |
| name | name | True | string | The name of the function to be called. |
| parameters | parameters | | object | The parameters the function accepts, described as a JSON Schema object. |
| seed | seed | | integer | If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. |
| model | model | | string | ID of the specific AI model to use, if more than one model is available on the endpoint. |
| The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| Controls what happens if an unknown parameter is passed. | extra-parameters | | string | Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. This sets the HTTP request header `extra-parameters`. |
| Name of the deployment you want to route the request to. | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
Returns
Represents a chat completion response returned by model, based on the provided input.
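Given that shape (detailed under CreateChatCompletionResponse in the Definitions section), the generated text sits at `choices[0].message.content`. A sketch with a hypothetical response payload:

```python
def first_reply(response: dict) -> str:
    """Return the assistant message content of the first choice."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical payload shaped like CreateChatCompletionResponse.
sample = {
    "id": "abc123",
    "object": "chat.completion",
    "choices": [{"index": 0, "finish_reason": "stop",
                 "message": {"role": "assistant", "content": "Hello!"}}],
}
```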
Returns the information about the model deployed under the endpoint
Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoint. It will not work for GitHub Models endpoint or Azure OpenAI endpoint.
Parameters
| Name | Key | Required | Type | Description |
|---|---|---|---|---|
| The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| Name of the deployment you want to route the request to. | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
Returns
Represents some basic information about the AI model.
- Body
- ModelInfo
Definitions
ModelInfo
Represents some basic information about the AI model.
| Name | Path | Type | Description |
|---|---|---|---|
| model_name | model_name | string | The name of the AI model. |
| model_type | model_type | string | The type of the AI model. |
| model_provider_name | model_provider_name | string | The model provider name. |
| capabilities | capabilities | | |
ChatCompletionMessageToolCalls
The tool calls generated by the model, such as function calls.
| Name | Path | Type | Description |
|---|---|---|---|
| Items | | ChatCompletionMessageToolCall | |
ChatCompletionMessageToolCall
| Name | Path | Type | Description |
|---|---|---|---|
| id | id | string | The ID of the tool call. |
| type | type | string | The type of the tool. Currently, only `function` is supported. |
| name | function.name | string | The name of the function to call. |
| arguments | function.arguments | string | The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function. |
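Since `function.arguments` is model-generated JSON that may be malformed or contain unexpected keys, it is worth validating before dispatching to your function. A minimal sketch (the schema check is simplified to an allowed-key test, not full JSON Schema validation; the tool call below is hypothetical):

```python
import json


def parse_tool_arguments(tool_call: dict, allowed: set):
    """Decode and sanity-check a tool call's arguments string.

    Returns (name, args); raises ValueError on invalid JSON or unknown keys.
    """
    raw = tool_call["function"]["arguments"]
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model produced invalid JSON: {exc}") from exc
    extra = set(args) - allowed
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    return tool_call["function"]["name"], args


call = {"id": "t1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
name, args = parse_tool_arguments(call, allowed={"city", "unit"})
```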
ChatCompletionResponseMessage
A chat completion message generated by the model.
| Name | Path | Type | Description |
|---|---|---|---|
| content | content | string | The contents of the message. |
| tool_calls | tool_calls | ChatCompletionMessageToolCalls | The tool calls generated by the model, such as function calls. |
| role | role | string | The role of the author of this message. |
CreateChatCompletionResponse
Represents a chat completion response returned by model, based on the provided input.
| Name | Path | Type | Description |
|---|---|---|---|
| id | id | string | A unique identifier associated with this chat completions response. |
| choices | choices | array of object | A list of chat completion choices. Can be more than one if multiple completions were requested. |
| finish_reason | choices.finish_reason | string | The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens was reached, or `tool_calls` if the model called a tool. |
| content_filter_result | choices.content_filter_result | | |
| index | choices.index | integer | The ordered index associated with this chat completions choice. |
| message | choices.message | ChatCompletionResponseMessage | A chat completion message generated by the model. |
| created | created | integer | The first timestamp associated with generation activity for this completions response, represented as seconds since the beginning of the Unix epoch of 00:00 on 1 Jan 1970. |
| model | model | string | The model used for the chat completion. |
| object | object | string | The object type, which is always `chat.completion`. |
| usage | usage | CompletionUsage | Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers. |
CompletionUsage
Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.
| Name | Path | Type | Description |
|---|---|---|---|
| completion_tokens | completion_tokens | integer | The number of tokens generated across all completions emissions. |
| prompt_tokens | prompt_tokens | integer | The number of tokens in the provided prompts for the completions request. |
| total_tokens | total_tokens | integer | The total number of tokens processed for the completions request and response. |
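The three counters are related: `total_tokens` should equal `prompt_tokens` plus `completion_tokens`, which makes a cheap sanity check when aggregating usage across calls. A sketch with a hypothetical usage record:

```python
def add_usage(totals: dict, usage: dict) -> dict:
    """Accumulate a CompletionUsage record, checking internal consistency."""
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("usage counters are inconsistent")
    return {k: totals.get(k, 0) + usage[k]
            for k in ("prompt_tokens", "completion_tokens", "total_tokens")}


running = add_usage({}, {"prompt_tokens": 12, "completion_tokens": 30,
                         "total_tokens": 42})
```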