Azure AI Model Inference (Preview)
Model inference API for models deployed in Azure AI and Azure ML with serverless and self-hosted endpoints.
This connector is available in the following products and regions:
| Service | Class | Regions |
|---|---|---|
| Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions - US Department of Defense (DoD) |
| Contact | |
|---|---|
| Name | Microsoft |
| URL | https://support.microsoft.com |
| Connector Metadata | |
|---|---|
| Publisher | Microsoft Copilot Studio |
| Privacy policy | https://privacy.microsoft.com/privacystatement |
| Website | https://learn.microsoft.com/en-us/azure/ai-studio/reference/reference-model-inference-api |
| Categories | AI |
The Azure AI Inference connector lets you connect to your own model deployed in Azure AI Studio.
Prerequisites
- A model deployed in Azure AI Studio
Get your credentials
To authenticate your API requests, you need the endpoint URL and API key of your model deployment.
Navigate to your resource in Azure AI Studio and open 'Deployments'. Under 'Endpoint', the endpoint is the 'Target URI' value and the key is under 'Key'.
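Outside of the connector, the same Target URI and Key can be used to call the model directly over HTTPS. A minimal sketch using only the standard library (the endpoint URL, key, and `api-version` value are hypothetical placeholders; serverless endpoints accept an `Authorization: Bearer <key>` header, while Azure OpenAI endpoints expect an `api-key` header instead):

```python
import json
from urllib import request


def build_chat_request(endpoint: str, api_key: str, messages: list,
                       api_version: str = "2024-05-01-preview"):
    """Assemble (but do not send) a chat-completions request.

    `endpoint` is the 'Target URI' and `api_key` the 'Key' copied from the
    deployment page; the api-version default here is an assumption.
    """
    url = f"{endpoint.rstrip('/')}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        # Serverless endpoints; Azure OpenAI endpoints use an 'api-key' header.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({"messages": messages}).encode("utf-8")
    return request.Request(url, data=body, headers=headers, method="POST")


# Build a request against a placeholder endpoint as a smoke test.
req = build_chat_request("https://example.models.ai.azure.com", "MY_KEY",
                         [{"role": "user", "content": "Hello"}])
```

Sending the request is then a matter of `urllib.request.urlopen(req)` (or any HTTP client) with error handling around it.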
Supported operations
The Azure AI Inference connector supports the following operations:
- GetModelInfo - Returns the information about the model deployed under the endpoint
Required parameters:
* `api-version` - The version of the Inference API
- GetChatCompletions - Creates a model response for the given chat conversation
Required parameters:
* `api-version` - The version of the Inference API
* `messages` - The chat conversation to be completed
* `model` - The deployment name of the model. Required only for OpenAI models
Default values of optional parameters:
* `frequency_penalty` - 0
* `presence_penalty` - 0
* `temperature` - 0.7
* `top_p` - 1
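The documented defaults for the optional parameters can be reproduced client-side when assembling a request body. A sketch (the helper name is hypothetical; only parameters the caller omits fall back to the defaults listed above):

```python
# Defaults documented for GetChatCompletions' optional parameters.
CHAT_DEFAULTS = {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.7,
    "top_p": 1,
}

# Other optional parameters the operation accepts, with no documented default.
OTHER_OPTIONAL = {"model", "max_tokens", "stop", "stream", "seed"}


def chat_payload(messages, **overrides):
    """Build a GetChatCompletions body, applying the documented defaults."""
    unknown = set(overrides) - set(CHAT_DEFAULTS) - OTHER_OPTIONAL
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"messages": messages, **CHAT_DEFAULTS, **overrides}


payload = chat_payload([{"role": "user", "content": "Hi"}], temperature=0.2)
```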
Creating a connection
The connector supports the following authentication types:
| Name | Description | Applicable | Shareable |
|---|---|---|---|
| Default | Parameters for creating connection. | All regions | Not shareable |
Default
Applicable: All regions
Parameters for creating connection.
This connection is not shareable. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.
| Name | Type | Description | Required |
|---|---|---|---|
| Azure model endpoint URL | string | Enter the URL of your deployed model endpoint. For example: https://resource.openai.azure.com | True |
| API key | securestring | Authorization for this API | True |
Throttling Limits
| Name | Calls | Renewal Period |
|---|---|---|
| API calls per connection | 100 | 60 seconds |
Actions
| Action | Description |
|---|---|
| Creates a model response for the given chat conversation | Creates a model response for the given chat conversation. |
| Returns the information about the model deployed under the endpoint | Returns information about the AI model. The method makes a REST API call to the `/info` route on the given endpoint. |
Creates a model response for the given chat conversation
Creates a model response for the given chat conversation.
Parameters
| Name | Key | Required | Type | Description |
|---|---|---|---|---|
| content | content | True | string | The contents of the system message. |
| role | role | True | string | The role of the messages author, in this case `system`. |
| name | name | | string | An optional name for the participant. Provides the model information to differentiate between participants of the same role. |
| frequency_penalty | frequency_penalty | | float | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. |
| stream | stream | | boolean | A value indicating whether chat completions should be streamed for this request. |
| presence_penalty | presence_penalty | | float | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. |
| temperature | temperature | | float | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| top_p | top_p | | float | An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
| max_tokens | max_tokens | | integer | The maximum number of tokens to generate. |
| type | type | | string | Must be one of `text` or `json_object`. |
| stop | stop | | array of string | A collection of textual sequences that will end completions generation. |
| type | type | True | string | The type of the tool. Currently, only `function` is supported. |
| description | description | | string | A description of what the function does. The model will use this description when selecting the function and interpreting its parameters. |
| name | name | True | string | The name of the function to be called. |
| parameters | parameters | | object | The parameters the function accepts, described as a JSON Schema object. |
| seed | seed | | integer | If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. |
| model | model | | string | ID of the specific AI model to use, if more than one model is available on the endpoint. |
| The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| Controls what happens if an unknown parameter is passed. | extra-parameters | | string | Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. This sets the HTTP request header `extra-parameters`. |
| Name of the deployment you want to route the request to. | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
Returns
Represents a chat completion response returned by model, based on the provided input.
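Given that shape (detailed under CreateChatCompletionResponse in the Definitions section), the generated text sits at `choices[0].message.content`. A sketch with a hypothetical response payload:

```python
def first_reply(response: dict) -> str:
    """Return the assistant message content of the first choice."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical payload shaped like CreateChatCompletionResponse.
sample = {
    "id": "abc123",
    "object": "chat.completion",
    "choices": [{"index": 0, "finish_reason": "stop",
                 "message": {"role": "assistant", "content": "Hello!"}}],
}
```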
Returns the information about the model deployed under the endpoint
Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoint. It will not work for GitHub Models endpoint or Azure OpenAI endpoint.
Parameters
| Name | Key | Required | Type | Description |
|---|---|---|---|---|
| The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". | api-version | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
| Name of the deployment you want to route the request to. | azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
Returns
Represents some basic information about the AI model.
- Body
- ModelInfo
Definitions
ModelInfo
Represents some basic information about the AI model.
| Name | Path | Type | Description |
|---|---|---|---|
| model_name | model_name | string | The name of the AI model. |
| model_type | model_type | string | The type of the AI model. |
| model_provider_name | model_provider_name | string | The model provider name. |
| capabilities | capabilities | | |
ChatCompletionMessageToolCalls
The tool calls generated by the model, such as function calls.
| Name | Path | Type | Description |
|---|---|---|---|
| Items | | ChatCompletionMessageToolCall | |
ChatCompletionMessageToolCall
| Name | Path | Type | Description |
|---|---|---|---|
| id | id | string | The ID of the tool call. |
| type | type | string | The type of the tool. Currently, only `function` is supported. |
| name | function.name | string | The name of the function to call. |
| arguments | function.arguments | string | The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function. |
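Since `function.arguments` is model-generated JSON that may be malformed or contain unexpected keys, it is worth validating before dispatching to your function. A minimal sketch (the schema check is simplified to an allowed-key test, not full JSON Schema validation; the tool call below is hypothetical):

```python
import json


def parse_tool_arguments(tool_call: dict, allowed: set):
    """Decode and sanity-check a tool call's arguments string.

    Returns (name, args); raises ValueError on invalid JSON or unknown keys.
    """
    raw = tool_call["function"]["arguments"]
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model produced invalid JSON: {exc}") from exc
    extra = set(args) - allowed
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    return tool_call["function"]["name"], args


call = {"id": "t1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
name, args = parse_tool_arguments(call, allowed={"city", "unit"})
```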
ChatCompletionResponseMessage
A chat completion message generated by the model.
| Name | Path | Type | Description |
|---|---|---|---|
| content | content | string | The contents of the message. |
| tool_calls | tool_calls | ChatCompletionMessageToolCalls | The tool calls generated by the model, such as function calls. |
| role | role | string | The role of the author of this message. |
CreateChatCompletionResponse
Represents a chat completion response returned by model, based on the provided input.
| Name | Path | Type | Description |
|---|---|---|---|
| id | id | string | A unique identifier associated with this chat completions response. |
| choices | choices | array of object | A list of chat completion choices. Can be more than one if multiple completions were requested. |
| finish_reason | choices.finish_reason | string | The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens was reached, or `tool_calls` if the model called a tool. |
| content_filter_result | choices.content_filter_result | | |
| index | choices.index | integer | The ordered index associated with this chat completions choice. |
| message | choices.message | ChatCompletionResponseMessage | A chat completion message generated by the model. |
| created | created | integer | The first timestamp associated with generation activity for this completions response, represented as seconds since the beginning of the Unix epoch of 00:00 on 1 Jan 1970. |
| model | model | string | The model used for the chat completion. |
| object | object | string | The object type, which is always `chat.completion`. |
| usage | usage | CompletionUsage | Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers. |
CompletionUsage
Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.
| Name | Path | Type | Description |
|---|---|---|---|
| completion_tokens | completion_tokens | integer | The number of tokens generated across all completions emissions. |
| prompt_tokens | prompt_tokens | integer | The number of tokens in the provided prompts for the completions request. |
| total_tokens | total_tokens | integer | The total number of tokens processed for the completions request and response. |
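The three counters are related: `total_tokens` should equal `prompt_tokens` plus `completion_tokens`, which makes a cheap sanity check when aggregating usage across calls. A sketch with a hypothetical usage record:

```python
def add_usage(totals: dict, usage: dict) -> dict:
    """Accumulate a CompletionUsage record, checking internal consistency."""
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("usage counters are inconsistent")
    return {k: totals.get(k, 0) + usage[k]
            for k in ("prompt_tokens", "completion_tokens", "total_tokens")}


running = add_usage({}, {"prompt_tokens": 12, "completion_tokens": 30,
                         "total_tokens": 42})
```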