Instant models in Microsoft Foundry (preview)

Instant models let you call any supported model by name — no deployment required. Create a Foundry project, start coding, and use any available model immediately.

Prerequisites

An Azure subscription. Create one for free.
Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).
A Foundry project in West US 3 (the only supported region for instant models during preview). If you need to create a project, see Create a project.
The Foundry User role on the project or account.

Important

The Foundry RBAC roles were recently renamed. Foundry User, Foundry Owner, Foundry Account Owner, and Foundry Project Manager were previously named Azure AI User, Azure AI Owner, Azure AI Account Owner, and Azure AI Project Manager. You might still see the previous names in some places while the rename rolls out. The role IDs and core permissions are unchanged by the rename.

Start using models instantly

With instant models, the workflow is simple: use a supported instant model name in your code. No deployment is needed. The same API, SDK, and client you already use for deployments work with instant models, so there's no second SDK, no separate client, and no configuration change.

The only change from deployment-based code is the model parameter. In the Python code below, replace "gpt-5-mini" with the name of any instant model.

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Format: "https://resource_name.ai.azure.com/api/projects/project_name"
PROJECT_ENDPOINT = "your_project_endpoint"

# Create project and openai clients to call Foundry API
project = AIProjectClient(
    endpoint=PROJECT_ENDPOINT,
    credential=DefaultAzureCredential(),
)
openai = project.get_openai_client()

# Run a responses API call
response = openai.responses.create(
    model="gpt-5-mini",
    input="What is the size of France in square miles?",
)
print(f"Response output: {response.output_text}")

The only change from deployment-based code is the model parameter. In the C# code below, replace "gpt-5-mini" with the name of any instant model.

using Azure.Identity;
using Azure.AI.Projects;
using Azure.AI.Extensions.OpenAI;
using OpenAI.Responses;

#pragma warning disable OPENAI001

// Format: "https://resource_name.ai.azure.com/api/projects/project_name"
var ProjectEndpoint = "your_project_endpoint";

// Create project client to call Foundry API
AIProjectClient projectClient = new(
    endpoint: new Uri(ProjectEndpoint),
    tokenProvider: new DefaultAzureCredential());

// Run a responses API call
ProjectResponsesClient responseClient = projectClient.ProjectOpenAIClient.GetProjectResponsesClientForModel("gpt-5-mini"); 
ResponseResult response = await responseClient.CreateResponseAsync(
    "What is the size of France in square miles?");
Console.WriteLine(response.GetOutputText());

The only change from deployment-based code is the model parameter. In the TypeScript code below, replace "gpt-5-mini" with the name of any instant model.

import { DefaultAzureCredential } from "@azure/identity";
import { AIProjectClient } from "@azure/ai-projects";

// Format: "https://resource_name.ai.azure.com/api/projects/project_name"
const PROJECT_ENDPOINT = "your_project_endpoint";

async function main(): Promise<void> {
    // Create project and openai clients to call Foundry API
    const project = new AIProjectClient(PROJECT_ENDPOINT, new DefaultAzureCredential());
    const openai = project.getOpenAIClient();

    // Run a responses API call
    const response = await openai.responses.create({
        model: "gpt-5-mini",
        input: "What is the size of France in square miles?",
    });
    console.log(`Response output: ${response.output_text}`);
}

main().catch(console.error);

The only change from deployment-based code is the model parameter. In the Java code below, replace "gpt-5-mini" with the name of any instant model.

package com.azure.ai.agents;

import com.azure.identity.DefaultAzureCredentialBuilder;
import com.openai.models.responses.Response;
import com.openai.models.responses.ResponseCreateParams;

public class CreateResponse {
    public static void main(String[] args) {
        // Format: "https://resource_name.ai.azure.com/api/projects/project_name"
        String projectEndpoint = "your_project_endpoint";

        // Create responses client to call Foundry API
        ResponsesClient responsesClient = new AgentsClientBuilder()
                .credential(new DefaultAzureCredentialBuilder().build())
                .endpoint(projectEndpoint)
                .buildResponsesClient();

        // Run a responses API call
        ResponseCreateParams responseRequest = new ResponseCreateParams.Builder()
                .input("What is the size of France in square miles?")
                .model("gpt-5-mini")
                .build();
        Response response = responsesClient.getResponseService().create(responseRequest);
        System.out.println(response.output());
    }
}

The only change from deployment-based code is the model parameter. In the curl command below, replace "gpt-5-mini" with the name of any instant model. Also replace YOUR-FOUNDRY-RESOURCE-NAME and YOUR-PROJECT-NAME with your values:

curl -X POST https://YOUR-FOUNDRY-RESOURCE-NAME.services.ai.azure.com/api/projects/YOUR-PROJECT-NAME/openai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_AI_AUTH_TOKEN" \
  -d '{
        "model": "gpt-5-mini",
        "input": "What is the size of France in square miles?"
      }'

Why instant models matter

Switch models by changing one string — use any instant model name in the model= line, without creating or deleting deployments.
Same API and SDK — the same calls work for both instant models and deployments.
Works with your dev tools — instant models integrate with Foundry CLI, VS Code, and CI/CD pipelines the same way deployments do.
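
To illustrate the first point, here is a minimal Python sketch that sends one prompt to several models and varies only the model string. The helper name compare_models and the ask callable are hypothetical and not part of any SDK; ask stands in for whatever client call you already use.

```python
from typing import Callable, Dict, Iterable


def compare_models(
    ask: Callable[[str, str], str],
    models: Iterable[str],
    prompt: str,
) -> Dict[str, str]:
    """Send the same prompt to each model and collect the answers by model name."""
    # Only the model string changes between calls; the client stays the same.
    return {model: ask(model, prompt) for model in models}
```

With the Python client from the earlier sample, ask could be written as lambda model, prompt: openai.responses.create(model=model, input=prompt).output_text, assuming that client is already constructed.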

Deployments aren't going away. They remain the right choice when you need reserved throughput, custom content filters, data residency, or advanced enterprise configurations. Instant models simplify the getting-started experience so that deployments become something you level up to, not a gate you must pass before you can use a model.

Supported models

New models support instant access by default when they're released. Support for additional models is considered based on customer demand.

To see all models that support instant access:

Open a project in West US 3 in the new Foundry experience.
Select Discover in the upper-right navigation, then Models in the left pane.
In the model catalog, select Instant under Development options to view the available instant models.

You can also list instant models programmatically:

SUBSCRIPTION_ID="<your-subscription-id>"
LOCATION="westus3"

az rest --method get \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$LOCATION/models?api-version=2025-06-01" \
  --output json \
| jq -r '(.value // .models // .)[]
  | select((.model.capabilities.instant // "false" | tostring | ascii_downcase) == "true")
  | .model.name' \
| sort -u
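
If you prefer to filter the response in Python instead of jq, the following sketch applies roughly the same logic to a payload you have already fetched and parsed. It assumes the response shape implied by the jq filter above (entries under value or models, each with model.name and model.capabilities.instant); the function name instant_model_names is hypothetical.

```python
from typing import Any, List


def instant_model_names(payload: Any) -> List[str]:
    """Return sorted, de-duplicated names of models flagged as instant."""
    # Like the jq filter: entries may sit under "value", under "models",
    # or the payload may itself be the list of entries.
    if isinstance(payload, dict):
        entries = payload.get("value")
        if entries is None:
            entries = payload.get("models")
        if entries is None:
            entries = []
    else:
        entries = payload

    names = set()
    for entry in entries:
        model = entry.get("model") or {}
        capabilities = model.get("capabilities") or {}
        # The flag may arrive as a bool or as a string such as "true".
        flag = str(capabilities.get("instant", "false")).lower()
        if flag == "true" and model.get("name"):
            names.add(model["name"])
    return sorted(names)
```

You could pass it the parsed output of the az rest command above, for example json.loads of the command's standard output.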

Note

During the preview, instant models are available in projects in West US 3 only.

Some instant models might appear in the list even if your subscription has no quota for them. For more information, see Quotas and limits for Foundry Models.

When to use instant models vs. deployments

Scenario	Recommended approach
Getting started, prototyping, or experimentation	Instant models
Using the latest model immediately after release	Instant models
Need reserved capacity or predictable throughput	Deployment
Require provisioned throughput (PTU)	Deployment
Need data residency in a specific region	Deployment
Custom content filtering policies per model	Deployment
Custom guardrails per model	Deployment
Endpoint-specific configuration (for example, version locks per endpoint)	Deployment
Fine-grained quota partitioning across teams	Deployment
Fine-tuned models	Deployment

Instant models and deployments can coexist in the same project. You can start with instant models and create deployments later as your requirements evolve.

Model versions

By default, instant models route to the latest evergreen version of a model. To pin to a specific version, append the version date to the model name as a hyphenated suffix:

What you pass as `model`	Behavior
`model-name`	Routes to the latest version
`model-name-2025-04-01`	Routes to that specific version

Version pinning is opt-in. If your application requires stability, include the version suffix. Otherwise, you always get the latest version automatically.
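
As a small illustration of the naming rule, the sketch below builds and parses the value you pass as model. It assumes version suffixes always use the YYYY-MM-DD form shown in the table; the helper names pin_model and split_model are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: a pinned version is a trailing "-YYYY-MM-DD" suffix.
_VERSION_SUFFIX = re.compile(r"^(?P<name>.+)-(?P<version>\d{4}-\d{2}-\d{2})$")


def pin_model(name: str, version_date: Optional[str] = None) -> str:
    """Return the string to pass as `model`; no date means the latest version."""
    if version_date is None:
        return name
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", version_date):
        raise ValueError(f"expected YYYY-MM-DD, got {version_date!r}")
    return f"{name}-{version_date}"


def split_model(model: str) -> Tuple[str, Optional[str]]:
    """Split a `model` value into (base name, pinned version or None)."""
    match = _VERSION_SUFFIX.match(model)
    if match:
        return match["name"], match["version"]
    return model, None
```

For example, pin_model("model-name", "2025-04-01") returns "model-name-2025-04-01", and split_model reverses it.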

How quota is consumed

Instant models draw from a per-model global quota pool assigned to your subscription. This quota is separate from the regional quota used by Regional Standard and Provisioned deployments.

You don't allocate or partition global quota — it's shared automatically across all instant model usage in your subscription.
Global Standard deployments reserve a portion of your global quota. Instant models use whatever capacity remains.
Other deployment types (Regional Standard, Provisioned) use separate regional quota and don't affect your instant model capacity.
If instant model requests are throttled, you can request a quota increase or create a deployment with reserved capacity.

For more details on how global and regional quotas interact, see Manage and increase quotas.
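
The relationship between Global Standard reservations and instant model capacity can be sketched as simple arithmetic. This is an illustration of the rules above for a single model, not the service's actual accounting; the function name and the units are assumptions.

```python
from typing import Iterable


def instant_capacity(global_quota: int, global_standard_reservations: Iterable[int]) -> int:
    """Estimate the global quota left for instant model usage of one model.

    Assumption: all values use the same unit (for example, tokens per minute).
    Regional Standard and Provisioned deployments are deliberately not inputs,
    because they draw from separate regional quota.
    """
    reserved = sum(global_standard_reservations)
    # Capacity can't go negative; once reservations use the whole pool,
    # nothing remains for instant models.
    return max(global_quota - reserved, 0)
```

With a global quota of 1000 and two Global Standard deployments reserving 300 and 200, the sketch returns 500.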

Enterprise controls

Capability	How it works
Block specific models or providers	Azure Policy definitions apply to instant models the same way they apply to deployments
Pin to a model version	Append the version suffix to the model name (see Model versions)
Disable instant models entirely	Administrators can turn off instant models at the subscription level through Azure Policy

To remove instant models from an account, configure the account's instant settings through Bicep or the Azure Resource Manager (ARM) REST API.

With the REST API, update your account with a PATCH request:

PATCH https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}?api-version=2026-01-15-preview
Authorization: Bearer {arm_token}
Content-Type: application/json

Use this request body to effectively shut off instant model access:

{
  "properties": {
    "instant": {
      "raiPolicyName": "Microsoft.DefaultV2",
      "modelAllowList": []
    }
  }
}
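
As a rough Python equivalent, the sketch below builds the same PATCH request with the standard library without sending it. The URL, API version, and body come from the request above; the function name is hypothetical, and acquiring the ARM token is left to you.

```python
import json
import urllib.request

API_VERSION = "2026-01-15-preview"  # taken from the request above


def build_disable_instant_request(
    subscription_id: str, resource_group: str, account: str, arm_token: str
) -> urllib.request.Request:
    """Build (but don't send) the PATCH that empties the instant model allow list."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account}"
        f"?api-version={API_VERSION}"
    )
    body = {
        "properties": {
            "instant": {
                "raiPolicyName": "Microsoft.DefaultV2",
                "modelAllowList": [],  # empty list: no instant models allowed
            }
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {arm_token}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen. Inspect the request first if you want to confirm the URL and body before changing the account.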

With Bicep, update your existing account resource with an instant block:

resource account 'Microsoft.CognitiveServices/accounts@2026-01-15-preview' = {
  name: accountName
  location: location
  kind: 'AIServices'
  sku: {
    name: 'S0'
  }
  // Keep your existing account properties and add instant settings.
  properties: {
    instant: {
      raiPolicyName: 'Microsoft.DefaultV2'
      modelAllowList: []
    }
  }
}

Important

All instant models use default guardrails and content filters. However, you can't configure custom guardrails or Responsible AI (RAI) policies on a per-model basis for instant models. You can set a default RAI policy at the account level through the API, but that policy applies uniformly to all instant models. If you need different content filtering policies for individual models, use a deployment.

Deployment name collisions

New deployments can't use a name that matches an existing model name. If you have an existing deployment whose name collides with a model name, the deployment takes precedence and instant model access for that model name is unavailable in that project.
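
To check a project for collisions, you can compare its deployment names against the instant model names. The sketch below assumes you already have both lists and that names collide only on an exact, case-sensitive match, which the text above doesn't specify; the function name is hypothetical.

```python
from typing import Iterable, List


def shadowed_models(deployment_names: Iterable[str], model_names: Iterable[str]) -> List[str]:
    """Return model names that an existing deployment name takes precedence over."""
    # Assumption: a collision is an exact, case-sensitive name match.
    return sorted(set(deployment_names) & set(model_names))
```

Any name it returns resolves to the deployment in that project, so instant access to the model of the same name is unavailable there.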

Limitations during preview

Available in West US 3 only.
Fine-tuned models aren't supported. To use a fine-tuned model, create a deployment.
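Guardrails, custom RAI policies, and content filters aren't configurable per model for instant models. You can set only a default RAI policy at the account level, and it applies to all instant models.
Only models that support instant access are eligible. See Supported models for how to list them.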


Last updated on 2026-06-02
