
Azure OpenAI reasoning models

Azure OpenAI reasoning models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math compared to previous iterations.

Key capabilities of reasoning models:

  • Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
  • Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
  • Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
  • Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.

Usage

These models don't currently support the same set of parameters as other models that use the chat completions API.

Chat completions API

using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    MaxOutputTokenCount = 100000
};

ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options
);

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Reasoning effort

Note

Reasoning models have reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens that aren't returned as part of the message response content but are used by the model to help generate a final answer to your request. reasoning_effort can be set to low, medium, or high for all reasoning models except o1-mini. GPT-5 reasoning models support a new reasoning_effort setting of minimal. The higher the effort setting, the longer the model will spend processing the request, which will generally result in a larger number of reasoning_tokens.
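
The hidden tokens are still billed and counted against your output-token limit, so it can be useful to inspect them. A minimal sketch of reading the counts out of a Chat Completions usage payload (field names per the note above; the numeric values here are made up for illustration):

```python
# Illustrative only: the shape of the usage object returned by the
# Chat Completions API for reasoning models. Values are placeholders.
usage = {
    "prompt_tokens": 139,
    "completion_tokens": 240,
    "total_tokens": 379,
    "completion_tokens_details": {
        "reasoning_tokens": 192,
    },
}

# reasoning_tokens are billed as output tokens even though they never
# appear in the message content.
reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning

print(f"reasoning tokens: {reasoning}")       # 192
print(f"visible output tokens: {visible}")    # 48
```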

Developer messages

Functionally, developer messages ("role": "developer") are the same as system messages.

Adding a developer message to the previous code example would look as follows:


using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low,
    MaxOutputTokenCount = 100000
};

ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options
);

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Reasoning summary

When using the latest reasoning models with the Responses API, you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning.

Important

Attempting to extract raw reasoning through methods other than the reasoning summary parameter is not supported, may violate the Acceptable Use Policy, and may result in throttling or suspension when detected.

using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default");

OpenAIResponseClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

OpenAIResponse response = await client.CreateResponseAsync(
    userInputText: "What's the optimal strategy to win at poker?",
    new ResponseCreationOptions()
    {
        ReasoningOptions = new ResponseReasoningOptions()
        {
            ReasoningEffortLevel = ResponseReasoningEffortLevel.High,
            ReasoningSummaryVerbosity = ResponseReasoningSummaryVerbosity.Auto,
        },
    });

// Get the reasoning summary from the first OutputItem (ReasoningResponseItem)
Console.WriteLine("=== Reasoning Summary ===");
foreach (var item in response.OutputItems)
{
    if (item is ReasoningResponseItem reasoningItem)
    {
        foreach (var summaryPart in reasoningItem.SummaryParts)
        {
            if (summaryPart is ReasoningSummaryTextPart textPart)
            {
                Console.WriteLine(textPart.Text);
            }
        }
    }
}

Console.WriteLine("\n=== Assistant Response ===");
// Get the assistant's output
Console.WriteLine(response.GetOutputText());

Note

Even when enabled, reasoning summaries are not guaranteed to be generated for every step/request. This is expected behavior.

Python lark

GPT-5 series reasoning models have the ability to call a new custom_tool called lark_tool. This tool is based on the Python Lark parsing library and can be used to constrain model output more flexibly.

Responses API

{
  "model": "gpt-5-2025-08-07",
  "input": "please calculate the area of a circle with radius equal to the number of 'r's in strawberry",
  "tools": [
    {
      "type": "custom",
      "name": "lark_tool",
      "format": {
        "type": "grammar",
        "syntax": "lark",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
      }
    }
  ],
  "tool_choice": "required"
}
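
The grammar above constrains the model's output to a single question line followed by a single answer line. As a local sanity check, you can test candidate outputs against the same terminal regular expressions with Python's standard re module (this mirrors the grammar's terminals for illustration; it is not the Lark parser itself):

```python
import re

# Terminals copied from the grammar definition above:
#   QUESTION: one line of up to 200 chars ending in "?"
#   NEWLINE:  a single "\n"
#   ANSWER:   one line of up to 200 chars ending in "!"
PATTERN = re.compile(r"[^\n?]{1,200}\?\n[^\n!]{1,200}!")

def matches_grammar(text: str) -> bool:
    """Check whether text fits the QUESTION NEWLINE ANSWER shape."""
    return PATTERN.fullmatch(text) is not None

print(matches_grammar("How many r's are in strawberry?\nThree, so the area is 9*pi!"))  # True
print(matches_grammar("Just an answer with no question!"))  # False
```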

Microsoft Entra ID:

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.responses.create(  
    model="gpt-5",  # replace with your model deployment name  
    tools=[  
        {  
            "type": "custom",
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
            }
        }  
    ],  
    input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],  
)  

print(response.model_dump_json(indent=2))  

API Key:

import os
from openai import OpenAI

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)

response = client.responses.create(  
    model="gpt-5",  # replace with your model deployment name  
    tools=[  
        {  
            "type": "custom",
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
            }
        }  
    ],  
    input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],  
)  

print(response.model_dump_json(indent=2))  
  

Output:

{
  "id": "resp_689a0cf927408190b8875915747667ad01c936c6ffb9d0d3",
  "created_at": 1754926332.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-5",
  "object": "response",
  "output": [
    {
      "id": "rs_689a0cfd1c888190a2a67057f471b5cc01c936c6ffb9d0d3",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_689a0d00e60c81908964e5e9b2d6eeb501c936c6ffb9d0d3",
      "content": [
        {
          "annotations": [],
          "text": "“strawberry” has 3 r’s, so the radius is 3.\nArea = πr² = π × 3² = 9π ≈ 28.27 square units.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [
    {
      "name": "lark_tool",
      "parameters": null,
      "strict": null,
      "type": "custom",
      "description": null,
      "format": {
        "type": "grammar",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/",
        "syntax": "lark"
      }
    }
  ],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 139,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 240,
    "output_tokens_details": {
      "reasoning_tokens": 192
    },
    "total_tokens": 379
  },
  "user": null,
  "content_filters": null,
  "store": true
}

Chat Completions

{
  "messages": [
    {
      "role": "user",
      "content": "Which one is larger, 42 or 0?"
    }
  ],
  "tools": [
    {
      "type": "custom",
      "name": "custom_tool",
      "custom": {
        "name": "lark_tool",
        "format": {
          "type": "grammar",
          "grammar": {
            "syntax": "lark",
            "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
          }
        }
      }
    }
  ],
  "tool_choice": "required",
  "model": "gpt-5-2025-08-07"
}

Availability

Region availability

| Model | Region | Limited access |
| --- | --- | --- |
| gpt-5.1 | East US2 & Sweden Central (Global Standard & DataZone Standard) | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| gpt-5.1-chat | East US2 & Sweden Central (Global Standard) | No access request needed. |
| gpt-5.1-codex | East US2 & Sweden Central (Global Standard) | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| gpt-5.1-codex-mini | East US2 & Sweden Central (Global Standard) | No access request needed. |
| gpt-5-pro | East US2 & Sweden Central (Global Standard) | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| gpt-5-codex | East US2 & Sweden Central (Global Standard) | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| gpt-5 | Model availability | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| gpt-5-mini | Model availability | No access request needed. |
| gpt-5-nano | Model availability | No access request needed. |
| o3-pro | East US2 & Sweden Central (Global Standard) | Request access: Limited access model application. If you already have access to a limited access model, no request is required. |
| codex-mini | East US2 & Sweden Central (Global Standard) | No access request needed. |
| o4-mini | Model availability | No access request needed to use the core capabilities of this model. Request access: o4-mini reasoning summary feature |
| o3 | Model availability | Request access: Limited access model application |
| o3-mini | Model availability | Access is no longer restricted for this model. |
| o1 | Model availability | Access is no longer restricted for this model. |
| o1-mini | Model availability | No access request needed for Global Standard deployments. |

Standard (regional) deployments are currently only available to select customers who were previously granted access as part of the o1-preview release.

API & feature support

| Feature | gpt-5.1, 2025-11-13 | gpt-5.1-chat, 2025-11-13 | gpt-5.1-codex, 2025-11-13 | gpt-5.1-codex-mini, 2025-11-13 | gpt-5-pro, 2025-10-06 | gpt-5-codex, 2025-09-011 | gpt-5, 2025-08-07 | gpt-5-mini, 2025-08-07 | gpt-5-nano, 2025-08-07 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| API Version | v1 | v1 | v1 | v1 | v1 | v1 | v1 | v1 | v1 |
| Developer Messages | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured Outputs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Context Window | 400,000 (input: 272,000 / output: 128,000) | 128,000 (input: 111,616 / output: 16,384) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) | 400,000 (input: 272,000 / output: 128,000) |
| Reasoning effort | ✅ 4 | ✅ | ✅ | ✅ | ✅ 5 | ✅ | ✅ | ✅ | ✅ |
| Image input | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Chat Completions API | ✅ | ✅ | - | - | - | - | ✅ | ✅ | ✅ |
| Responses API | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Functions/Tools | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Parallel Tool Calls 1 | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ |
| max_completion_tokens 2 | ✅ | ✅ | - | - | - | - | ✅ | ✅ | ✅ |
| System Messages 3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reasoning summary | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ |

1 Parallel tool calls are not supported when reasoning_effort is set to minimal

2 Reasoning models will only work with the max_completion_tokens parameter when using the Chat Completions API. Use max_output_tokens with the Responses API.

3 The latest reasoning models support system messages to make migration easier. You should not use both a developer message and a system message in the same API request.

4 gpt-5.1 reasoning_effort defaults to none. When upgrading from previous reasoning models to gpt-5.1, keep in mind that you may need to update your code to explicitly pass a reasoning_effort level if you want reasoning to occur.

5 gpt-5-pro only supports reasoning_effort high; this is the default value even when not explicitly passed to the model.

NEW GPT-5 reasoning features

| Feature | Description |
| --- | --- |
| reasoning_effort | minimal is now supported with GPT-5 series reasoning models*; none is only supported for gpt-5.1. Options: none, minimal, low, medium, high |
| verbosity | A new parameter providing more granular control over how concise the model's output will be. Options: low, medium, high |
| preamble | GPT-5 series reasoning models have the ability to spend extra time "thinking" before executing a function/tool call. When this planning occurs, the model can provide insight into the planning steps in the model response via a new object called the preamble object. Generation of preambles in the model response is not guaranteed, though you can encourage the model by using the instructions parameter and passing content like "You MUST plan extensively before each function call. ALWAYS output your plan to the user before calling any function" |
| allowed tools | You can specify multiple tools under tool_choice instead of just one. |
| custom tool type | Enables raw text (non-JSON) outputs. |
| lark_tool | Allows you to use some of the capabilities of Python Lark for more flexible constraining of model responses. |

* gpt-5-codex does not support reasoning_effort minimal.

For more information, we also recommend reading OpenAI's GPT-5 prompting cookbook guide and their GPT-5 feature guide.

Note

  • To avoid timeouts, background mode is recommended for o3-pro.
  • o3-pro does not currently support image generation.

Not Supported

The following are currently unsupported with reasoning models:

  • temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens
  • The apply_patch and shell tools are currently not supported. Support for these tools with gpt-5.1 series models is coming soon.

Markdown output

By default, the o3-mini and o1 models will not attempt to produce output that includes markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting, you lose features like syntax highlighting and copyable code blocks in interactive playground experiences. To override this default behavior and encourage markdown inclusion in model responses, add the string Formatting re-enabled to the beginning of your developer message.

Adding Formatting re-enabled to the beginning of your developer message does not guarantee that the model will include markdown formatting in its response; it only increases the likelihood. We have found from internal testing that Formatting re-enabled is less effective by itself with the o1 model than with o3-mini.

To improve the performance of Formatting re-enabled, you can further augment the beginning of the developer message, which will often result in the desired output. Rather than just adding Formatting re-enabled to the beginning of your developer message, you can experiment with adding a more descriptive initial instruction like one of the examples below:

  • Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
  • Formatting re-enabled - code output should be wrapped in markdown.

Depending on your expected output you may need to customize your initial developer message further to target your specific use case.
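
As a sketch, the prefix can be prepended programmatically when you assemble the request messages. The message shape follows the examples earlier in this article; the helper function name here is illustrative, not part of any SDK:

```python
# Illustrative helper: prepend a "Formatting re-enabled" instruction to the
# developer message so reasoning models are more likely to emit markdown.
FORMATTING_HINT = (
    "Formatting re-enabled - please enclose code blocks "
    "with appropriate markdown tags."
)

def build_messages(developer_instructions: str, user_prompt: str) -> list[dict]:
    """Build a Chat Completions message list with the formatting hint first."""
    return [
        {"role": "developer", "content": f"{FORMATTING_HINT} {developer_instructions}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a helpful coding assistant.",
    "Write a Python function that reverses a string.",
)
print(messages[0]["content"].startswith("Formatting re-enabled"))  # True
```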