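Azure OpenAI reasoning models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request than previous iterations, which makes them exceptionally strong in areas like science, coding, and math.
Key capabilities of reasoning models:
- Complex code generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced problem solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex document comparison: Well suited for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction following and workflow management: Particularly effective for managing workflows that require shorter contexts.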
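Prerequisites
A deployed Azure OpenAI reasoning model.
If you use the REST examples:
Install the Azure CLI. For more information, see Install the Azure CLI.
Sign in with az login, then generate a bearer token and store it in the AZURE_OPENAI_AUTH_TOKEN environment variable.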
export AZURE_OPENAI_AUTH_TOKEN=$(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv)
Usage
These models don't currently support the same set of parameters as other models that use the Chat Completions API.
Chat Completions API
using System;
using System.Collections.Generic;
using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;
#pragma warning disable OPENAI001 // Currently required for token-based authentication
BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");
ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);
ChatCompletionOptions options = new ChatCompletionOptions
{
    MaxOutputTokenCount = 100000
};
// Pass the options with the messages so MaxOutputTokenCount is applied to the request
ChatCompletion completion = client.CompleteChat(
    new List<ChatMessage>
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options
);
Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");
Microsoft Entra ID:
If you're new to using Microsoft Entra ID for authentication, see How to configure Azure OpenAI in Microsoft Foundry Models with Microsoft Entra ID authentication.
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=token_provider,
)
response = client.chat.completions.create(
model="YOUR-DEPLOYMENT-NAME", # replace with your model deployment name
messages=[
{"role": "user", "content": "What steps should I think about when writing my first Python API?"},
],
max_completion_tokens=5000
)
print(response.model_dump_json(indent=2))
API key:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)
response = client.chat.completions.create(
model="YOUR-DEPLOYMENT-NAME", # replace with your model deployment name
messages=[
{"role": "user", "content": "What steps should I think about when writing my first Python API?"},
],
max_completion_tokens=5000
)
print(response.model_dump_json(indent=2))
curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
-d '{
"model": "gpt-5",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What steps should I think about when writing my first Python API?"}
],
"max_completion_tokens": 1000
}'
Python Chat Completions API output:
{
"id": "chatcmpl-AEj7pKFoiTqDPHuxOcirA9KIvf3yz",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Writing your first Python API is an exciting step in developing software that can communicate with other applications. An API (Application Programming Interface) allows different software systems to interact with each other, enabling data exchange and functionality sharing. Here are the steps you should consider when creating your first Python API...truncated for brevity.",
"refusal": null,
"role": "assistant",
"function_call": null,
"tool_calls": null
},
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"protected_material_code": {
"filtered": false,
"detected": false
},
"protected_material_text": {
"filtered": false,
"detected": false
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
],
"created": 1728073417,
"model": "o1-2024-12-17",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": "fp_503a95a7d8",
"usage": {
"completion_tokens": 1843,
"prompt_tokens": 20,
"total_tokens": 1863,
"completion_tokens_details": {
"audio_tokens": null,
"reasoning_tokens": 448
},
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 0
}
},
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"custom_blocklists": {
"filtered": false
},
"hate": {
"filtered": false,
"severity": "safe"
},
"jailbreak": {
"filtered": false,
"detected": false
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
]
}
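The curl request above can also be assembled with only the Python standard library, which makes the REST contract (endpoint path, headers, JSON body) explicit. This is a minimal sketch, assuming the bearer token from the prerequisites is stored in AZURE_OPENAI_AUTH_TOKEN and that YOUR-RESOURCE-NAME is replaced with your resource name; build_chat_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request


def build_chat_request(resource: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but don't send) a POST to the v1 Chat Completions endpoint."""
    url = f"https://{resource}.openai.azure.com/openai/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


body = {
    "model": "gpt-5",  # replace with your model deployment name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    "max_completion_tokens": 1000,
}

# Falls back to an empty token so the sketch runs without credentials
req = build_chat_request(
    "YOUR-RESOURCE-NAME", os.environ.get("AZURE_OPENAI_AUTH_TOKEN", ""), body
)

# To actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.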
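Reasoning effort
Note
Reasoning models have reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens that aren't returned as part of the message response content, but the model uses them to help generate a final answer to your request.
reasoning_effort can be set to low, medium, or high for all reasoning models except o1-mini. The higher the effort setting, the longer the model spends processing the request, which generally results in a larger number of reasoning_tokens.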
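Because reasoning_tokens are reported inside completion_tokens_details, you can separate hidden reasoning from the visible answer when you monitor usage. This is a minimal sketch, assuming (as in OpenAI's usage reporting) that completion_tokens already includes the reasoning tokens; summarize_usage is a hypothetical helper, and the numbers are taken from the o1 sample output above.

```python
def summarize_usage(usage: dict) -> dict:
    """Split completion_tokens into hidden reasoning tokens and visible answer tokens."""
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    # reasoning_tokens may be absent or null; treat that as zero
    reasoning = details.get("reasoning_tokens") or 0
    return {
        "reasoning_tokens": reasoning,
        # Assumes completion_tokens includes the reasoning tokens
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion if completion else 0.0,
    }


# "usage" block from the sample Chat Completions output above
sample_usage = {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {"audio_tokens": None, "reasoning_tokens": 448},
}

stats = summarize_usage(sample_usage)
print(stats["reasoning_tokens"], stats["visible_tokens"])  # 448 1395
```

With the Python SDK, you could pass response.usage.model_dump() instead of a hand-written dict.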
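Developer messages
Developer messages ("role": "developer") are functionally the same as system messages.
Adding a developer message to the previous code example looks like this: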
using System;
using System.Collections.Generic;
using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;
#pragma warning disable OPENAI001 // Currently required for token-based authentication
BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");
ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);
ChatCompletionOptions options = new ChatCompletionOptions
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low,
    MaxOutputTokenCount = 100000
};
// Pass the options with the messages so the effort level and token limit are applied
ChatCompletion completion = client.CompleteChat(
    new List<ChatMessage>
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options
);
Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");
Microsoft Entra ID:
If you're new to using Microsoft Entra ID for authentication, see How to configure Azure OpenAI with Microsoft Entra ID authentication.
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=token_provider,
)
response = client.chat.completions.create(
model="YOUR-DEPLOYMENT-NAME", # replace with your model deployment name
messages=[
{"role": "developer", "content": "You are a helpful assistant."},
{"role": "user", "content": "What steps should I think about when writing my first Python API?"},
],
max_completion_tokens=5000,
reasoning_effort="medium", # low, medium, or high
)
print(response.model_dump_json(indent=2))
API key:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)
response = client.chat.completions.create(
model="gpt-5-mini",  # replace with your model deployment name
messages=[
{"role": "developer", "content": "You are a helpful assistant."},  # optional equivalent to a system message for reasoning models
{"role": "user", "content": "What steps should I think about when writing my first Python API?"},
],
max_completion_tokens=5000,
reasoning_effort="medium"  # low, medium, or high
)
print(response.model_dump_json(indent=2))
curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
-d '{
"model": "gpt-5",
"messages": [
{"role": "developer", "content": "You are a helpful assistant."},
{"role": "user", "content": "What steps should I think about when writing my first Python API?"}
],
"max_completion_tokens": 1000,
"reasoning_effort": "medium"
}'
Python Chat Completions API output:
{
"id": "chatcmpl-CaODNsQOHoRLcb9JVSKYY1e2Iss5s",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Here’s a practical, beginner‑friendly checklist to guide you through writing your first Python API, from idea to production.\n\n1) Clarify goals and constraints\n- Who will use it (internal team, public), what problems it solves, expected traffic, latency requirements.\n- Resources you’ll expose (users, orders, etc.) and core operations.\n- Non‑functional needs: security, compliance, uptime, scalability.\n\n2) Choose your API style\n- REST (most common for CRUD and simple integrations).\n- GraphQL (flexible queries, more complex to secure/monitor).\n- gRPC (high‑performance, strongly typed, good for service‑to‑service).\n- For a first API, REST + JSON is usually best.\n\n3) Design the contract first\n- Draft an OpenAPI/Swagger spec: endpoints, request/response schemas, status codes, error model.\n- Decide naming conventions, pagination, filtering, sorting.\n- Define consistent time/date format (ISO‑8601, UTC), ID format, and field casing.\n- Plan versioning strategy (e.g., /v1) and deprecation policy.\n\n4) Plan security and auth\n- Pick auth: API keys for simple internal use; OAuth2/JWT for user auth; mTLS for service‑to‑service.\n- CORS policy for browsers; HTTPS everywhere; security headers.\n- Validate all inputs; avoid leaking stack traces; define rate limits and quotas.\n\n5) Pick your Python stack\n- Frameworks: FastAPI (great typing, validation, auto docs), Flask (minimal), Django REST Framework (batteries included).\n- ASGI/WSGI server: Uvicorn or Gunicorn.\n- Data layer: PostgreSQL + SQLAlchemy/Django ORM; migrations with Alembic/Django migrations.\n- Caching: Redis (optional).\n- Background jobs: Celery/RQ (if needed).\n\n6) Set up the project\n- Create a virtual environment; choose dependency management (pip, Poetry).\n- Establish project structure (app, api, models, services, tests).\n- Add linting/formatting/type checks: black, isort, flake8, mypy; pre‑commit hooks.\n- Configuration via environment variables; secrets via a manager (not in code).\n\n7) Implement core functionality\n- Build endpoints that match your spec; keep business logic in a service layer, not in route handlers.\n- Schema validation (Pydantic with FastAPI, Marshmallow for Flask).\n- Consistent responses and errors; use clear status codes (201 create, 204 no content, 400/404/409/422, 500).\n- Pagination and filtering; idempotency for certain POST operations; ETags/conditional requests if useful.\n\n8) Error handling and an error model\n- Define a standard error body (code, message, details, correlation_id).\n- Log errors with context; don’t expose internal details to clients.\n\n9) Testing strategy\n- Unit tests for services/validators.\n- Integration tests for endpoints (pytest + httpx/requests) with a test database.\n- Contract tests to assert the API matches the OpenAPI spec.\n- Mock external services; measure coverage and focus on critical paths.\n\n10) Documentation and developer experience\n- Auto‑generated docs (FastAPI provides Swagger/ReDoc).\n- Write examples for each endpoint; onboarding and usage notes.\n- Keep a changelog and release notes.\n\n11) Observability and reliability\n- Structured logging (JSON), include request IDs/correlation IDs.\n- Metrics (requests, latency, error rates), health/readiness endpoints.\n- Tracing (OpenTelemetry) if you have multiple services.\n- Error reporting (Sentry or similar).\n\n12) Deployment and operations\n- Containerize with Docker; follow 12‑factor app principles.\n- CI/CD pipeline: run tests, build image, deploy, run migrations.\n- Choose hosting (Render, Fly.io, Railway, Heroku, AWS/GCP/Azure).\n- Configure scaling, connection pools, and timeouts; use a reverse proxy if needed.\n\n13) Performance and data concerns\n- Index your database; avoid N+1 queries; use connection pooling.\n- Load test key endpoints; profile hotspots.\n- Caching strategies where appropriate; consider async I/O for high‑concurrency workloads.\n\n14) Versioning and lifecycle management\n- Keep backward compatibility for minor changes; add fields rather than changing semantics.\n- Communicate deprecations; sunset old versions with a timeline.\n\n15) Governance, compliance, and safety\n- Handle PII correctly; data retention and audit logs if required.\n- Least‑privilege DB access; rotate secrets; review third‑party dependencies.\n\nBeginner‑friendly defaults\n- FastAPI + Pydantic + Uvicorn\n- PostgreSQL + SQLAlchemy + Alembic\n- pytest + httpx + coverage\n- black, isort, flake8, mypy, pre‑commit\n- Docker + simple CI (GitHub Actions) + a managed host\n\nCommon pitfalls to avoid\n- Inconsistent status codes or error formats.\n- Weak input validation and missing authentication.\n- Business logic inside route handlers (hard to test/maintain).\n- No migrations or tests; no logging/metrics.\n- Ignoring pagination and timezones; returning unbounded lists.\n\nIf you share whether it’s public vs internal, expected traffic, and preferred framework, I can tailor this to a concrete starter plan and recommended tools.",
"refusal": null,
"role": "assistant",
"annotations": [],
"audio": null,
"function_call": null,
"tool_calls": null
},
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"protected_material_code": {
"filtered": false,
"detected": false
},
"protected_material_text": {
"filtered": false,
"detected": false
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
],
"created": 1762788925,
"model": "gpt-5-2025-08-07",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": null,
"usage": {
"completion_tokens": 2919,
"prompt_tokens": 29,
"total_tokens": 2948,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 1792,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"jailbreak": {
"filtered": false,
"detected": false
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
]
}
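Because a developer message takes the place a system message would otherwise hold, and the availability notes later in this article advise against sending both in one request, a small guard can catch that mistake before the API call. This is a minimal sketch; build_messages and check_roles are hypothetical helpers, not part of the OpenAI SDK.

```python
from typing import Optional


def build_messages(user_text: str, developer_text: Optional[str] = None) -> list:
    """Build a Chat Completions message list for a reasoning model."""
    messages = []
    if developer_text:
        # "developer" plays the role that "system" plays for other chat models
        messages.append({"role": "developer", "content": developer_text})
    messages.append({"role": "user", "content": user_text})
    return messages


def check_roles(messages: list) -> None:
    """Raise if one request mixes developer and system messages."""
    roles = {m["role"] for m in messages}
    if {"developer", "system"} <= roles:
        raise ValueError("Use either a developer message or a system message, not both.")


msgs = build_messages(
    "What steps should I think about when writing my first Python API?",
    developer_text="You are a helpful assistant.",
)
check_roles(msgs)  # passes: only developer and user roles are present
```

The resulting list can be passed as messages=msgs to client.chat.completions.create, as in the Python examples above.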
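Reasoning summary
When you use the latest reasoning models with the Responses API, you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning.
Important
Attempting to extract raw reasoning through methods other than the reasoning summary parameter isn't supported, might violate the Acceptable Use Policy, and can result in throttling or suspension when detected.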
using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;
#pragma warning disable OPENAI001 //currently required for token based authentication
BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");
OpenAIResponseClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);
OpenAIResponse response = await client.CreateResponseAsync(
    userInputText: "What's the optimal strategy to win at poker?",
    new ResponseCreationOptions()
    {
        ReasoningOptions = new ResponseReasoningOptions()
        {
            ReasoningEffortLevel = ResponseReasoningEffortLevel.High,
            ReasoningSummaryVerbosity = ResponseReasoningSummaryVerbosity.Auto,
        },
    });
// Print the summary text from every ReasoningResponseItem in the output
Console.WriteLine("=== Reasoning Summary ===");
foreach (var item in response.OutputItems)
{
    if (item is ReasoningResponseItem reasoningItem)
    {
        foreach (var summaryPart in reasoningItem.SummaryParts)
        {
            if (summaryPart is ReasoningSummaryTextPart textPart)
            {
                Console.WriteLine(textPart.Text);
            }
        }
    }
}
Console.WriteLine("\n=== Assistant Response ===");
// Get the assistant's output
Console.WriteLine(response.GetOutputText());
You need to upgrade the OpenAI client library to access the latest parameters.
pip install openai --upgrade
Microsoft Entra ID:
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=token_provider,
)
response = client.responses.create(
input="Tell me about the curious case of neural text degeneration",
model="gpt-5", # replace with model deployment name
reasoning={
"effort": "medium",
"summary": "auto"  # auto, concise, or detailed; GPT-5 series models don't support concise
},
text={
"verbosity": "low" # New with GPT-5 models
}
)
print(response.model_dump_json(indent=2))
API key:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=os.getenv("AZURE_OPENAI_API_KEY")
)
response = client.responses.create(
input="Tell me about the curious case of neural text degeneration",
model="gpt-5", # replace with model deployment name
reasoning={
"effort": "medium",
"summary": "auto"  # auto, concise, or detailed; GPT-5 series models don't support concise
},
text={
"verbosity": "low" # New with GPT-5 models
}
)
print(response.model_dump_json(indent=2))
curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
-d '{
"model": "gpt-5",
"input": "Tell me about the curious case of neural text degeneration",
"reasoning": {"summary": "auto"},
"text": {"verbosity": "low"}
}'
{
"id": "resp_689a0a3090808190b418acf12b5cc40e0fc1c31bc69d8719",
"created_at": 1754925616.0,
"error": null,
"incomplete_details": null,
"instructions": null,
"metadata": {},
"model": "gpt-5",
"object": "response",
"output": [
{
"id": "rs_689a0a329298819095d90c34dc9b80db0fc1c31bc69d8719",
"summary": [],
"type": "reasoning",
"encrypted_content": null,
"status": null
},
{
"id": "msg_689a0a33009881909fe0fcf57cba30200fc1c31bc69d8719",
"content": [
{
"annotations": [],
"text": "Neural text degeneration refers to the ways language models produce low-quality, repetitive, or vacuous text, especially when generating long outputs. It’s “curious” because models trained to imitate fluent text can still spiral into unnatural patterns. Key aspects:\n\n- Repetition and loops: The model repeats phrases or sentences (“I’m sorry, but...”), often due to high-confidence tokens reinforcing themselves.\n- Loss of specificity: Vague, generic, agreeable text that avoids concrete details.\n- Drift and contradiction: The output gradually departs from context or contradicts itself over long spans.\n- Exposure bias: During training, models see gold-standard prefixes; at inference, they must condition on their own imperfect outputs, compounding errors.\n- Likelihood vs. quality mismatch: Maximizing token-level likelihood doesn’t align with human preferences for diversity, coherence, or factuality.\n- Token over-optimization: Frequent, safe tokens get overused; certain phrases become attractors.\n- Entropy collapse: With greedy or low-temperature decoding, the distribution narrows too much, causing repetitive, low-entropy text.\n- Length and beam search issues: Larger beams or long generations can favor bland, repetitive sequences (the “likelihood trap”).\n\nCommon mitigations:\n\n- Decoding strategies:\n - Top-k, nucleus (top-p), or temperature sampling to keep sufficient entropy.\n - Typical sampling and locally typical sampling to avoid dull but high-probability tokens.\n - Repetition penalties, presence/frequency penalties, no-repeat n-grams.\n - Contrastive decoding (and variants like DoLa) to filter generic continuations.\n - Min/max length, stop sequences, and beam search with diversity/penalties.\n\n- Training and alignment:\n - RLHF/DPO to better match human preferences for non-repetitive, helpful text.\n - Supervised fine-tuning on high-quality, diverse data; instruction tuning.\n - Debiasing objectives (unlikelihood training) to penalize repetition and banned patterns.\n - Mixture-of-denoisers or latent planning to improve long-range coherence.\n\n- Architectural and planning aids:\n - Retrieval-augmented generation to ground outputs.\n - Tool use and structured prompting to constrain drift.\n - Memory and planning modules, hierarchical decoding, or sentence-level control.\n\n- Prompting tips:\n - Ask for concise answers, set token limits, and specify structure.\n - Provide concrete constraints or content to reduce generic filler.\n - Use “say nothing if uncertain” style instructions to avoid vacuity.\n\nRepresentative papers/terms to search:\n- Holtzman et al., “The Curious Case of Neural Text Degeneration” (2020): nucleus sampling.\n- Welleck et al., “Neural Text Degeneration with Unlikelihood Training.”\n- Li et al., “A Contrastive Framework for Decoding.”\n- Su et al., “DoLa: Decoding by Contrasting Layers.”\n- Meister et al., “Typical Decoding.”\n- Ouyang et al., “Training language models to follow instructions with human feedback.”\n\nIn short, degeneration arises from a mismatch between next-token likelihood and human preferences plus decoding choices; careful decoding, training objectives, and grounding help prevent it.",
"type": "output_text",
"logprobs": null
}
],
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": true,
"temperature": 1.0,
"tool_choice": "auto",
"tools": [],
"top_p": 1.0,
"background": false,
"max_output_tokens": null,
"max_tool_calls": null,
"previous_response_id": null,
"prompt": null,
"prompt_cache_key": null,
"reasoning": {
"effort": "minimal",
"generate_summary": null,
"summary": "detailed"
},
"safety_identifier": null,
"service_tier": "default",
"status": "completed",
"text": {
"format": {
"type": "text"
}
},
"top_logprobs": null,
"truncation": "disabled",
"usage": {
"input_tokens": 16,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 657,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 673
},
"user": null,
"content_filters": null,
"store": true
}
Note
Even when enabled, a reasoning summary isn't guaranteed to be generated for every step or request. This is expected behavior.
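Because a requested summary can still come back empty, as in the sample output above where "summary" is [], client code should treat the summary as optional. This is a minimal sketch; extract_reasoning is a hypothetical helper that walks a Responses API result converted to a dict (for example with response.model_dump() in the Python SDK). The summary_text part type is an assumption based on OpenAI's Responses API format, since the sample above doesn't show a populated summary.

```python
def extract_reasoning(response: dict) -> tuple:
    """Return (summary_texts, output_text) from a Responses API result dict."""
    summaries, output_parts = [], []
    for item in response.get("output", []):
        if item.get("type") == "reasoning":
            # "summary" can be an empty list even when a summary was requested
            for part in item.get("summary") or []:
                if part.get("type") == "summary_text":
                    summaries.append(part.get("text", ""))
        elif item.get("type") == "message":
            for part in item.get("content") or []:
                if part.get("type") == "output_text":
                    output_parts.append(part.get("text", ""))
    return summaries, "".join(output_parts)


# Trimmed from the sample output above: the reasoning item has an empty summary
sample = {
    "output": [
        {"id": "rs_689a0a329298819095d90c34dc9b80db0fc1c31bc69d8719", "summary": [], "type": "reasoning"},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "Neural text degeneration refers to the ways language models produce low-quality, repetitive, or vacuous text, especially when generating long outputs.",
                },
            ],
        },
    ]
}

summaries, text = extract_reasoning(sample)
if not summaries:
    print("No reasoning summary returned for this request.")
print(text)
```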
Python lark
GPT-5 series reasoning models can call a new custom_tool called lark_tool. This tool is based on Python lark and can be used to constrain model output more flexibly.
Responses API
{
"model": "gpt-5-2025-08-07",
"input": "please calculate the area of a circle with radius equal to the number of 'r's in strawberry",
"tools": [
{
"type": "custom",
"name": "lark_tool",
"format": {
"type": "grammar",
"syntax": "lark",
"definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
}
}
],
"tool_choice": "required"
}
Microsoft Entra ID:
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=token_provider,
)
response = client.responses.create(
model="gpt-5", # replace with your model deployment name
tools=[
{
"type": "custom",
"name": "lark_tool",
"format": {
"type": "grammar",
"syntax": "lark",
"definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
}
}
],
input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],
)
print(response.model_dump_json(indent=2))
API key:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
api_key=os.getenv("AZURE_OPENAI_API_KEY")
)
response = client.responses.create(
model="gpt-5", # replace with your model deployment name
tools=[
{
"type": "custom",
"name": "lark_tool",
"format": {
"type": "grammar",
"syntax": "lark",
"definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
}
}
],
input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],
)
print(response.model_dump_json(indent=2))
Output:
{
"id": "resp_689a0cf927408190b8875915747667ad01c936c6ffb9d0d3",
"created_at": 1754926332.0,
"error": null,
"incomplete_details": null,
"instructions": null,
"metadata": {},
"model": "gpt-5",
"object": "response",
"output": [
{
"id": "rs_689a0cfd1c888190a2a67057f471b5cc01c936c6ffb9d0d3",
"summary": [],
"type": "reasoning",
"encrypted_content": null,
"status": null
},
{
"id": "msg_689a0d00e60c81908964e5e9b2d6eeb501c936c6ffb9d0d3",
"content": [
{
"annotations": [],
"text": "“strawberry” has 3 r’s, so the radius is 3.\nArea = πr² = π × 3² = 9π ≈ 28.27 square units.",
"type": "output_text",
"logprobs": null
}
],
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": true,
"temperature": 1.0,
"tool_choice": "auto",
"tools": [
{
"name": "lark_tool",
"parameters": null,
"strict": null,
"type": "custom",
"description": null,
"format": {
"type": "grammar",
"definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/",
"syntax": "lark"
}
}
],
"top_p": 1.0,
"background": false,
"max_output_tokens": null,
"max_tool_calls": null,
"previous_response_id": null,
"prompt": null,
"prompt_cache_key": null,
"reasoning": {
"effort": "medium",
"generate_summary": null,
"summary": null
},
"safety_identifier": null,
"service_tier": "default",
"status": "completed",
"text": {
"format": {
"type": "text"
}
},
"top_logprobs": null,
"truncation": "disabled",
"usage": {
"input_tokens": 139,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 240,
"output_tokens_details": {
"reasoning_tokens": 192
},
"total_tokens": 379
},
"user": null,
"content_filters": null,
"store": true
}
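The grammar used in this section is simple enough to mirror with a single regular expression, which is handy for checking whether a returned string actually follows it. Note that the Python examples above don't set tool_choice to "required" (unlike the JSON request), the echoed response shows "tool_choice": "auto", and the model answered with a plain message whose text doesn't fit the grammar. This is a minimal sketch that covers only this particular grammar; it isn't a general lark-to-regex conversion, and matches_grammar is a hypothetical helper.

```python
import re

# Regex equivalent of the lark grammar used in this section:
#   start: QUESTION NEWLINE ANSWER
#   QUESTION: /[^\n?]{1,200}\?/
#   NEWLINE: /\n/
#   ANSWER: /[^\n!]{1,200}!/
GRAMMAR_RE = re.compile(r"[^\n?]{1,200}\?\n[^\n!]{1,200}!")


def matches_grammar(text: str) -> bool:
    """True if text is exactly one QUESTION line followed by one ANSWER line."""
    return GRAMMAR_RE.fullmatch(text) is not None


print(matches_grammar("How many r's are in strawberry?\nThere are 3!"))
# The plain-message text from the sample output above
print(matches_grammar("“strawberry” has 3 r’s, so the radius is 3.\nArea = πr² = π × 3² = 9π ≈ 28.27 square units."))
```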
Chat Completions
{
"messages": [
{
"role": "user",
"content": "Which one is larger, 42 or 0?"
}
],
"tools": [
{
"type": "custom",
"name": "custom_tool",
"custom": {
"name": "lark_tool",
"format": {
"type": "grammar",
"grammar": {
"syntax": "lark",
"definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
}
}
}
}
],
"tool_choice": "required",
"model": "gpt-5-2025-08-07"
}
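The two request bodies above wrap the same lark grammar differently: the Responses API puts "syntax" and "definition" directly under "format", while the Chat Completions body nests them under "custom" and "grammar". This is a minimal sketch that builds both shapes exactly as they appear in this article's examples; lark_tool_for_responses and lark_tool_for_chat are hypothetical helpers, and "gpt-5" stands in for your deployment name.

```python
import json

# Same grammar string as the examples above (\n are real newlines, \\n are regex escapes)
GRAMMAR = (
    "start: QUESTION NEWLINE ANSWER\n"
    "QUESTION: /[^\\n?]{1,200}\\?/\n"
    "NEWLINE: /\\n/\n"
    "ANSWER: /[^\\n!]{1,200}!/"
)


def lark_tool_for_responses(definition: str) -> dict:
    # Shape used in the Responses API request in this article
    return {
        "type": "custom",
        "name": "lark_tool",
        "format": {"type": "grammar", "syntax": "lark", "definition": definition},
    }


def lark_tool_for_chat(definition: str) -> dict:
    # Shape used in the Chat Completions request in this article
    return {
        "type": "custom",
        "name": "custom_tool",
        "custom": {
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "grammar": {"syntax": "lark", "definition": definition},
            },
        },
    }


responses_tool = lark_tool_for_responses(GRAMMAR)
chat_tool = lark_tool_for_chat(GRAMMAR)

chat_body = {
    "model": "gpt-5",  # replace with your model deployment name
    "messages": [{"role": "user", "content": "Which one is larger, 42 or 0?"}],
    "tools": [chat_tool],
    "tool_choice": "required",
}
print(json.dumps(chat_body, indent=2))
```

Keeping the grammar in one constant means both APIs always receive the same definition.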
Availability
Region availability
API and feature support
| Feature | gpt-5.5, 2026-04-24 | gpt-5.4-nano, 2026-03-17 | gpt-5.4-mini, 2026-03-17 | gpt-5.4-pro | gpt-5.4, 2026-03-05 | gpt-5.3-codex, 2026-02-24 | gpt-5.2-codex, 2026-01-14 | gpt-5.2, 2025-12-11 | gpt-5.1-codex-max, 2025-12-04 | gpt-5.1, 2025-11-13 | gpt-5.1-chat, 2025-11-13 | gpt-5.1-codex, 2025-11-13 | gpt-5.1-codex-mini, 2025-11-13 | gpt-5-pro, 2025-10-06 | gpt-5-codex, 2025-09-011 | gpt-5, 2025-08-07 | gpt-5-mini, 2025-08-07 | gpt-5-nano, 2025-08-07 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Developer messages | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured outputs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |
| Context window | 1,050,000 (input: 922,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 1,050,000 (input: 922,000; output: 128,000) | 1,050,000 (input: 922,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 128,000 (input: 111,616; output: 16,384) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) | 400,000 (input: 272,000; output: 128,000) |
| Reasoning effort⁷ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅⁶ | ✅⁴ | ✅ | ✅ | ✅ | ✅⁵ | ✅ | ✅ | ✅ | ✅ |
| Image input | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Chat Completions API | ✅ | ✅ | ✅ | - | ✅ | - | - | ✅ | - | ✅ | ✅ | - | - | - | - | ✅ | ✅ | ✅ |
| Responses API | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |
| Functions/Tools | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |
| Parallel tool calls¹ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ |
| max_completion_tokens² | ✅ | ✅ | ✅ | - | ✅ | - | - | ✅ | - | ✅ | ✅ | - | - | - | - | ✅ | ✅ | ✅ |
| System messages³ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reasoning summary | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ |
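¹ Parallel tool calls aren't supported when reasoning_effort is set to minimal.
² Reasoning models only work with the max_completion_tokens parameter when you use the Chat Completions API. Use max_output_tokens with the Responses API.
³ The latest reasoning models support system messages to make migration easier. Don't use both a developer message and a system message in the same API request.
⁴ For gpt-5.1, reasoning_effort defaults to none. When you upgrade from earlier reasoning models to gpt-5.1, keep in mind that you might need to update your code to explicitly pass a reasoning_effort level if you want reasoning to occur.
⁵ gpt-5-pro only supports reasoning_effort high. This is the default value even when it isn't explicitly passed to the model.
⁶ gpt-5.1-codex-max adds support for a new reasoning_effort level, xhigh, which is the highest level that reasoning effort can be set to.
⁷ gpt-5.2, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, and gpt-5.1-codex-mini support none as a value for the reasoning_effort parameter. If you want to use these models to generate responses without reasoning, set reasoning_effort='none'. This setting can improve speed.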
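Footnotes 2 and 4 both affect how request arguments are assembled: the token-limit keyword depends on the API, and gpt-5.1 won't reason unless you ask it to. This is a minimal sketch, assuming you call either client.chat.completions.create or client.responses.create as in the examples above; token_limit_param and request_kwargs are hypothetical helpers that always state the effort explicitly.

```python
def token_limit_param(api: str, limit: int) -> dict:
    """Pick the output-token limit keyword each API expects for reasoning models."""
    if api == "chat_completions":
        return {"max_completion_tokens": limit}
    if api == "responses":
        return {"max_output_tokens": limit}
    raise ValueError(f"unknown api: {api!r}")


def request_kwargs(api: str, limit: int, effort: str) -> dict:
    """Build keyword arguments that always state the reasoning effort explicitly."""
    kwargs = token_limit_param(api, limit)
    # Passing effort explicitly matters for gpt-5.1, which defaults to "none"
    if api == "chat_completions":
        kwargs["reasoning_effort"] = effort
    else:
        kwargs["reasoning"] = {"effort": effort}
    return kwargs


print(request_kwargs("chat_completions", 5000, "medium"))
print(request_kwargs("responses", 5000, "medium"))
```

For example, client.chat.completions.create(model="YOUR-DEPLOYMENT-NAME", messages=messages, **request_kwargs("chat_completions", 5000, "medium")) would send the same parameters as the earlier Python example.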
New GPT-5 reasoning features
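| Feature | Description |
|---|---|
| reasoning_effort | xhigh is only supported with gpt-5.1-codex-max. minimal is only supported with the original GPT-5 reasoning models and isn't supported with gpt-5.1 or later.* Options: none, minimal, low, medium, high, xhigh |
| verbosity | A new parameter that gives you more granular control over how concise the model's output is. Options: low, medium, high |
| preamble | GPT-5 series reasoning models can spend extra time "thinking" before executing a function/tool call. When this planning occurs, the model can give insight into its planning steps in the model response through a new object called the preamble object. Generation of preambles in the model response isn't guaranteed, although you can encourage the model by using the instructions parameter and passing content such as "You MUST plan extensively before each function call. ALWAYS output your plan to the user before calling any function". |
| Allowed tools | You can specify more than one tool under tool_choice instead of just one. |
| Custom tool type | Enables raw text (non-JSON) outputs. |
| lark_tool | Lets you use some of the capabilities of Python lark to constrain model responses more flexibly. |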
* gpt-5-codex also doesn't support reasoning_effort minimal.
For more information, we also recommend reading OpenAI's GPT-5 prompting guide and their GPT-5 feature guide.
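The reasoning_effort rules above differ per model, so a pre-flight check can catch documented-unsupported combinations before a request fails. This is a minimal sketch that encodes only the rules stated in this article (it doesn't check none, because the article lists which models support it but not which don't), and it expects the underlying model name rather than your deployment name; effort_problem is a hypothetical helper, and newer models may behave differently.

```python
from typing import Optional

# Snapshot of the rules stated in this article; newer models may differ
ORIGINAL_GPT5 = {"gpt-5", "gpt-5-mini", "gpt-5-nano"}
EFFORT_LEVELS = {"none", "minimal", "low", "medium", "high", "xhigh"}


def effort_problem(model: str, effort: str) -> Optional[str]:
    """Return why a model/effort pair is documented as unsupported, or None."""
    if effort not in EFFORT_LEVELS:
        return f"unknown reasoning_effort {effort!r}"
    if model == "gpt-5-pro" and effort != "high":
        return "gpt-5-pro only supports high"
    if effort == "xhigh" and model != "gpt-5.1-codex-max":
        return "xhigh is only supported with gpt-5.1-codex-max"
    if effort == "minimal" and model not in ORIGINAL_GPT5:
        # Covers gpt-5.1 and later, and gpt-5-codex
        return "minimal is only supported with the original GPT-5 models"
    return None


for model, effort in [("gpt-5", "minimal"), ("gpt-5.1", "minimal"), ("gpt-5-pro", "medium")]:
    print(model, effort, effort_problem(model, effort) or "ok")
```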
| Feature | codex-mini, 2025-05-16 | o3-pro, 2025-06-10 | o4-mini, 2025-04-16 | o3, 2025-04-16 | o3-mini, 2025-01-31 | o1, 2024-12-17 |
|---|---|---|---|---|---|---|
| Developer messages | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured outputs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Context window | Input: 200,000; output: 100,000 | Input: 200,000; output: 100,000 | Input: 200,000; output: 100,000 | Input: 200,000; output: 100,000 | Input: 200,000; output: 100,000 | Input: 200,000; output: 100,000 |
| Reasoning effort | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Image input | ✅ | ✅ | ✅ | ✅ | - | ✅ |
| Chat Completions API | - | - | ✅ | ✅ | ✅ | ✅ |
| Responses API | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Functions/Tools | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Parallel tool calls | - | - | - | - | - | - |
| max_completion_tokens¹ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| System messages² | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reasoning summary | ✅ | - | ✅ | ✅ | - | - |
| Streaming³ | ✅ | - | ✅ | ✅ | ✅ | - |
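¹ Reasoning models only work with the max_completion_tokens parameter when you use the Chat Completions API. Use max_output_tokens with the Responses API.
² The latest o-series models support system messages to make migration easier. When you use a system message with o4-mini, o3, o3-mini, and o1, it's treated as a developer message. Don't use both a developer message and a system message in the same API request.
³ Streaming for o3 is limited access only.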
Note
- To avoid timeouts, background mode is recommended for o3-pro.
- o3-pro doesn't currently support image generation.
Not supported
The following are currently unsupported with reasoning models:
- temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens
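When you migrate existing Chat Completions code to a reasoning model, the parameters listed above have to be removed, and max_tokens becomes max_completion_tokens. This is a minimal sketch; adapt_for_reasoning is a hypothetical helper that returns a new dict and leaves the original untouched. Keep in mind that reasoning tokens also count toward max_completion_tokens, so you might need a larger value than your old max_tokens to get the same amount of visible output.

```python
UNSUPPORTED = {
    "temperature",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
    "logprobs",
    "top_logprobs",
    "logit_bias",
}


def adapt_for_reasoning(params: dict) -> dict:
    """Drop parameters reasoning models reject and rename max_tokens."""
    adapted = {k: v for k, v in params.items() if k not in UNSUPPORTED}
    if "max_tokens" in adapted:
        # Chat Completions reasoning models take max_completion_tokens instead
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")
    return adapted


legacy = {"model": "gpt-5", "messages": [], "temperature": 0.2, "max_tokens": 800}
print(adapt_for_reasoning(legacy))
```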
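Markdown output
By default, the o3-mini and o1 models won't attempt to produce output that includes markdown formatting. A common case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting, you lose features like syntax highlighting and copyable code blocks in interactive experiences. To override this new default behavior and encourage markdown in model responses, add the string Formatting re-enabled to the beginning of your developer message.
Adding Formatting re-enabled to the beginning of your developer message doesn't guarantee that the model will include markdown formatting in its response; it only increases the likelihood. Our internal testing found that Formatting re-enabled on its own is less effective with the o1 model than with o3-mini.
To improve the performance of Formatting re-enabled, you can further augment the beginning of the developer message, which often results in the desired output. Rather than adding only Formatting re-enabled, you can experiment with a more descriptive initial instruction at the beginning of your developer message, such as one of the following examples: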
Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
Formatting re-enabled - code output should be wrapped in markdown.
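Depending on your expected output, you might need to customize your initial developer message further to target your specific use case.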
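If your application builds developer messages in code, the hint can be prepended in one place so it's never duplicated. This is a minimal sketch; with_markdown_hint is a hypothetical helper, and joining the hint and the rest of the developer message with a newline is an arbitrary choice, not a documented requirement.

```python
PREFIX = "Formatting re-enabled"


def with_markdown_hint(developer_text: str, detail: str = "") -> str:
    """Prepend the Formatting re-enabled hint to a developer message (idempotent)."""
    if developer_text.startswith(PREFIX):
        return developer_text
    head = f"{PREFIX} - {detail}" if detail else PREFIX
    return f"{head}\n{developer_text}" if developer_text else head


msg = with_markdown_hint("You are a helpful assistant.", "code output should be wrapped in markdown.")
messages = [
    {"role": "developer", "content": msg},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
print(msg)
```

Since the hint only raises the likelihood of markdown output, check the response rather than assuming code blocks are present.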