Azure OpenAI o-series models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math compared to previous iterations.
Key capabilities of the o-series models:
For access to `o3-mini`, `o1`, and `o1-preview`, registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who previously applied and received access to `o1` or `o1-preview` don't need to reapply as they are automatically on the wait-list for the latest model.

Request access: limited access model application
| Model | Region | Limited access |
|---|---|---|
| `o3-mini` | Model availability. | Limited access model application |
| `o1` | Model availability. | Limited access model application |
| `o1-preview` | Model availability. | This model is only available for customers who were granted access as part of the original limited access release. We're currently not expanding access to `o1-preview`. |
| `o1-mini` | Model availability. | No access request needed for Global Standard deployments. Standard (regional) deployments are currently only available to select customers who were previously granted access as part of the `o1-preview` release. |
| Feature | o3-mini, 2025-01-31 | o1, 2024-12-17 | o1-preview, 2024-09-12 | o1-mini, 2024-09-12 |
|---|---|---|---|---|
| API Version | 2024-12-01-preview, 2025-01-01-preview | 2024-12-01-preview, 2025-01-01-preview | 2024-09-01-preview, 2024-10-01-preview, 2024-12-01-preview | 2024-09-01-preview, 2024-10-01-preview, 2024-12-01-preview |
| Developer Messages | ✅ | ✅ | - | - |
| Structured Outputs | ✅ | ✅ | - | - |
| Context Window | Input: 200,000; Output: 100,000 | Input: 200,000; Output: 100,000 | Input: 128,000; Output: 32,768 | Input: 128,000; Output: 65,536 |
| Reasoning effort | ✅ | ✅ | - | - |
| Vision Support | - | ✅ | - | - |
| Functions/Tools | ✅ | ✅ | - | - |
| `max_completion_tokens`* | ✅ | ✅ | ✅ | ✅ |
| System Messages** | ✅ | ✅ | - | - |
| Streaming | ✅ | - | - | - |
\* Reasoning models only work with the `max_completion_tokens` parameter.

\*\* The latest o\* series models support system messages to make migration easier. When you use a system message with `o3-mini` and `o1`, it will be treated as a developer message. You should not use both a developer message and a system message in the same API request.
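This rule can be sketched as a small client-side check. The helper below is hypothetical (it is not part of the OpenAI SDK): it rewrites a `system` message to a `developer` message, mirroring how `o3-mini` and `o1` treat it, and rejects requests that mix both roles.

```python
def normalize_messages(messages):
    """Hypothetical helper: treat a system message as a developer message
    for reasoning models, and reject requests that contain both roles."""
    roles = {m["role"] for m in messages}
    if "system" in roles and "developer" in roles:
        raise ValueError("Use either a system or a developer message, not both.")
    # o3-mini and o1 treat a system message as a developer message.
    return [
        {**m, "role": "developer"} if m["role"] == "system" else m
        for m in messages
    ]

msgs = normalize_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
```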
The following are currently unsupported with reasoning models:

- `temperature`
- `top_p`
- `presence_penalty`
- `frequency_penalty`
- `logprobs`
- `top_logprobs`
- `logit_bias`
- `max_tokens`
These models don't currently support the same set of parameters as other models that use the chat completions API.
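As a sketch of how existing call sites can be adapted, the hypothetical helper below drops the unsupported sampling parameters and renames `max_tokens` to `max_completion_tokens` before the keyword arguments are passed to a reasoning model:

```python
# Parameters that reasoning models currently reject (from the list above).
UNSUPPORTED = {
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias",
}

def adapt_kwargs_for_reasoning(kwargs):
    """Hypothetical helper: strip unsupported parameters and translate
    max_tokens into max_completion_tokens for reasoning models."""
    adapted = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED}
    if "max_tokens" in adapted:
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")
    return adapted

adapted = adapt_kwargs_for_reasoning({"temperature": 0.7, "max_tokens": 500})
print(adapted)  # {'max_completion_tokens': 500}
```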
You'll need to upgrade your OpenAI client library for access to the latest parameters.
```shell
pip install openai --upgrade
```
If you're new to using Microsoft Entra ID for authentication see How to configure Azure OpenAI Service with Microsoft Entra ID authentication.
```python
import os

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview"
)

response = client.chat.completions.create(
    model="o1-new",  # replace with the model deployment name of your o-series model
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000
)

print(response.model_dump_json(indent=2))
```
Output:
```json
{
  "id": "chatcmpl-AEj7pKFoiTqDPHuxOcirA9KIvf3yz",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Writing your first Python API is an exciting step in developing software that can communicate with other applications. An API (Application Programming Interface) allows different software systems to interact with each other, enabling data exchange and functionality sharing. Here are the steps you should consider when creating your first Python API...truncated for brevity.",
        "refusal": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1728073417,
  "model": "o1-2024-12-17",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_503a95a7d8",
  "usage": {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 448
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "custom_blocklists": {
          "filtered": false
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}
```
Note
Reasoning models have `reasoning_tokens` as part of `completion_tokens_details` in the model response. These are hidden tokens that aren't returned as part of the message response content but are used by the model to help generate a final answer to your request. `2024-12-01-preview` adds an additional new parameter `reasoning_effort` which can be set to `low`, `medium`, or `high` with the latest `o1` model. The higher the effort setting, the longer the model will spend processing the request, which will generally result in a larger number of `reasoning_tokens`.
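For example, using the `usage` block from the response above, the number of tokens that actually appear in the message content is the completion count minus the hidden reasoning tokens:

```python
# Usage figures taken from the example response above.
usage = {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {"reasoning_tokens": 448},
}

# Reasoning tokens are counted (and billed) as completion tokens but
# never appear in message.content, so the visible answer is the difference.
reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning
print(visible)  # 1395
```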
Functionally, developer messages (`"role": "developer"`) are the same as system messages. Adding a developer message to the previous code example would look as follows:
```python
import os

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview"
)

response = client.chat.completions.create(
    model="o1-new",  # replace with the model deployment name of your o-series model
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."},  # equivalent to a system message for reasoning models
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000
)

print(response.model_dump_json(indent=2))
```
By default the `o3-mini` and `o1` models will not attempt to produce output that includes markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting you lose features like syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage markdown inclusion in model responses, add the string `Formatting re-enabled` to the beginning of your developer message.
Adding `Formatting re-enabled` to the beginning of your developer message does not guarantee that the model will include markdown formatting in its response; it only increases the likelihood. We have found from internal testing that `Formatting re-enabled` is less effective by itself with the `o1` model than with `o3-mini`.
To improve the performance of `Formatting re-enabled`, you can further augment the beginning of the developer message, which will often result in the desired output. Rather than just adding `Formatting re-enabled` to the beginning of your developer message, you can experiment with adding a more descriptive initial instruction like one of the examples below:

- `Formatting re-enabled - please enclose code blocks with appropriate markdown tags.`
- `Formatting re-enabled - code output should be wrapped in markdown.`

Depending on your expected output you may need to customize your initial developer message further to target your specific use case.
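Putting this together, a developer message carrying the prefix might be assembled like this. The prefix string is one of the examples above; the helper function and its name are illustrative, not part of any SDK.

```python
FORMATTING_PREFIX = (
    "Formatting re-enabled - please enclose code blocks "
    "with appropriate markdown tags."
)

def build_developer_message(instructions):
    """Illustrative helper: prepend the Formatting re-enabled prefix so
    reasoning models are more likely to emit markdown in their response."""
    return {
        "role": "developer",
        "content": f"{FORMATTING_PREFIX} {instructions}",
    }

msg = build_developer_message("You are a helpful coding assistant.")
```

The resulting dict can be passed as the first entry of `messages` in `client.chat.completions.create`, exactly as in the developer-message example earlier.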