Edit

Azure OpenAI frequently asked questions

Data and Privacy

Do you use my company data to train any of the models?

Azure OpenAI doesn't use customer data to retrain models. For more information, see the Azure OpenAI data, privacy, and security guide.

General

Does Azure OpenAI support custom API headers? We append additional custom headers to our API requests and are seeing HTTP 431 failure errors.

Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We have noticed some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. In future API versions we'll no longer pass through custom headers. We recommend customers not depend on custom headers in future system architectures.

Does Azure OpenAI work with the latest Python library released by OpenAI (version>=1.0)?

Azure OpenAI is supported by the latest release of the OpenAI Python library (version>=1.0). However, it's important to note migration of your codebase using openai migrate isn't supported and won't work with code that targets Azure OpenAI.

How do the capabilities of Azure OpenAI compare to OpenAI?

Azure OpenAI gives customers advanced language AI with the latest OpenAI models with the security and enterprise promise of Azure. Azure OpenAI codevelops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.

With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI.

Does Azure OpenAI support VNETs and Private Endpoints?

Yes, Azure OpenAI supports VNETs and Private Endpoints. To learn more, consult the virtual networking guidance.

I'm trying to use embeddings and received the error "InvalidRequestError: Too many inputs. The max number of inputs is 16." How do I fix this?

This error typically occurs when you try to send a batch of text to embed in a single API request as an array. Currently Azure OpenAI only supports arrays of embeddings with multiple inputs for the text-embedding-ada-002 Version 2 model. This model version supports an array consisting of up to 16 inputs per API request. The array can be up to 8,191 tokens in length when using the text-embedding-ada-002 (Version 2) model.

When I ask the model which model it's running, it tells me it's running a different version. Why does this happen?

Azure OpenAI models being unable to correctly identify what model is running is expected behavior.

Why does this happen?

The model is performing next token prediction in response to your question. It doesn't have native ability to query what model version is currently being run. To confirm the model behind a deployment, go to Microsoft Foundry > Deployments or Models + endpoints and check the model name column.

Questions like "What model are you running?" or "What is the latest model from OpenAI?" depend entirely on the model's training data. New models are typically released after their training cutoff, so the model won't have accurate information about itself or other models released since that cutoff. See the models page for the knowledge cutoff of each model.

If you want the model to answer accurately, supply the information through prompt engineering, Retrieval Augmented Generation (RAG), or fine-tuning.

How can I get the model to respond in a specific language?

Ensure that your prompt is clear and specific about the language requirement. If the issue persists, consider adding more context or rephrasing the prompt to reinforce the language instruction.

Example prompts:

  • "Please respond in English and only in English."
  • "Answer the following question in English: What is the weather like in Fresno?"

I asked the model when its knowledge cutoff is and it gave me a different answer than what is on the Azure OpenAI model's page. Why does this happen?

This is expected behavior. The models aren't able to answer questions about themselves. If you want to know when the knowledge cutoff for the model's training data is, consult the models page.

I asked the model a question about something that happened recently before the knowledge cutoff and it got the answer wrong. Why does this happen?

This is expected behavior. There's no guarantee that every recent event was part of the model's training data, and even when information was in training data there's always a chance of ungrounded responses. Use Retrieval Augmented Generation (RAG) — for example, via Foundry Agents grounding — to ground model responses in current data.

The frequency that a given piece of information appeared in the training data also impacts how the model responds.

While you can probe a model with questions to guess its training data cutoff, the models page is the authoritative source.

Where do I access pricing information for legacy models, which are no longer available for new deployments?

Legacy pricing information is available via a downloadable PDF file. For all other models, consult the official pricing page.

Looking up pricing programmatically

If you can't find a specific model or SKU on the pricing page, you can query the Azure Retail Prices REST API. It's public, requires no authentication, and returns per-region, per-SKU pricing for all Azure OpenAI meters.

Key concepts:

  • productName — filter by Azure OpenAI (most models) or Azure OpenAI GPT5 (GPT-5 family).
  • skuName — encodes the model, direction (input/output), and deployment type. Examples: gpt 4.1 Inp Data Zone, gpt 4o 0513 Input global, gpt 4o 1120 output.
  • armRegionName — the Azure region. Meters without global or Data Zone in the SKU name are regional and priced per-region.
  • Pricing varies by deployment type: Global (lowest) → Data Zone (~10% premium) → Regional (10–25% premium over Global, varying by region).

Python example — look up input pricing for a model across all regions and deployment types:

import requests

BASE_URL = "https://prices.azure.com/api/retail/prices"

def get_model_pricing(model_filter: str, product_name: str = "Azure OpenAI") -> list[dict]:
    """Fetch all consumption pricing for a given model from the Azure Retail Prices API."""
    items = []
    odata = (
        f"productName eq '{product_name}' "
        f"and priceType eq 'Consumption' "
        f"and contains(skuName, '{model_filter}')"
    )
    params = {"$filter": odata, "api-version": "2023-01-01-preview"}
    url = BASE_URL
    while url:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data.get("Items", []))
        url = data.get("NextPageLink")
        params = {}  # NextPageLink already includes query params
    return items

# Example: compare gpt-4o 2024-11-20 input pricing across deployment types
items = get_model_pricing("gpt 4o 1120")

input_items = [
    i for i in items
    if i["retailPrice"] > 0
    and "inp" in i["skuName"].lower()
    and not any(x in i["skuName"].lower() for x in ["batch", "prov", "ft "])
]

for item in sorted(input_items, key=lambda x: x["retailPrice"]):
    sku = item["skuName"]
    tier = (
        "Global" if "global" in sku.lower()
        else "Data Zone" if "data zone" in sku.lower()
        else "Regional"
    )
    print(
        f"${item['retailPrice']:.4f} / {item['unitOfMeasure']:<12}  "
        f"{tier:<10}  {item['armRegionName']:<20}  SKU: {sku}"
    )

Sample output (abbreviated):

$0.0025 / 1K Tokens    Global      eastus2          SKU: gpt 4o 1120 Inp global
$0.0025 / 1K Tokens    Regional    australiaeast    SKU: gpt 4o 1120 Inp
$0.0028 / 1K Tokens    Data Zone   eastus2          SKU: gpt 4o 1120 Inp Data Zone
$0.0028 / 1K Tokens    Regional    eastus           SKU: gpt 4o 1120 Inp
$0.0030 / 1K Tokens    Regional    swedencentral    SKU: gpt 4o 1120 Inp

To compare pricing across all models at once, omit the contains(skuName, ...) filter and use productName eq 'Azure OpenAI'. For GPT-5 family models, use productName eq 'Azure OpenAI GPT5'.

How do I fix InternalServerError - 500 - Failed to create completion as the model generated invalid Unicode output?

You can minimize the occurrence of these errors by reducing the temperature of your prompts to less than 1 and ensuring you're using a client with retry logic. Reattempting the request often results in a successful response.

How do I fix Server error (500): Unexpected special token

This is a known issue. You can minimize the occurrence of these errors by reducing the temperature of your prompts to less than 1 and ensuring you're using a client with retry logic. Reattempting the request often results in a successful response.

If reducing temperature to less than 1 doesn't reduce the frequency of this error an alternative workaround is set presence/frequency penalties and logit biases to their default values. In some cases, it may help to set top_p to a non-default, lower value to encourage the model to avoid sampling tokens with lower probability tokens.

We noticed charges associated with API calls that failed to complete with status code 400. Why are failed API calls generating a charge?

If the service performs processing, you'll be charged even if the status code isn't successful (not 200). Common examples of this are, a 400 error due to a content filter, or a 408 error due to a time-out. Charges will also occur when a status 200 is received with a finish_reason of content_filter. In this case the prompt didn't have any issues, but the completion generated by the model was detected to violate the content filtering rules, which result in the completion being filtered. If the service doesn't perform processing, you won't be charged. For example, a 401 error due to authentication or a 429 error due to exceeding the Rate Limit.

Do all Azure OpenAI models support `max_completion_tokens` with the chat completions API?

No, all Azure OpenAI models don't support max_completion_tokens. This parameter isn't supported with older models like gpt-4 (turbo-2024-04-09).

Getting access to Azure OpenAI Service

My guest account has been given access to an Azure OpenAI resource, but I'm unable to access that resource in the [Microsoft Foundry portal](https://ai.azure.com/?cid=learnDocs). How do I enable access?

This is expected behavior when using the default sign-in experience for the Microsoft Foundry.

To access Microsoft Foundry from a guest account that has been granted access to an Azure OpenAI resource:

  1. Open a private browser session and then navigate to https://ai.azure.com.
  2. Rather than immediately entering your guest account credentials instead select Sign-in options
  3. Now select Sign in to an organization
  4. Enter the domain name of the organization that granted your guest account access to the Azure OpenAI resource.
  5. Now sign-in with your guest account credentials.

You should now be able to access the resource via the Microsoft Foundry portal.

Alternatively if you're signed into the Azure portal from the Azure OpenAI resource's Overview pane you can select Go to Microsoft Foundry to automatically sign in with the appropriate organizational context.

Learning more and where to ask questions

Where can I read about the latest updates to Azure OpenAI?

For monthly updates, see our what's new page.

Where can I get training to get started learning and build my skills around Azure OpenAI?

Where can I post questions and see answers to other common questions?

Where do I go for Azure OpenAI customer support?

You can learn about all the support options for Azure OpenAI in the support and help options guide.

Models and fine-tuning

What models are available?

Consult the Azure OpenAI model availability guide.

Where can I find out what region a model is available in?

Consult the Azure OpenAI model availability guide for region availability.

What are the SLAs (Service Level Agreements) in Azure OpenAI?

We do offer an Availability SLA for all resources and a Latency SLA for Provisioned-Managed Deployments. For more information about the SLA for Azure OpenAI Service, see the Service Level Agreements (SLA) for Online Services page.

How do I enable fine-tuning? Create a custom model is greyed out in [Microsoft Foundry portal](https://ai.azure.com/?cid=learnDocs).

In order to successfully access fine-tuning, you need Foundry User role assigned. Even someone with high-level Service Administrator permissions would still need this account explicitly set in order to access fine-tuning. For more information, please review the role-based access control guidance.

What is the difference between a base model and a fine-tuned model?

A base model is a model that hasn't been customized or fine-tuned for a specific use case. Fine-tuned models are customized versions of base models where a model's weights are trained on a unique set of prompts. Fine-tuned models let you achieve better results on a wider number of tasks without needing to provide detailed examples for in-context learning as part of your completion prompt. To learn more, review our fine-tuning guide.

What is the maximum number of fine-tuned models I can create?

100

Why was my fine-tuned model deployment deleted?

If a customized (fine-tuned) model is deployed for more than 15 days during which no completions or chat completions calls are made to it, the deployment is automatically deleted (and no further hosting charges are incurred for that deployment). The underlying customized model remains available and can be redeployed at any time. To learn more, check out the how-to-article.

How do I deploy a model with the REST API?

There are currently two different REST APIs that allow model deployment. For the latest model deployment features such as the ability to specify a model version during deployment for models like text-embedding-ada-002 Version 2, use the Deployments - Create Or Update REST API call.

Can I use quota to increase the max token limit of a model?

No, quota Tokens-Per-Minute (TPM) allocation isn't related to the max input token limit of a model. Model input token limits are defined in the models table and aren't impacted by changes made to TPM.

Assistants

What is the status of the Azure OpenAI Assistants API?

The Azure OpenAI Assistants API is being retired. Build new agents on the Foundry Agents service, which is the supported replacement and supports a broader set of models, tools, and grounding sources. For migration guidance, see Migrate to the new Foundry Agent Service.

Web app

How can I customize my published web app?

You can customize your published web app in the Azure portal. The source code for the published web app is available on GitHub, where you can find information on changing the app frontend, as well as instructions for building and deploying the app.

Will my web app be overwritten when I deploy the app again from the [Microsoft Foundry portal](https://ai.azure.com/?cid=learnDocs)?

Your app code won't be overwritten when you update your app. The app will be updated to use the Azure OpenAI resource, Azure AI Search index, and model settings selected in the Microsoft Foundry portal without any change to the appearance or functionality.

Grounding on your data

How do I ground a model on my own data?

Use the Foundry Agents service with an Azure AI Search index or other supported knowledge source. The Foundry Agents service is the supported replacement for the "Azure OpenAI on your data" feature and supports a broader set of models, tools, and grounding sources. For an end-to-end walkthrough, see Grounding with Azure AI Search.

The Customer Copyright Commitment is a provision to be included in the December 1, 2023, Microsoft Product Terms that describes Microsoft’s obligation to defend customers against certain third-party intellectual property claims relating to Output Content. If the subject of the claim is Output Content generated from the Azure OpenAI (or any other Covered Product that allows customers to configure the safety systems), then to receive coverage, customer must have implemented all mitigations required by the Azure OpenAI documentation in the offering that delivered the Output Content. The required mitigations are documented here and updated on an ongoing basis. For new services, features, models, or use cases, new CCC requirements will be posted and take effect at or following the launch of such service, feature, model, or use case. Otherwise, customers will have six months from the time of publication to implement new mitigations to maintain coverage under the CCC. If a customer tenders a claim, the customer will be required to demonstrate compliance with the relevant requirements. These mitigations are required for Covered Products that allow customers to configure the safety systems, including Azure OpenAI Service; they don't impact coverage for customers using other Covered Products.