MAX_TOKEN not working for response from Azure Open AI

Prince Solomon 0 Reputation points
2025-05-20T08:20:52.9666667+00:00

We are currently using gpt-4o version "2024-11-20", which has an output token limit of 16K, but it seems to default to 4K output tokens.

Does Azure support 16K output tokens? If so, how can we set the value?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

2 answers

Sort by: Most helpful
  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2025-05-20T19:25:47.3466667+00:00

    Hello Prince Solomon,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand you are having issues with setting the max_tokens parameter for GPT-4o in Azure OpenAI, specifically how to utilize its full 16K output token capacity.

    Yes, Azure OpenAI does indeed support up to 16,384 completion tokens for this GPT-4o version, but if max_tokens is not set in the request, the service applies a much lower default (commonly 4,096 tokens), which is why developers expecting longer responses receive truncated outputs. The key point is that max_tokens caps only the completion (output) tokens; your prompt counts separately against the model's 128K context window, and the prompt tokens plus max_tokens together must fit within that window.

    Therefore, to fully leverage the 16K output capability, set max_tokens explicitly (up to 16,384) and make sure your prompt leaves enough room in the context window. You can use the tiktoken library to count your prompt's tokens and verify the request stays within the model's constraints. Here's a sample configuration for your API request:

    {
      "model": "gpt-4o",
      "messages": [
        {"role": "user", "content": "Your prompt here"}
      ],
      "max_tokens": 12000,
      "temperature": 0.7
    }
    

    This setup ensures that your output is not prematurely cut off by the default limit. If max_tokens is omitted, the service falls back to its lower default; if the prompt tokens plus max_tokens exceed the context window, the request is rejected with a token-limit error, so account for input length when choosing the value.
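    The budgeting described above can be sketched in Python. This is a minimal, self-contained illustration, not production code: it uses a rough characters-per-token heuristic in place of tiktoken (which you should use for exact counts), and the 128,000 / 16,384 limits are the published gpt-4o figures assumed here.

    ```python
    # Sketch: choose a safe max_tokens value for a gpt-4o request.
    # Assumptions: 128K context window, 16,384 completion-token cap.
    # For exact counts, replace estimate_tokens with
    # tiktoken.encoding_for_model("gpt-4o").

    CONTEXT_WINDOW = 128_000   # total tokens (prompt + completion)
    OUTPUT_CAP = 16_384        # maximum completion tokens for gpt-4o

    def estimate_tokens(text: str) -> int:
        # Rough heuristic: ~4 characters per token for English text.
        return max(1, len(text) // 4)

    def max_tokens_budget(prompt: str, safety_margin: int = 256) -> int:
        # Reserve room for the prompt plus a margin, then cap at the
        # model's completion-token limit.
        used = estimate_tokens(prompt) + safety_margin
        return min(OUTPUT_CAP, max(1, CONTEXT_WINDOW - used))

    prompt = "Summarize the quarterly report in detail."
    payload = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens_budget(prompt),
        "temperature": 0.7,
    }
    print(payload["max_tokens"])  # prints 16384 for this short prompt
    ```

    For short prompts the budget simply resolves to the 16,384 cap; the context-window subtraction only starts to matter with very long inputs.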

    For more details on token usage and best practices, see the official Azure OpenAI documentation.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.


  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


