Why is the model's maximum context length 4096 tokens for gpt-3.5-turbo-0125?

yanan chen 0 Reputation points
2024-02-20T17:48:47.08+00:00

Hi there, I am calling Azure OpenAI using gpt-3.5-turbo-0125, as listed at https://platform.openai.com/docs/models/gpt-3-5-turbo:

```python
import openai

openai.api_key = "*****"
openai.api_base = "https://lge-chatgpt-002.openai.azure.com/"
openai.api_type = 'azure'
openai.api_version = '2023-09-01-preview'
MODEL = 'gpt-3.5-turbo-0125'
response = openai.ChatCompletion.create(  # type: ignore
    engine="gpt-35-turbo",  # Azure deployment name
    model=MODEL,
    messages=messages,
    temperature=0,
    max_tokens=max_tokens,
    stop=None,
)
```

However, I got this error:

> This model's maximum context length is 4096 tokens. However, you
> requested 4274 tokens (3774 in the messages, 500 in the completion).
> Please reduce the length of the messages or completion.
According to the website, the maximum context length is 16,385 tokens, so is there any reason for this error?
Thanks.
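For context, the "requested" total in the error is the prompt tokens plus the `max_tokens` completion budget, checked against the context window the deployment actually enforces. A minimal sketch of that check, using the figures from the error message above:

```python
# Reproduce the context-window check from the error message.
# The figures come from the error text; the 4096 limit is what the
# Azure deployment enforced, not the 16,385 advertised for
# gpt-3.5-turbo-0125.
context_limit = 4096   # limit reported by the deployment
prompt_tokens = 3774   # "3774 in the messages"
max_tokens = 500       # completion budget requested in the call

requested = prompt_tokens + max_tokens
print(requested)                  # 4274
print(requested > context_limit)  # True -> the request is rejected
```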
Azure OpenAI Service

1 answer

  1. Charlie Wei 3,310 Reputation points
    2024-02-21T02:27:39.91+00:00

    Hello @yanan chen ,

    Regarding the program you provided, I have observed several points. First, based on the documentation, Azure currently offers only gpt-4-0125-preview and has not yet made a 0125 version of gpt-35-turbo available. Next, `MODEL = 'gpt-3.5-turbo-0125'` is, I believe, OpenAI's naming; with `api_type = 'azure'`, it is the `engine` parameter (your deployment name, `gpt-35-turbo`) that determines which model serves the request. Lastly, as the comments suggested, I recommend reconfirming the deployment version of the model, since the 4096-token limit indicates the deployment is running an older gpt-35-turbo version rather than 0125. We can further discuss how to improve this issue.

    Best regards,
    Charlie


    If you find my response helpful, please consider accepting this answer and voting 'yes' to support the community. Thank you!

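Following up on the answer's suggestion to "reduce the length of the messages": until the deployment is confirmed to run a 16k-context model, one workaround is to drop the oldest conversation turns until the prompt plus completion budget fits the 4096-token window. A rough sketch, assuming a heuristic of about 4 characters per token (tiktoken would give exact counts); `count_tokens` and `trim_messages` are illustrative helpers, not from the original thread:

```python
def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # For exact counts, use tiktoken's cl100k_base encoding instead.
    return max(1, len(text) // 4)

def trim_messages(messages, max_tokens, context_limit=4096):
    """Drop the oldest non-system messages until prompt + completion fits."""
    trimmed = list(messages)

    def prompt_tokens(msgs):
        # +4 per message roughly accounts for chat-format overhead.
        return sum(count_tokens(m["content"]) + 4 for m in msgs)

    while trimmed and prompt_tokens(trimmed) + max_tokens > context_limit:
        # Keep the system message (index 0) if present; drop the oldest turn.
        drop_at = 1 if trimmed[0]["role"] == "system" else 0
        if drop_at >= len(trimmed):
            break
        trimmed.pop(drop_at)
    return trimmed
```

Passing `trim_messages(messages, max_tokens)` instead of `messages` keeps the request under whatever limit the deployment actually enforces, at the cost of losing the oldest turns of the conversation.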