We are switching from the Azure OpenAI text-davinci-003 model to the gpt-35-turbo-instruct (0914) model (Standard tier).
Region: East US
We created a resource in Azure OpenAI Service and deployed the gpt-35-turbo-instruct model.
The Microsoft documentation states that gpt-35-turbo-instruct (0914) accepts a maximum of 4,097 tokens:
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-35-models
However, when we use this model, maxTokens values above 4,096 are accepted (requests with more than 8,000 tokens go through).
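For reference, the documented limit covers prompt tokens plus completion tokens combined, so the largest maxTokens value that should fit shrinks as the prompt grows. A minimal sketch of that arithmetic (the 4,097 figure is taken from the documentation above; the characters-per-token estimate is our own rough heuristic, not the model's real tokenizer):

```java
public class TokenBudget {
    // Context window of gpt-35-turbo-instruct (0914), per the documentation above.
    static final int CONTEXT_LIMIT = 4097;

    // Rough estimate: ~4 characters per token for English text.
    // This heuristic is an assumption, not an SDK call.
    static int estimatePromptTokens(String prompt) {
        return Math.max(1, prompt.length() / 4);
    }

    // Largest completion budget that still fits alongside the prompt.
    static int maxCompletionTokens(int promptTokens) {
        return Math.max(0, CONTEXT_LIMIT - promptTokens);
    }

    public static void main(String[] args) {
        int promptTokens = estimatePromptTokens("what is tree");
        // prints 4094: the remaining budget after the (estimated) 3-token prompt
        System.out.println(maxCompletionTokens(promptTokens));
    }
}
```

By this arithmetic, a maxTokens value above 4,096 should never fit in the context window, which is why we expected the service to reject such requests.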
Sample code used to test this model:
String azureKey = "t#######";
// deployment ID created in the Azure portal (gpt-35-turbo-instruct) under Azure OpenAI Service
String deploymentOrModelId = "deploymentID";
String endpoint = "https://wyz.openai.azure.com/";

OpenAIClient client = new OpenAIClientBuilder()
        .endpoint(endpoint)
        .credential(new AzureKeyCredential(azureKey))
        .buildClient();

List<String> prompt = new ArrayList<>();
prompt.add("what is tree");

CompletionsOptions options = new CompletionsOptions(prompt);
options.setMaxTokens(800);
options.setPresencePenalty(0.0);
options.setFrequencyPenalty(0.0);
options.setTemperature(1.0);
options.setTopP(0.5);

Completions completions = client.getCompletions(deploymentOrModelId, options);
Dependencies used:
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-sdk-bom</artifactId>
    <version>1.2.15</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-openai</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
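For completeness, a BOM import with scope "import" only takes effect inside a dependencyManagement section; regular dependencies go in the separate dependencies section. A minimal POM sketch under that assumption, with the version numbers copied from above:

```xml
<dependencyManagement>
    <dependencies>
        <!-- BOM: pins versions for com.azure artifacts; must live here, not in <dependencies> -->
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-sdk-bom</artifactId>
            <version>1.2.15</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- azure-ai-openai is not covered by the BOM while in beta, so its version is explicit -->
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-openai</artifactId>
        <version>1.0.0-beta.2</version>
    </dependency>
</dependencies>
```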
Could you please suggest how to resolve this? Is there any setting that needs to be checked in the Azure portal?