What is PTU for gpt-4 model deployment?

Question

Anonymous

What are PTUs for gpt-4 model deployment?

AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-01-24T12:52:31.27+00:00

Neha Chopade, just checking to see if you had a chance to review the responses below.

Do let me know if that helps or if you have any other queries.

If the response helped, please click Accept Answer and Yes for "Was this answer helpful".

2 answers

Answer 1

Azar 29,520 MVP Volunteer Moderator

Hey Neha Chopade

Provisioned throughput lets you specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTUs), a normalized way of representing the throughput for your deployment. Each model-version pair requires a different number of PTUs to deploy and provides a different amount of throughput per PTU.

For details, please follow the link below:

https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput

If this helps, kindly accept the answer. Thanks much.

Azar 29,520 Reputation points MVP Volunteer Moderator

2024-01-19T13:55:44.9133333+00:00

Hi, I hope the information provided has been helpful to you! If so, please accept the answer by clicking Accept Answer or Upvote on the post. We value your feedback, and it will help others who might have a similar query. Thank you for your contribution to enhancing Microsoft Q&A!

Answer 2

Neha Chopade, greetings and welcome to the Microsoft Q&A forum!

Provisioned throughput units (PTUs) are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, the increments, and the processing capacity associated with each unit vary by model type and version.
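To make the minimum-plus-increment idea concrete, here is a minimal Python sketch. It assumes valid deployment sizes take the form minimum + k × increment; the function name `valid_ptu_size` and the numbers 100 and 50 are hypothetical placeholders, not actual values for GPT-4 or any other model. Please check the documentation or the capacity planner for the real figures.

```python
import math


def valid_ptu_size(required_ptu: float, minimum: int, increment: int) -> int:
    """Round a PTU estimate up to the nearest deployable size.

    Assumption (not confirmed by the service docs quoted here): deployable
    sizes are `minimum`, `minimum + increment`, `minimum + 2 * increment`, ...
    """
    if increment <= 0 or minimum <= 0:
        raise ValueError("minimum and increment must be positive")
    if required_ptu <= minimum:
        return minimum
    # Number of whole increments needed above the minimum.
    steps = math.ceil((required_ptu - minimum) / increment)
    return minimum + steps * increment


# Hypothetical values for illustration only.
HYPOTHETICAL_MINIMUM = 100
HYPOTHETICAL_INCREMENT = 50

size = valid_ptu_size(120, HYPOTHETICAL_MINIMUM, HYPOTHETICAL_INCREMENT)
print(size)
```

With these placeholder values, an estimate of 120 PTUs would round up to the next deployable size rather than being deployed as-is.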

Note that quota is specific to a (deployment type, model, region) triplet and isn't interchangeable. This means you can't use quota for GPT-4 to deploy GPT-35-turbo.
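One way to picture why quota isn't interchangeable is to treat the triplet as a lookup key. The sketch below is only an illustration of that idea and not how the service is implemented; the `can_deploy` helper, the key strings, and the quota amount of 200 are all hypothetical.

```python
from typing import Dict, Tuple

# (deployment_type, model, region): a hypothetical key shape for illustration.
QuotaKey = Tuple[str, str, str]


def can_deploy(quota: Dict[QuotaKey, int], key: QuotaKey, ptus: int) -> bool:
    """Return True only if the exact triplet has at least `ptus` of quota."""
    return quota.get(key, 0) >= ptus


# Hypothetical quota: 200 PTUs granted for GPT-4 only.
quota = {("provisioned", "gpt-4", "swedencentral"): 200}

gpt4_ok = can_deploy(quota, ("provisioned", "gpt-4", "swedencentral"), 100)
gpt35_ok = can_deploy(quota, ("provisioned", "gpt-35-turbo", "swedencentral"), 100)
print(gpt4_ok, gpt35_ok)
```

Because the GPT-35-turbo triplet has no entry of its own, the GPT-4 quota does not help it, which mirrors the point above.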

what are PTU for gpt-4 model deployment?

As mentioned, each model-version pair requires a different number of PTUs to deploy and provides a different amount of throughput per PTU.

To get a quick estimate for your workload, open the capacity planner in the Azure OpenAI Studio. The capacity planner is under Management > Quotas > Provisioned. Within the Quota pane, the Provisioned option and the capacity planner are only available in certain regions. If you don't see this option, setting the quota region to Sweden Central will make it available.

Please see the documentation below to learn more about PTUs.

What is provisioned throughput?

Get started using Provisioned Deployments on the Azure OpenAI Service

Provisioned throughput units onboarding

Do let me know if that helps or if you have any other queries.

If the response helped, please click Accept Answer and Yes for "Was this answer helpful". Doing so would help other community members with similar issues identify the solution. I highly appreciate your contribution to the community.
