Need some details about PTU in Azure OpenAI Service

Lianne Goodwin 20 Reputation points
2024-10-07T05:19:30.05+00:00

Hey. I've heard people talk about a quota method in Azure OpenAI Service called PTU. I'm wondering what the difference is between PTU and the normal deployment quota. Thanks.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. Daniel Fang 1,060 Reputation points MVP
    2024-10-07T05:31:22.78+00:00

    Hi Lianne

    Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.

    Unlike the Tokens Per Minute (TPM) quota used by other Azure OpenAI offerings, PTUs are model-independent: they can be used to deploy any supported model and version in the region. You can find more details at the link below.

    https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput
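
To make the quota accounting concrete, here is a minimal Python sketch of how a regional PTU quota is consumed by deployments regardless of which model each one runs. The class, the function names, and all numbers (a quota of 300 PTUs, the deployment names and sizes) are hypothetical and chosen only for illustration; this is not an Azure API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProvisionedDeployment:
    name: str   # hypothetical deployment name
    model: str  # any supported model; the PTU quota is not tied to it
    ptus: int   # PTUs assigned to this deployment


def remaining_ptus(regional_quota: int, deployments: list[ProvisionedDeployment]) -> int:
    """PTUs still unassigned for one subscription in one region."""
    return regional_quota - sum(d.ptus for d in deployments)


def can_deploy(regional_quota: int, deployments: list[ProvisionedDeployment], requested: int) -> bool:
    """True if a new deployment of `requested` PTUs fits in the remaining quota."""
    return requested <= remaining_ptus(regional_quota, deployments)


# Hypothetical figures: one subscription, one region, 300 PTUs of quota.
quota = 300
existing = [
    ProvisionedDeployment("chat-prod", "gpt-4o", 100),
    ProvisionedDeployment("batch-summaries", "gpt-4o-mini", 50),
]

print(remaining_ptus(quota, existing))   # 150
print(can_deploy(quota, existing, 200))  # False
```

Note that the two deployments draw from the same pool even though they use different models, which is the contrast with a TPM quota tracked per model.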

    A provisioned deployment can further be of type ProvisionedManaged or GlobalProvisionedManaged. PTU pricing, in USD, is listed on the pricing page below.

    https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/
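
As a rough illustration of where the deployment type and the PTU count would go when creating a deployment, the sketch below builds a request body as a plain Python dictionary. The SKU names come from the answer above; the field layout (`sku.name`, `sku.capacity`, `properties.model`) is my assumption about the resource shape and should be checked against the current Azure reference. The function name, model name, and version string are hypothetical.

```python
import json

# SKU names as given in the answer above.
PROVISIONED_SKUS = {"ProvisionedManaged", "GlobalProvisionedManaged"}


def build_deployment_request(sku_name: str, ptus: int, model_name: str, model_version: str) -> dict:
    """Build a deployment request body (assumed layout, not verified against the API)."""
    if sku_name not in PROVISIONED_SKUS:
        raise ValueError(f"not a provisioned SKU: {sku_name}")
    if ptus <= 0:
        raise ValueError("PTU count must be positive")
    return {
        # For provisioned SKUs, capacity is expressed in PTUs.
        "sku": {"name": sku_name, "capacity": ptus},
        "properties": {
            "model": {"format": "OpenAI", "name": model_name, "version": model_version}
        },
    }


# Hypothetical model name, version, and PTU count.
body = build_deployment_request("GlobalProvisionedManaged", 50, "gpt-4o", "2024-08-06")
print(json.dumps(body, indent=2))
```

The helper only assembles and validates the dictionary; sending it to Azure is deliberately left out.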

