Provisioned throughput units onboarding

This article walks you through the process of onboarding to Provisioned Throughput Units (PTU). Once you complete the initial onboarding, we recommend referring to the PTU getting started guide.

Note

Provisioned Throughput Units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering, contact your Microsoft Account Team.

When to use provisioned throughput units (PTU)

You should consider switching from pay-as-you-go to provisioned throughput when you have well-defined, predictable throughput requirements. Typically, this occurs when the application is ready for production or has already been deployed in production and there is an understanding of the expected traffic. This will allow users to accurately forecast the required capacity and avoid unexpected billing.

Typical PTU scenarios

  • An application that is ready for production or in production.
  • Application has predictable capacity/usage expectations.
  • Application has real-time/latency sensitive requirements.

Note

In function calling and agent use cases, token usage can be variable. You should understand your expected Tokens Per Minute (TPM) usage in detail prior to migrating the workloads to PTU.

Sizing and estimation: provisioned managed only

Determining the right amount of provisioned throughput, or PTUs, you require for your workload is an essential step to optimizing performance and cost. This section describes how to use the Azure OpenAI capacity planning tool. The tool provides you with an estimate of the required PTU to meet the needs of your workload.

Estimate provisioned throughput and cost

To get a quick estimate for your workload, open the capacity planner in the Azure OpenAI Studio. The capacity planner is under Management > Quotas > Provisioned.

The Provisioned option and the capacity planner are only available in certain regions within the Quota pane. If you don't see this option, setting the quota region to Sweden Central makes it available. Enter the following parameters based on your workload.

Input | Description
Model | OpenAI model you plan to use. For example: GPT-4
Version | Version of the model you plan to use. For example: 0613
Prompt tokens | Number of tokens in the prompt for each call
Generation tokens | Number of tokens generated by the model on each call
Peak calls per minute | Peak concurrent load to the endpoint, measured in calls per minute

After you fill in the required details, select Calculate to view the suggested PTU for your scenario.

Screenshot of the Azure OpenAI Studio landing page.

Note

The capacity planner is an estimate based on simple input criteria. The most accurate way to determine your capacity is to benchmark a deployment with a representative workload for your use case.
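
Before opening the capacity planner, it can help to turn the three numeric inputs into a rough peak tokens-per-minute figure. The sketch below shows only that arithmetic. It does not reproduce the planner's PTU calculation, which this article doesn't describe. The function name and the sample workload are illustrative assumptions.

```python
def peak_tokens_per_minute(prompt_tokens: int,
                           generation_tokens: int,
                           peak_calls_per_minute: int) -> int:
    """Total tokens (prompt + generated) handled per minute at peak load."""
    if min(prompt_tokens, generation_tokens, peak_calls_per_minute) < 0:
        raise ValueError("inputs must be non-negative")
    return (prompt_tokens + generation_tokens) * peak_calls_per_minute


# Hypothetical workload: 1,000-token prompts, 200-token responses, 60 calls/min.
print(peak_tokens_per_minute(1000, 200, 60))
```

A figure like this is a sanity check on the values you enter, not a substitute for the planner or for benchmarking.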

Understanding the provisioned throughput purchase model

Unlike Azure services where you're charged based on usage, the Azure OpenAI Provisioned Throughput feature is purchased as a renewable, monthly commitment. This commitment is charged to your subscription upon creation and at each monthly renewal. When you onboard to Provisioned Throughput, you need to create a commitment on each Azure OpenAI resource where you intend to create a provisioned deployment. The PTUs you purchase in this way are available for use when creating deployments on those resources.

The total number of PTUs you can purchase via commitments is limited to the amount of Provisioned Throughput quota that is assigned to your subscription. The following table compares other characteristics of Provisioned Throughput quota (PTUs) and Provisioned Throughput commitments.

Topic | Quota | Commitments
Purpose | Grants permission to create provisioned deployments, and provides the upper limit on the capacity that can be used | Purchase vehicle for Provisioned Throughput capacity
Lifetime | Quota might be removed from your subscription if it isn't purchased via a commitment within five days of being granted | The minimum term is one month, with customer-selectable autorenewal behavior. A commitment isn't cancelable, and can't be moved to a new resource while it's active
Scope | Quota is specific to a subscription and region, and is shared across all Azure OpenAI resources | Commitments are an attribute of an Azure OpenAI resource, and are scoped to deployments within that resource. A subscription might contain as many active commitments as there are resources.
Granularity | Quota is granted specific to a model family (for example, GPT-4) but is shareable across model versions within the family | Commitments aren't model or version specific. For example, a resource’s 1000 PTU commitment can cover deployments of both GPT-4 and GPT-35-Turbo
Capacity guarantee | Having quota doesn't guarantee that capacity is available when you create the deployment | Capacity availability to cover committed PTUs is guaranteed as long as the commitment is active.
Increases/Decreases | New quota can be requested and approved at any time, independent of your commitment renewal dates | The number of PTUs covered by a commitment can be increased at any time, but can't be decreased except at the time of renewal.

Quota and commitments work together to govern the creation of deployments within your subscriptions. To create a provisioned deployment, two criteria must be met:

  • Quota must be available for the desired model within the desired region and subscription. This means you can't exceed your subscription/region-wide limit for the model.
  • Committed PTUs must be available on the resource where you create the deployment. (The capacity you assign to the deployment is paid-for).
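
The two criteria above can be expressed as a simple pre-check. This is a minimal sketch, not an Azure API: the function and parameter names are hypothetical, and you would supply the numbers yourself from the Quota pane and the Manage Commitments view.

```python
def can_create_deployment(requested_ptus: int,
                          quota_remaining: int,
                          committed_ptus: int,
                          deployed_ptus: int) -> tuple[bool, str]:
    """Check both criteria for a provisioned deployment.

    quota_remaining: unused quota for the model in the subscription/region.
    committed_ptus / deployed_ptus: totals on the target resource.
    """
    if requested_ptus > quota_remaining:
        return False, "exceeds remaining quota for the model in this region"
    if requested_ptus > committed_ptus - deployed_ptus:
        return False, "not enough uncommitted-to-deployment PTUs on the resource"
    return True, "ok"


# Hypothetical numbers: 300 PTUs requested, 500 quota left,
# 800 PTUs committed on the resource with 500 already deployed.
print(can_create_deployment(300, 500, 800, 500))
```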

Commitment properties and charging model

A commitment includes several properties.

Property | Description | When Set
Azure OpenAI Resource | The resource hosting the commitment | Commitment creation
Committed PTUs | The number of PTUs covered by the commitment. | Initially set at commitment creation, and can be increased at any time, but not decreased.
Term | The term of the commitment. A commitment expires one month from its creation date. The renewal policy defines what happens next. | Commitment creation
Expiration Date | The expiration date of the commitment. The time of expiration is midnight UTC. | Initially, 30 days from creation. However, the expiration date changes if the commitment is renewed.
Renewal Policy | There are three options for what to do upon expiration: (1) Autorenew: A new commitment term begins for another 30 days at the current number of PTUs. (2) Autorenew with different settings: This setting is the same as Autorenew, except that the number of PTUs committed upon renewal can be decreased. (3) Don't autorenew: Upon expiration, the commitment ends and isn't renewed. | Initially set at commitment creation, and can be changed at any time.

Commitment charges

Provisioned Throughput Commitments generate charges against your Azure subscription at the following times:

  • At commitment creation. The charge is computed according to the current monthly PTU rate and the number of PTUs committed. You will receive a single up-front charge on your invoice.

  • At commitment renewal. If the renewal policy is set to autorenew, a new monthly charge is generated based on the PTUs committed in the new term. This charge appears as a single up-front charge on your invoice.

  • When new PTUs are added to an existing commitment. The charge is computed based on the number of PTUs added to the commitment, pro-rated hourly to the end of the existing commitment term. For example, if 300 PTUs are added to an existing commitment of 900 PTUs exactly halfway through its term, there is a charge at the time of the addition for the equivalent of 150 PTUs (300 PTUs pro-rated to the commitment expiration date). If the commitment is renewed, the following month’s charge will be for the new PTU total of 1,200 PTUs.
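
The pro-rating in the last bullet can be worked through in a few lines. This is a sketch under stated assumptions: a 30-day (720-hour) term, and a per-PTU monthly rate that you supply yourself. The function names are hypothetical, and no actual price is implied.

```python
def prorated_ptu_equivalent(added_ptus: int,
                            hours_remaining: float,
                            term_hours: float = 720.0) -> float:
    """PTU-months billed when PTUs are added partway through a term."""
    if not 0 <= hours_remaining <= term_hours:
        raise ValueError("hours_remaining must fall within the term")
    return added_ptus * hours_remaining / term_hours


def prorated_charge(added_ptus: int, hours_remaining: float,
                    monthly_rate_per_ptu: float,
                    term_hours: float = 720.0) -> float:
    """Charge for the addition, given an assumed monthly per-PTU rate."""
    return prorated_ptu_equivalent(added_ptus, hours_remaining,
                                   term_hours) * monthly_rate_per_ptu


# The example from the text: 300 PTUs added exactly halfway through the term.
print(prorated_ptu_equivalent(300, hours_remaining=360))
```

With the numbers from the text, the result is the equivalent of 150 PTUs for the month, which matches the example above.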

As long as the number of deployed PTUs in a resource is covered by the resource’s commitment, you'll only see the commitment charges. However, if the number of deployed PTUs in a resource becomes greater than the resource’s committed PTUs, the excess PTUs are charged as overage at an hourly rate. Typically, the only way this overage happens is if a commitment expires or is reduced at its renewal while the resource contains deployments. For example, if a 300 PTU commitment is allowed to expire on a resource that has 300 PTUs deployed, the deployed PTUs are no longer covered by any commitment. Once the expiration date is reached, the subscription is charged an hourly overage fee based on the 300 excess PTUs.

The hourly rate is higher than the monthly commitment rate and the charges exceed the monthly rate within a few days. There are two ways to end hourly overage charges:

  • Delete or scale-down deployments so that they don’t use more PTUs than are committed.
  • Create a new commitment on the resource to cover the deployed PTUs.
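
To see how overage accrues, the following sketch computes the excess PTUs on a resource and the resulting hourly charge. The hourly rate is a placeholder you would replace with your actual rate, and the function names are hypothetical.

```python
def excess_ptus(deployed_ptus: int, committed_ptus: int) -> int:
    """PTUs deployed on a resource beyond what its commitment covers."""
    return max(0, deployed_ptus - committed_ptus)


def overage_charge(deployed_ptus: int, committed_ptus: int,
                   hours: float, hourly_rate_per_ptu: float) -> float:
    """Overage accrued over a number of hours at an assumed hourly rate."""
    return excess_ptus(deployed_ptus, committed_ptus) * hours * hourly_rate_per_ptu


# Expired commitment (0 PTUs committed) with 300 PTUs still deployed.
print(excess_ptus(300, 0))
# 500 PTUs deployed against a 300 PTU commitment, for one day,
# at a placeholder rate of 1.0 per PTU-hour.
print(overage_charge(500, 300, hours=24, hourly_rate_per_ptu=1.0))
```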

Purchasing and managing commitments

Planning your commitments

Upon receiving confirmation that Provisioned Throughput Unit (PTU) quota is assigned to a subscription, you must create commitments on the target resources (or extend existing commitments) to make the quota usable for deployments.

Prior to creating commitments, plan how the provisioned deployments will be used and which Azure OpenAI resources will host them. Commitments have a one month minimum term and can't be decreased in size until the end of the term. They also can't be moved to new resources once created. Finally, the sum of your committed PTUs can't be greater than your quota – PTUs committed on a resource are no longer available to commit to on a different resource until the commitment expires. Having a clear plan on which resources will be used for provisioned deployments and the capacity you intend to apply to them (for at least a month) will help ensure an optimal experience with your provisioned throughput setup.

For example:

  • Don’t create a commitment and deployment on a temporary resource for the purpose of validation. You’ll be locked into using that resource for at least a month. Instead, if the plan is to ultimately use the PTUs on a production resource, create the commitment and test deployment on that resource right from the start.

  • Calculate the number of PTUs to commit on a resource based on the number, model, and size of the deployments you intend to create, keeping in mind the minimum number of PTUs each model requires to create a deployment.

    • Example 1: GPT-4-32K requires a minimum of 200 PTUs to deploy. If you create a commitment of only 100 PTUs on a resource, you won’t have enough committed PTUs to deploy GPT-4-32K there.

    • Example 2: If you need to create multiple deployments on a resource, sum the PTUs required for each deployment. A production resource hosting deployments for 300 PTUs of GPT-4, and 500 PTUs of GPT-4-32K will require a commitment of at least 800 PTUs to cover both deployments.

  • Distribute or consolidate PTUs as needed. For example, a total quota of 1000 PTUs can be distributed across resources as needed to support your deployments. It could be committed on a single resource to support one or more deployments adding up to 1000 PTUs, or distributed across multiple resources (for example, a dev and a prod resource) as long as the total number of committed PTUs is less than or equal to the quota of 1000.

  • Consider operational requirements in your plan. For example:

    • Organizationally required resource naming conventions
    • Business continuity policies that require multiple deployments of a model per region, perhaps on different Azure OpenAI resources
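
The sizing rules in the list above can be checked mechanically before you purchase anything. The sketch below is illustrative only. The plan structure and function names are hypothetical, the only per-model minimum included is the GPT-4-32K figure stated in Example 1, and quota is treated as a single number for simplicity even though it's granted per model family.

```python
# Minimum PTUs per deployment. Only the value given in the text is listed;
# models without an entry are not checked.
MIN_PTUS = {"GPT-4-32K": 200}


def required_commitment(deployments: list[tuple[str, int]]) -> int:
    """Sum of PTUs across the deployments planned for one resource."""
    return sum(ptus for _model, ptus in deployments)


def check_plan(plan: dict[str, list[tuple[str, int]]], quota: int) -> list[str]:
    """Return a list of problems found in a commitment plan."""
    problems = []
    for resource, deployments in plan.items():
        for model, ptus in deployments:
            minimum = MIN_PTUS.get(model)
            if minimum is not None and ptus < minimum:
                problems.append(
                    f"{resource}: {model} needs at least {minimum} PTUs, got {ptus}")
    total = sum(required_commitment(d) for d in plan.values())
    if total > quota:
        problems.append(f"plan commits {total} PTUs but quota is {quota}")
    return problems


# Example 2 from the text: 300 PTUs of GPT-4 plus 500 PTUs of GPT-4-32K.
plan = {"prod": [("GPT-4", 300), ("GPT-4-32K", 500)]}
print(required_commitment(plan["prod"]))
print(check_plan(plan, quota=1000))
```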

Managing Provisioned Throughput Commitments

Provisioned throughput commitments are created and managed from the Manage Commitments view in Azure OpenAI Studio. You can navigate to this view by selecting Manage Commitments from the Quota pane:

Screenshot of commitment purchase UI with notifications.

From the Manage Commitments view, you can do several things:

  • Purchase new commitments or edit existing commitments.
  • Monitor all commitments in your subscription.
  • Identify and take action on commitments that might cause unexpected billing.

The sections below will take you through these tasks.

Purchase a Provisioned Throughput Commitment

With your commitment plan ready, the next step is to create the commitments. Commitments are created manually via Azure OpenAI Studio and require the user creating the commitment to have either the Contributor or Cognitive Services Contributor role at the subscription level.

For each new commitment you need to create, follow these steps:

  1. Launch the Provisioned Throughput purchase dialog by selecting Quotas > Provisioned > Manage Commitments.

Screenshot of the purchase dialog.

  2. Select Purchase commitment.

  3. Select the Azure OpenAI resource and purchase the commitment. Your resources are divided into resources with existing commitments, which you can edit, and resources that don't currently have a commitment.

Setting | Notes
Select a resource | Choose the resource where you'll create the provisioned deployment. Once you have purchased the commitment, you will be unable to use the PTUs on another resource until the current commitment expires.
Select a commitment type | Select Provisioned. (Provisioned is equivalent to Provisioned Managed)
Current uncommitted provisioned quota | The number of PTUs currently available for you to commit to this resource.
Amount to commit (PTU) | Choose the number of PTUs you're committing to. This number can be increased during the commitment term, but can't be decreased. Enter values in increments of 50 for the commitment type Provisioned.
Commitment tier for current period | The commitment period is set to one month.
Renewal settings | One of: Auto-renew at current PTUs, Auto-renew at lower PTUs, Do not auto-renew

  4. Select Purchase. A confirmation dialog is displayed. After you confirm, your PTUs are committed, and you can use them to create a provisioned deployment.

Screenshot of commitment purchase UI.
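
The constraints on the Amount to commit (PTU) field can be sanity-checked before you open the dialog. This is a minimal sketch that encodes only the rules stated in the settings table (increments of 50, and no more than the current uncommitted quota). The function name is hypothetical, and the dialog itself remains the authority.

```python
def validate_commitment_amount(amount: int, uncommitted_quota: int,
                               increment: int = 50) -> list[str]:
    """Return the reasons an amount would be rejected, or an empty list."""
    errors = []
    if amount <= 0:
        errors.append("amount must be positive")
    if amount % increment != 0:
        errors.append(f"amount must be a multiple of {increment}")
    if amount > uncommitted_quota:
        errors.append("amount exceeds uncommitted provisioned quota")
    return errors


print(validate_commitment_amount(300, uncommitted_quota=1000))
print(validate_commitment_amount(120, uncommitted_quota=100))
```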

Important

A new commitment is billed up-front for the entire term. If the renewal settings are set to auto-renew, then you will be billed again on each renewal date based on the renewal settings.

Edit an existing Provisioned Throughput commitment

From the Manage Commitments view, you can also edit an existing commitment. There are two types of changes you can make to an existing commitment:

  • You can add PTUs to the commitment.
  • You can change the renewal settings.

To edit a commitment, select the commitment to edit, then select Edit commitment.

Adding Provisioned Throughput Units to existing commitments

Adding PTUs to an existing commitment will allow you to create larger or more numerous deployments within the resource. You can do this at any time during the term of your commitment.

Screenshot of commitment purchase UI with an increase in the amount to commit value.

Important

When you add PTUs to a commitment, they will be billed immediately, at a pro-rated amount from the current date to the end of the existing commitment term. Adding PTUs does not reset the commitment term.

Changing renewal settings

Commitment renewal settings can be changed at any time before the expiration date of your commitment. Reasons you might want to change the renewal settings include ending your use of provisioned throughput by setting the commitment to not auto-renew, or to decrease usage of provisioned throughput by lowering the number of PTUs that will be committed in the next period.

Important

If you allow a commitment to expire or decrease in size such that the deployments under the resource require more PTUs than you have in your resource commitment, you will receive hourly overage charges for any excess PTUs. For example, a resource that has deployments that total 500 PTUs and a commitment for 300 PTUs will generate hourly overage charges for 200 PTUs.

Monitor commitments and prevent unexpected billings

The Manage Commitments pane provides a subscription-wide overview of all resources with commitments and PTU usage within a given Azure subscription. Of particular interest are:

  • PTUs Committed, Deployed and Usage – These figures provide the sizes of your commitments, and how much is in use by deployments. Maximize your investment by using all of your committed PTUs.
  • Expiration policy and date - The expiration date and policy tell you when a commitment will expire and what will happen when it does. A commitment set to auto-renew will generate a billing event on the renewal date. For commitments that are expiring, be sure you delete deployments from these resources prior to the expiration date to prevent hourly overage billing.
  • Notifications - Alerts regarding important conditions like unused commitments, and configurations that might result in billing overages. Billing overages can be caused by situations such as when a commitment has expired and deployments are still present, but have shifted to hourly billing.
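
If you track commitments outside the portal, the same conditions can be flagged with a short script. The sketch below uses a hypothetical record shape filled in by hand from the Manage Commitments view. It isn't an Azure API type, and the renewal policy is simplified to a single boolean.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Commitment:
    # Hypothetical record; values are copied manually from the portal.
    resource: str
    committed_ptus: int
    deployed_ptus: int
    expires: date
    auto_renew: bool


def review_commitments(commitments, today, warn_days=7):
    """Return warnings for commitments that might cause unexpected billing."""
    warnings = []
    for c in commitments:
        days_left = (c.expires - today).days
        if c.deployed_ptus > c.committed_ptus:
            over = c.deployed_ptus - c.committed_ptus
            warnings.append(f"{c.resource}: {over} PTUs billed as hourly overage")
        elif c.deployed_ptus < c.committed_ptus:
            unused = c.committed_ptus - c.deployed_ptus
            warnings.append(f"{c.resource}: {unused} committed PTUs unused")
        if not c.auto_renew and c.deployed_ptus > 0 and 0 <= days_left <= warn_days:
            warnings.append(
                f"{c.resource}: expires in {days_left} days with deployments still present")
    return warnings


today = date(2024, 6, 1)
sample = [
    Commitment("prod", 800, 800, date(2024, 6, 20), auto_renew=True),
    Commitment("dev", 300, 100, date(2024, 6, 4), auto_renew=False),
]
for line in review_commitments(sample, today):
    print(line)
```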

Common Commitment Management Scenarios

Discontinue use of provisioned throughput

To end use of provisioned throughput and prevent hourly overage charges after the current commitments expire, two steps must be taken:

  1. Set the renewal policy on all commitments to Don't autorenew.
  2. Delete the provisioned deployments using the quota.

Move a commitment/deployment to a new resource in the same subscription/region

It isn't possible in Azure OpenAI Studio to directly move a deployment or a commitment to a new resource. Instead, a new deployment needs to be created on the target resource and traffic moved to it. A commitment must be established on the new resource to accomplish this. Because commitments are charged up-front for a 30-day period, it's necessary to time this move with the expiration of the original commitment to minimize overlap with the new commitment and “double-billing” during the overlap.

There are two approaches that can be taken to implement this transition.

Option 1: No-Overlap Switchover

This option requires some downtime, but requires no extra quota and generates no extra costs.

Steps | Notes
Set the renewal policy on the existing commitment to expire | This will prevent the commitment from renewing and generating further charges
Before expiration of the existing commitment, delete its deployment | Downtime will start at this point and will last until the new deployment is created and traffic is moved. You'll minimize the duration by timing the deletion to happen as close to the expiration date/time as possible.
After expiration of the existing commitment, create the commitment on the new resource | Minimize downtime by executing this and the next step as soon after expiration as possible.
Create the deployment on the new resource and move traffic to it |

Option 2: Overlapped Switchover

This option avoids downtime by keeping both the existing and new deployments live at the same time. It requires having quota available to create the new deployment, and generates extra costs for the duration of the overlapped deployments.

Steps | Notes
Set the renewal policy on the existing commitment to expire | Doing so prevents the commitment from renewing and generating further charges.
Before expiration of the existing commitment: 1. Create the commitment on the new resource. 2. Create the new deployment. 3. Switch traffic. 4. Delete the existing deployment. | Ensure you leave enough time for all steps before the existing commitment expires; otherwise, overage charges will be generated (see the options that follow).

If the final step takes longer than expected and will finish after the existing commitment expires, there are three options to minimize overage charges.

  • Take downtime: Delete the original deployment then complete the move.
  • Pay overage: Keep the original deployment and pay hourly until you have moved traffic off and deleted the deployment.
  • Reset the original commitment to renew one more time. This will give you time to complete the move with a known cost.

Both paying for an overage and resetting the original commitment will generate charges beyond the original expiration date. Paying overage charges might be cheaper than a new one-month commitment if you only need a day or two to complete the move. Compare the costs of both options to find the lowest-cost approach.
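
The cost comparison can be sketched as follows. Both rates are placeholders that you would replace with your actual hourly overage and monthly commitment rates, and the function name is hypothetical.

```python
def cheaper_option(deployed_ptus: int, extra_hours: float,
                   hourly_rate_per_ptu: float,
                   monthly_rate_per_ptu: float) -> tuple[str, float]:
    """Compare paying hourly overage with renewing for one more month."""
    overage = deployed_ptus * extra_hours * hourly_rate_per_ptu
    renewal = deployed_ptus * monthly_rate_per_ptu
    if overage < renewal:
        return "pay overage", overage
    return "renew one more term", renewal


# Placeholder rates: 2.0 per PTU-hour of overage, 300.0 per PTU-month.
print(cheaper_option(300, extra_hours=48, hourly_rate_per_ptu=2.0,
                     monthly_rate_per_ptu=300.0))
print(cheaper_option(300, extra_hours=240, hourly_rate_per_ptu=2.0,
                     monthly_rate_per_ptu=300.0))
```

With these placeholder rates, two extra days favor paying overage, while ten extra days favor renewing. The break-even point depends entirely on the real rates.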

Move the deployment to a new region and/or subscription

The same approaches apply as when moving a commitment and deployment within a region, except that available quota in the new location is required in all cases.

View and edit an existing resource

In Azure OpenAI Studio, select Quota > Provisioned > Manage commitments and select a resource with an existing commitment to view/change it.

Next steps