deploy Azure OpenAI as a web app

凯旋 李 0 Reputation points
2025-06-13T08:06:16.2633333+00:00

I would like to deploy Azure OpenAI as a web app, and use APIM to enforce quota limits for users based on their UPNs. How should this be implemented?

Azure API Management
An Azure service that provides a hybrid, multi-cloud management platform for APIs.

1 answer

  1. Suresh Chikkam 2,135 Reputation points Microsoft External Staff Moderator
    2025-06-25T10:26:00.7566667+00:00

    凯旋 李, you can wrap the Azure OpenAI endpoint in a simple Web App that just forwards requests, then put API Management in front of it. APIM validates each caller’s Azure AD token, reads the UPN from the token, and counts how many tokens (the chunks of text the model processes) that user consumes. When a user goes over your limit, APIM rejects the request before it ever reaches OpenAI.

    First, wrap OpenAI in a proxy Web App. You can start from the official Azure AI Foundry sample or build a small ASP.NET Core Web API that reads two settings (your OpenAI endpoint and key) and relays any JSON posted to /chat/completions to the OpenAI REST API. Deploy it to App Service so that it is reachable at, for example, https://my-openai-proxy.azurewebsites.net.
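
    A minimal sketch of such a relay, assuming an ASP.NET Core minimal API project with implicit usings enabled. The setting names (AzureOpenAI:Endpoint and so on) are hypothetical, and the deployment name and API version are extra settings assumed here because the Azure OpenAI REST URL needs them. It does not handle streaming responses or retries.

    ```csharp
    // Program.cs: relays POST /chat/completions to Azure OpenAI (sketch, not production code).
    using System.Text;

    var builder = WebApplication.CreateBuilder(args);
    builder.Services.AddHttpClient();
    var app = builder.Build();

    // Hypothetical setting names; supply them through App Service application settings.
    string Setting(string name) =>
        app.Configuration[name] ?? throw new InvalidOperationException($"Missing setting {name}");

    string endpoint   = Setting("AzureOpenAI:Endpoint");    // https://<resource>.openai.azure.com
    string apiKey     = Setting("AzureOpenAI:ApiKey");
    string deployment = Setting("AzureOpenAI:Deployment");  // assumed extra setting
    string apiVersion = Setting("AzureOpenAI:ApiVersion");  // assumed extra setting

    app.MapPost("/chat/completions", async (HttpContext ctx, IHttpClientFactory factory) =>
    {
        // Read the caller's JSON body unchanged.
        using var reader = new StreamReader(ctx.Request.Body);
        string json = await reader.ReadToEndAsync();

        // Forward it to the Azure OpenAI chat completions endpoint with the api-key header.
        using var upstream = new HttpRequestMessage(
            HttpMethod.Post, BuildChatCompletionsUrl(endpoint, deployment, apiVersion))
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        upstream.Headers.Add("api-key", apiKey);

        // Copy the upstream status code and body back to the caller.
        using var response = await factory.CreateClient().SendAsync(upstream);
        ctx.Response.StatusCode = (int)response.StatusCode;
        ctx.Response.ContentType = "application/json";
        await response.Content.CopyToAsync(ctx.Response.Body);
    });

    app.Run();

    // Builds {endpoint}/openai/deployments/{deployment}/chat/completions?api-version=...
    static string BuildChatCompletionsUrl(string endpoint, string deployment, string apiVersion) =>
        $"{endpoint.TrimEnd('/')}/openai/deployments/{Uri.EscapeDataString(deployment)}" +
        $"/chat/completions?api-version={Uri.EscapeDataString(apiVersion)}";
    ```

    Keeping the key in the Web App means callers never see it; they only ever hold their own Azure AD token.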

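    Next, set up Azure AD (now Microsoft Entra ID) so that users sign in and your client (whether it’s MSAL.js in a browser or Azure.Identity in server code) can request a token for the scope you exposed on your API, for example api://<API-ID>/access_as_user. The sample below requests api://<API-ID>/.default, which returns the scopes already consented for that API. Every time you call the APIM endpoint, attach that token as a bearer token in the Authorization header.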

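    using System.Net.Http.Headers;
    using System.Text;
    using Azure.Core;
    using Azure.Identity;

    // Sign the user in interactively and request a token for the API exposed through APIM.
    var credential = new InteractiveBrowserCredential();
    var token = await credential.GetTokenAsync(
        new TokenRequestContext(new[] { "api://<API-ID>/.default" }));

    // Attach the bearer token; httpClient and jsonPayload are assumed to be defined by the caller.
    httpClient.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", token.Token);
    var result = await httpClient.PostAsync(
        "https://<your-apim>.azure-api.net/openai/chat/completions",
        new StringContent(jsonPayload, Encoding.UTF8, "application/json"));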
    

    With the Web App ready, import it into API Management as an API with the Web App as its backend. In the inbound section of the APIM policy, first validate the bearer token against your tenant, audience, and scope. Then read the user’s upn claim from the validated token and apply an llm-token-limit policy keyed on that UPN, so APIM keeps a running tally of how many tokens each user spends. Finally, forward the request to the proxy.

    <policies>
      <inbound>
        <base />
        <validate-jwt header-name="Authorization"
                      require-scheme="Bearer"
                      output-token-variable-name="jwt">
          <openid-config url="https://login.microsoftonline.com/<TENANT_ID>/v2.0/.well-known/openid-configuration"/>
          <audiences>
            <audience>api://<API-ID></audience>
          </audiences>
          <required-claims>
            <claim name="scp" match="any">
              <value>access_as_user</value>
            </claim>
          </required-claims>
        </validate-jwt>
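        <!-- Read the UPN from the token validated above; fall back to preferred_username, then to the subject. -->
        <set-variable name="userUpn"
                      value="@{
                          var jwt = (Jwt)context.Variables["jwt"];
                          return jwt.Claims.GetValueOrDefault("upn",
                                 jwt.Claims.GetValueOrDefault("preferred_username", jwt.Subject));
                      }" />
        <llm-token-limit tokens-per-minute="10000"
                         token-quota="4000"
                         token-quota-period="Hourly"
                         counter-key="@((string)context.Variables["userUpn"])"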
                         estimate-prompt-tokens="true"
                         remaining-quota-tokens-header-name="x-remaining-tokens"
                         tokens-consumed-header-name="x-consumed-tokens" />
        <set-backend-service id="openai-proxy"
                             backend-id="your-openai-proxy-service" />
      </inbound>
      <backend>
        <forward-request />
      </backend>
      <outbound>
        <base />
      </outbound>
      <on-error>
        <base />
      </on-error>
    </policies>
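
    To make the counter-key idea concrete, here is a conceptual sketch of a per-UPN hourly quota counter. It is only an illustration of keying a fixed-window counter on the user; it is not how APIM implements llm-token-limit internally, and the class and method names are hypothetical.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Conceptual fixed-window token quota keyed by UPN. Illustration only.
    public sealed class HourlyTokenQuota
    {
        private readonly long _quota;
        private readonly Dictionary<string, (DateTime WindowStart, long Used)> _usage =
            new(StringComparer.OrdinalIgnoreCase);

        public HourlyTokenQuota(long quota) => _quota = quota;

        // Records the usage and returns true if the user still has quota in the current hour.
        public bool TryConsume(string upn, long tokens, DateTime utcNow)
        {
            // Truncate the timestamp to the start of its hour to identify the window.
            var window = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day,
                                      utcNow.Hour, 0, 0, DateTimeKind.Utc);

            // A missing entry or an entry from an earlier hour starts again from zero.
            if (!_usage.TryGetValue(upn, out var entry) || entry.WindowStart != window)
                entry = (window, 0);

            if (entry.Used + tokens > _quota)
                return false;   // over quota: the request would be rejected

            _usage[upn] = (window, entry.Used + tokens);
            return true;
        }
    }
    ```

    Because the dictionary key is the UPN, one user exhausting the quota does not affect any other user, which is the same effect the counter-key attribute has in the policy.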
    

    When you call the APIM URL with a valid bearer token, the response includes the x-remaining-tokens and x-consumed-tokens headers. If a user spends more than 4,000 tokens in an hour, APIM rejects further requests with 403 Forbidden until the quota period resets; exceeding the tokens-per-minute rate returns 429 Too Many Requests instead. No extra keys or manual subscriptions are needed (provided the API does not require a subscription key), just the users’ Azure AD identities.
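
    On the client side, the headers and status codes can be read with a small helper. This is a sketch: QuotaInfo and QuotaHeaders are hypothetical names, the header names are the ones configured in the policy above, and treating every 403 as an exhausted quota is an assumption that may not hold if other policies also return 403.

    ```csharp
    using System.Linq;
    using System.Net;
    using System.Net.Http;

    // What the client learned about its quota from one APIM response.
    public sealed record QuotaInfo(bool Allowed, bool QuotaExhausted, bool RateLimited,
                                   long? RemainingTokens, long? ConsumedTokens);

    public static class QuotaHeaders
    {
        public static QuotaInfo Read(HttpResponseMessage response) =>
            new QuotaInfo(
                Allowed: response.IsSuccessStatusCode,
                // Assumption: 403 comes from the token quota, 429 from the per-minute rate.
                QuotaExhausted: response.StatusCode == HttpStatusCode.Forbidden,
                RateLimited: response.StatusCode == HttpStatusCode.TooManyRequests,
                RemainingTokens: ReadLong(response, "x-remaining-tokens"),
                ConsumedTokens: ReadLong(response, "x-consumed-tokens"));

        // Returns the header parsed as a number, or null if it is absent or not numeric.
        private static long? ReadLong(HttpResponseMessage response, string name) =>
            response.Headers.TryGetValues(name, out var values)
                && long.TryParse(values.FirstOrDefault(), out long parsed)
                ? parsed
                : (long?)null;
    }
    ```

    A caller could pass the result of the PostAsync call shown earlier to QuotaHeaders.Read and warn the user when RemainingTokens gets low or QuotaExhausted is true.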
