凯旋 李, you can put a simple Web App in front of the Azure OpenAI endpoint that just forwards requests, then place API Management (APIM) in front of that to validate each caller’s Azure AD token, extract their UPN from the token’s claims, and meter how many model tokens they consume. When someone goes over your limit, APIM rejects the request before it ever reaches OpenAI.
First, wrap OpenAI in a proxy Web App. You can start with the official Azure AI Foundry sample or build a tiny ASP.NET Core Web API that reads two settings (your OpenAI endpoint and key) and simply relays any JSON posted to /chat/completions on to the OpenAI REST API. Deploy it to App Service so it lives at, for example, https://my-openai-proxy.azurewebsites.net.
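If you go the ASP.NET Core route, the relay itself is only a few lines. Here is a minimal sketch; the setting names AOAI_ENDPOINT and AOAI_KEY and the api-version in the comment are illustrative, so adjust them to your own resource:

```csharp
using System.Text;

// Minimal relay: forwards posted JSON to Azure OpenAI unchanged.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient();
var app = builder.Build();

app.MapPost("/chat/completions",
    async (HttpRequest request, IHttpClientFactory factory, IConfiguration config) =>
{
    // e.g. https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01
    var endpoint = config["AOAI_ENDPOINT"];

    using var reader = new StreamReader(request.Body);
    var body = await reader.ReadToEndAsync();

    var upstream = new HttpRequestMessage(HttpMethod.Post, endpoint)
    {
        Content = new StringContent(body, Encoding.UTF8, "application/json")
    };
    upstream.Headers.Add("api-key", config["AOAI_KEY"]); // Azure OpenAI expects the api-key header

    var response = await factory.CreateClient().SendAsync(upstream);
    var json = await response.Content.ReadAsStringAsync();
    return Results.Content(json, "application/json", statusCode: (int)response.StatusCode);
});

app.Run();
```

Because APIM handles the callers’ Azure AD tokens, the proxy only needs the OpenAI key; lock the App Service down with networking or access restrictions so it cannot be reached except through APIM.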
Next, set up Azure AD so that users sign in and your client (whether it’s MSAL.js in a browser or Azure.Identity in server code) can request a token for the scope you exposed on your APIM API, for example api://<APIM-API-ID>/access_as_user. Every call to your proxy must then carry that token in the Authorization header.
using Azure.Core;
using Azure.Identity;
using System.Net.Http.Headers;
using System.Text;

// Acquire a user token for the scope exposed on the APIM API
var credential = new InteractiveBrowserCredential();
var token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "api://<APIM-API-ID>/access_as_user" }));

// Attach the bearer token and call the APIM endpoint
httpClient.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", token.Token);
var result = await httpClient.PostAsync(
    "https://<your-apim>.azure-api.net/openai/chat/completions",
    new StringContent(jsonPayload, Encoding.UTF8, "application/json"));
With the Web App ready, import it into API Management as a backend API. In the inbound section of the APIM policy, first validate the bearer token against your tenant and the expected audience and scope, then pull the user’s upn claim out of it, then apply an llm-token-limit policy keyed on that UPN so APIM keeps a running tally of how many tokens each user spends. Finally, forward the request to the proxy.
<policies>
<inbound>
<base />
<validate-jwt header-name="Authorization"
require-scheme="Bearer"
output-token-variable-name="jwt">
<openid-config url="https://login.microsoftonline.com/<TENANT_ID>/v2.0/.well-known/openid-configuration"/>
<audiences>
<audience>api://<API-ID></audience>
</audiences>
<required-claims>
<claim name="scp" match="any">
<value>access_as_user</value>
</claim>
</required-claims>
</validate-jwt>
<set-variable name="userUpn" value="@{
    var jwt = (Jwt)context.Variables["jwt"];
    var upn = jwt.Claims.GetValueOrDefault("upn", "");
    return upn != "" ? upn : jwt.Claims.GetValueOrDefault("preferred_username", "unknown");
}" />
<llm-token-limit tokens-per-minute="10000"
token-quota="4000"
token-quota-period="Hourly"
counter-key="@(context.Variables["userUpn"])"
estimate-prompt-tokens="true"
remaining-quota-tokens-header-name="x-remaining-tokens"
tokens-consumed-header-name="x-consumed-tokens" />
<set-backend-service backend-id="your-openai-proxy-service" />
</inbound>
<backend>
<forward-request />
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
When you call the APIM URL with a valid bearer token, you will see x-remaining-tokens and x-consumed-tokens in the response headers. If a user spends more than 4,000 tokens in an hour, APIM automatically rejects further requests (429 or 403, depending on whether the per-minute rate or the hourly quota was exceeded) until the period resets. No extra keys or manual subscriptions are needed; the users’ Azure AD identities are enough.