Import a language model API

APPLIES TO: All API Management tiers

You can import OpenAI-compatible language model endpoints to your API Management instance, or import non-compatible models as passthrough APIs. For example, manage self-hosted LLMs or those hosted on inference providers other than Azure AI services. Use AI gateway policies and other API Management capabilities to simplify integration, improve observability, and enhance control over model endpoints.

Learn more about managing AI APIs in API Management.

Language model API types

API Management supports two types of language model APIs. Choose the option that matches your model deployment; the choice determines how clients call the API and how API Management routes requests to the AI service.

  • OpenAI-compatible - Language model endpoints compatible with OpenAI's API. Examples include Hugging Face Text Generation Inference (TGI) and Google Gemini API.

    API Management configures a chat completions endpoint.

  • Passthrough - Language model endpoints not compatible with OpenAI's API. Examples include models deployed in Amazon Bedrock or other providers.

    API Management configures wildcard operations for common HTTP verbs. Clients can append paths to wildcard operations, and API Management passes requests to the backend.
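
For an OpenAI-compatible API, clients send standard chat completions requests to the gateway. A minimal sketch of the request body a client would POST to the chat completions operation (the gateway URL, path, and model name below are illustrative placeholders, not values created by API Management):

```python
import json

# Hypothetical API Management gateway URL and API path; substitute your own values.
GATEWAY_URL = "https://contoso.azure-api.net/my-llm-api"

# Body of an OpenAI-compatible chat completions request that a client
# would POST to <GATEWAY_URL>/chat/completions.
payload = {
    "model": "my-model",  # model name expected by the backend
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what an API gateway does."},
    ],
    "max_tokens": 256,
}

print(json.dumps(payload, indent=2))
```

A client sends this body to the API's chat completions operation, authenticating with an API Management subscription key.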

Prerequisites

  • An existing API Management instance.
  • An endpoint for a language model deployment that you want to manage through API Management.

Import a language model API by using the portal

Importing the LLM API automatically configures:

  • A backend resource and set-backend-service policy that direct requests to the LLM endpoint.
  • Optionally, access by using an access key, which is protected as a secret named value.
  • Optionally, policies to help you monitor and manage the API.
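
As a sketch, the routing part of the generated policy typically resembles the following; the `backend-id` value here is an illustrative placeholder for whatever backend resource the import creates:

```xml
<inbound>
    <base />
    <!-- Route requests to the backend created during import.
         "my-llm-backend" is a placeholder id. -->
    <set-backend-service backend-id="my-llm-backend" />
</inbound>
```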

To import a language model API:

  1. In the Azure portal, go to your API Management instance.

  2. In the left menu, under APIs, select APIs > + Add API.

  3. Under Define a new API, select Language Model API.

    Screenshot of creating an OpenAI-compatible API in the portal.

  4. On the Configure API tab:

    1. Enter a Display name and Description (optional).
    2. Enter the LLM API URL.
    3. Select one or more Products to associate with the API (optional).
    4. In Path, append the path to access the LLM API.
    5. Select either Create OpenAI API or Create a passthrough API. See Language model API types.
    6. Enter the authorization header name and API key (if required).
    7. Select Next.

    Screenshot of language model API configuration in the portal.

  5. On the Manage token consumption tab, enter settings or accept the defaults for the policies that track and limit token consumption.

  6. On the Apply semantic caching tab, enter settings or accept the defaults for the policy that optimizes performance and reduces latency by caching completions.

  7. On the AI content safety tab, enter settings or accept the defaults to configure Azure AI Content Safety to block prompts with unsafe content.

  8. Select Review.

  9. After validation, select Create.

API Management creates the API and configures operations for the LLM endpoints. By default, the API requires an API Management subscription.
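
The settings from the Manage token consumption, Apply semantic caching, and AI content safety tabs are saved as policies in the API's policy definition. A hedged sketch of what the resulting inbound and outbound sections might look like; the backend ids, thresholds, and limits below are illustrative, and the exact attributes depend on your configuration:

```xml
<policies>
    <inbound>
        <base />
        <!-- Block unsafe prompts via Azure AI Content Safety (illustrative backend id). -->
        <llm-content-safety backend-id="content-safety-backend">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
        </llm-content-safety>
        <!-- Limit token consumption per subscription (illustrative limit). -->
        <llm-token-limit counter-key="@(context.Subscription.Id)"
                         tokens-per-minute="5000"
                         estimate-prompt-tokens="true" />
        <!-- Emit token usage metrics for monitoring. -->
        <llm-emit-token-metric namespace="llm">
            <dimension name="Subscription ID" />
        </llm-emit-token-metric>
        <!-- Return cached completions for semantically similar prompts. -->
        <llm-semantic-cache-lookup score-threshold="0.8"
                                   embeddings-backend-id="embeddings-backend"
                                   embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache responses for reuse (duration in seconds). -->
        <llm-semantic-cache-store duration="60" />
    </outbound>
</policies>
```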

Test the LLM API

Verify your LLM API in the test console.

  1. Select the API you created.

  2. Select the Test tab.

  3. Select an operation compatible with the model deployment. Fields for parameters and headers appear.

  4. Enter parameters and headers. Depending on the operation, configure or update a Request body as needed.

    Note

    The test console automatically adds an Ocp-Apim-Subscription-Key header (using the built-in all-access subscription), which provides access to every API. To display it, select the "eye" icon next to HTTP Request.

  5. Select Send.

    When the test succeeds, the backend responds with data that includes token usage metrics, which you can use to monitor language model consumption.
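
The usage metrics in the response can also be read programmatically. A minimal sketch, assuming an OpenAI-style response body (the values below are illustrative, not real output):

```python
import json

# Illustrative chat completions response body; real values come from your backend.
response_body = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "An API gateway routes and governs API traffic."}}
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36}
}
""")

# Extract the token usage metrics used to monitor language model consumption.
usage = response_body["usage"]
print(f"prompt={usage['prompt_tokens']} "
      f"completion={usage['completion_tokens']} "
      f"total={usage['total_tokens']}")
```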