Import an OpenAI-compatible Google Gemini API

APPLIES TO: All API Management tiers

This article shows you how to import an OpenAI-compatible Google Gemini API to access models such as gemini-2.5-flash-lite. For these models, Azure API Management can manage an OpenAI-compatible chat completions endpoint.

Learn more about managing AI APIs in API Management in Import a language model API.

Prerequisites

  • An existing API Management instance.
  • A Google Gemini API key.

Import an OpenAI-compatible Gemini API by using the portal

  1. In the Azure portal, go to your API Management instance.

  2. In the left menu, under APIs, select APIs > + Add API.

  3. Under Define a new API, select Language Model API.

    Screenshot of creating a passthrough language model API in the portal.

  4. On the Configure API tab:

    1. Enter a Display name and optional Description for the API.

    2. In URL, enter the following base URL from the Gemini OpenAI compatibility documentation: https://generativelanguage.googleapis.com/v1beta/openai

    3. In Path, append a path that your API Management instance uses to route requests to the Gemini API endpoints.

    4. In Type, select Create OpenAI API.

    5. In Access key, enter the following:

      1. Header name: Authorization.
      2. Header value (key): Bearer, followed by a space and your Gemini API key (for example, Bearer <your-gemini-api-key>).

    Screenshot of importing a Gemini LLM API in the portal.

  5. On the remaining tabs, optionally configure policies to manage token consumption, semantic caching, and AI content safety. For details, see Import a language model API.

  6. Select Review.

  7. After the portal validates the settings, select Create.
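The optional policies on the Configure policies tabs are standard API Management policy definitions. As an illustration only, a per-subscription token limit might look like the following fragment; the counter key and limit values here are assumptions, and attribute names should be checked against the llm-token-limit policy reference:

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative only: cap LLM token consumption per subscription per minute -->
        <llm-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="true" />
    </inbound>
</policies>
```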

API Management creates the API and configures the following:

  • A backend resource and a set-backend-service policy that direct API requests to the Google Gemini endpoint.
  • Access to the LLM backend by using the Gemini API key you provided. API Management protects the key as a secret named value.
  • Optionally, policies to help you monitor and manage the API.
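The generated routing behaves like the following inbound policy fragment. This is a sketch only; the backend-id shown is illustrative, and API Management generates its own name for the backend resource:

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative: forward requests to the generated Gemini backend resource -->
        <set-backend-service backend-id="gemini-openai-compat-backend" />
    </inbound>
</policies>
```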

Test the Gemini model

After you import the API, you can test its chat completions endpoint.

  1. Select the API that you created in the previous step.

  2. Select the Test tab.

  3. Select the POST Creates a model response for the given chat conversation operation, which is a POST request to the /chat/completions endpoint.

  4. In the Request body section, enter the following JSON to specify the model and an example prompt. In this example, the gemini-2.5-flash-lite model is used.

    {
        "model": "gemini-2.5-flash-lite",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "How are you?"
            }
        ],
        "max_tokens": 50
    }
    

    When the test succeeds, the backend responds with a successful HTTP response code and some data. The response includes token usage data to help you monitor and manage your language model token consumption.
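Outside the portal's Test console, the same request can be sent with a short script. The following is a minimal sketch using only Python's standard library; the gateway URL, API path, and subscription key are hypothetical placeholders for your own values:

```python
import json
import urllib.request

# Hypothetical values: replace with your API Management gateway URL,
# the Path you configured for the API, and your subscription key.
GATEWAY_URL = "https://your-apim-instance.azure-api.net/your-api-path"
SUBSCRIPTION_KEY = "your-subscription-key"

# Same request body as in the portal test above.
body = {
    "model": "gemini-2.5-flash-lite",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How are you?"},
    ],
    "max_tokens": 50,
}

req = urllib.request.Request(
    f"{GATEWAY_URL}/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # API Management subscription key header (if your API requires a subscription)
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
    },
    method="POST",
)

# With real values, send the request and inspect token usage in the response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["usage"])
```

The response's usage object carries the token counts mentioned above, which is what policies such as token limits act on.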

    Screenshot of testing a Gemini LLM API in the portal.