
Issue with Azure OpenAI embeddings service on the new AI Services endpoint

Axel Vázquez 5 Reputation points
2026-04-29T16:26:38.5766667+00:00

I’ve been actively using Azure AI Foundry models through the new AI Services inference API to support a variety of models.
I recently started using the same endpoint for embeddings:

POST https://{resource}.services.ai.azure.com/models/embeddings?api-version=2024-05-01-preview

This was working correctly until today, with no changes on my side (same payload, headers, and configuration).

As of now:

  • The request returns HTTP 200
  • The response body is empty
  • The response headers include:
    grpc-message: Unrecognized endpoint (/v1/engines/text-embedding-3-small/embeddings...)

This suggests a backend routing or regression issue, where an internal endpoint mismatch is occurring despite calling the correct /models/embeddings route.

Has there been a recent change or incident affecting the embeddings endpoint in Azure AI Foundry?
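For anyone who wants to detect this failure mode programmatically, here is a minimal sketch. The function name `diagnose_embeddings_response` is hypothetical and not part of any Azure SDK; it only inspects the status code, headers, and raw body that an HTTP client already returns, and flags the combination described above (HTTP 200, empty body, a grpc-message header).

```python
from typing import Mapping, Optional


def diagnose_embeddings_response(
    status: int, headers: Mapping[str, str], body: bytes
) -> Optional[str]:
    """Return a short description of the failure mode, or None if nothing looks wrong."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    grpc_message = lowered.get("grpc-message")

    if status == 200 and not body.strip():
        # The symptom from the question: success status but nothing in the body.
        if grpc_message:
            return f"200 with empty body; grpc-message: {grpc_message}"
        return "200 with empty body"
    if grpc_message:
        # A grpc-message header on any other response is still worth surfacing.
        return f"status {status}; grpc-message: {grpc_message}"
    return None
```

Logging the returned string next to the request ID (if the response carries one) should make it easier to hand the evidence to support.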
Azure OpenAI in Foundry Models

2 answers

  1. Anshika Varshney 11,060 Reputation points Microsoft External Staff Moderator
    2026-04-30T09:57:11.29+00:00

    Hi Axel Vázquez,

    What you are seeing usually means the request is reaching the service, but the backend is routing it to the wrong internal embeddings handler. The key signal is that you get HTTP 200 with an empty body and a grpc-message header reporting an unrecognized internal embeddings path, even though you called the /models/embeddings route. That pattern points to a service-side routing regression or an endpoint format mismatch rather than a payload change on your side. [clemenssiebler.com]

    How to verify you are using the correct embeddings endpoint format:

    Azure currently has more than one valid way to call embeddings, and mixing endpoint styles can lead to confusing results. The safest way to verify is to align your call with the official embeddings patterns documented for Azure OpenAI in Foundry Models. [microsoftl....github.io]

    Option 1 is the OpenAI v1-style embeddings route under your Azure OpenAI endpoint. The documented pattern is POST https://{your-resource-name}.openai.azure.com/openai/v1/embeddings?api-version=preview and the body must include model and input.

    Option 2 is the deployment-style embeddings route POST https://{your-resource-name}.openai.azure.com/openai/deployments/{your-deployment-name}/embeddings?api-version=2025-04-01-preview and the body must include input.
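As a rough illustration of the two route styles quoted above, the sketch below builds the URL and JSON body for each. The helper name `build_embeddings_request` and its parameters are hypothetical; the URL shapes and api-version values are copied from the two options as written in this answer and are not verified against the live service.

```python
from typing import Dict, List, Optional, Tuple


def build_embeddings_request(
    resource: str,
    inputs: List[str],
    model: Optional[str] = None,
    deployment: Optional[str] = None,
) -> Tuple[str, Dict[str, object]]:
    """Return (url, body) for one of the two route styles described in this answer."""
    base = f"https://{resource}.openai.azure.com/openai"
    if deployment is not None:
        # Option 2: the deployment name is part of the path, so the body only needs input.
        url = f"{base}/deployments/{deployment}/embeddings?api-version=2025-04-01-preview"
        return url, {"input": inputs}
    if model is None:
        raise ValueError("the v1-style route needs a model name in the body")
    # Option 1: v1-style route, where the model name travels in the body.
    url = f"{base}/v1/embeddings?api-version=preview"
    return url, {"model": model, "input": inputs}
```

Calling it with `model="text-embedding-3-small"` gives the Option 1 request, and calling it with `deployment="my-deployment"` (a placeholder name) gives the Option 2 request.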

    If you are calling the newer services.ai.azure.com /models/embeddings endpoint and you see grpc-message: Unrecognized endpoint in the response headers, that is a strong hint the backend is not mapping your call to the expected embeddings handler correctly.

    Key troubleshooting steps for this embeddings issue:

    Step 1: Use the documented embeddings endpoint as a baseline test. Try the OpenAI v1 embeddings endpoint shown in the official embeddings documentation and confirm you get a non-empty response. This is the fastest way to separate a service routing issue from a request formatting issue.

    Step 2: Confirm authentication matches what the embeddings API supports. The Azure OpenAI embeddings v1 API does not currently support Microsoft Entra ID authentication. The documentation recommends using API key authentication for embeddings calls. If you are using Entra ID with the new inference endpoint, testing with an API key on the v1 embeddings endpoint helps confirm whether auth style is part of the issue.

    Step 3: Double-check the request body fields. For embeddings, input must be present and non-empty. For the v1 embeddings route, model is also required. If model or input is missing or mismatched, you can get unexpected behavior depending on the gateway.

    Step 4: Validate limits to avoid silent failures. The latest embedding models have input length limits and request size limits. Keeping the test input small and simple for the first verification helps rule out limit-related behavior. The documentation calls out maximum input length guidance and other best practices for embedding requests.

    Step 5: Compare results across endpoints using the exact same input. Run the same input text through the services.ai.azure.com /models/embeddings endpoint that is returning 200 with an empty body and through the OpenAI v1 embeddings endpoint. If the OpenAI v1 endpoint returns a normal embedding vector while the services.ai.azure.com endpoint returns empty output with grpc-message: Unrecognized endpoint, that strongly supports a backend routing issue specific to that endpoint.
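The side-by-side comparison in the steps above could be scripted roughly as follows, using only the Python standard library. All function names are hypothetical. The sketch assumes API key authentication through the `api-key` header on both endpoints and assumes a successful response follows the usual `data[0].embedding` shape; adjust both if your resource behaves differently.

```python
import json
import urllib.request
from typing import Callable, Dict, Tuple

# (status, headers, raw body) as returned by a single POST.
Response = Tuple[int, Dict[str, str], bytes]


def post_json(url: str, api_key: str, body: dict) -> Response:
    """POST a JSON body with API key auth. urlopen raises HTTPError on 4xx/5xx."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return reply.status, dict(reply.headers.items()), reply.read()


def summarize(response: Response) -> dict:
    """Reduce a raw response to the fields that matter for this comparison."""
    status, headers, raw = response
    lowered = {name.lower(): value for name, value in headers.items()}
    vector_length = None
    if raw.strip():
        try:
            vector_length = len(json.loads(raw)["data"][0]["embedding"])
        except (ValueError, KeyError, IndexError, TypeError):
            pass  # Body is present but is not an embeddings payload.
    return {
        "status": status,
        "empty_body": not raw.strip(),
        "grpc_message": lowered.get("grpc-message"),
        "vector_length": vector_length,
    }


def compare_endpoints(
    targets: Dict[str, Tuple[str, dict]],
    api_key: str,
    send: Callable[[str, str, dict], Response] = post_json,
) -> Dict[str, dict]:
    """Send each (url, body) pair and return one summary per endpoint name."""
    return {name: summarize(send(url, api_key, body)) for name, (url, body) in targets.items()}
```

Pass a dictionary such as `{"models": (models_url, body), "v1": (v1_url, body)}` with the same input text in both bodies. If only the `models` entry comes back with `empty_body` set and a `grpc_message`, that matches the routing-issue pattern described in Step 5.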

    In short: First confirm embeddings work using the documented OpenAI v1 embeddings endpoint and an API key. Then compare with the services.ai.azure.com /models/embeddings endpoint. If only the services.ai.azure.com route produces 200 with an empty body and grpc-message: Unrecognized endpoint, it is most likely a service-side routing regression on that new endpoint rather than a change you made.

    I hope this helps. Do let me know if you have any further queries.

    Thank you!



  2. Q&A Assist
    2026-04-29T16:27:09.4333333+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    The embeddings behavior described does not match any documented change or incident in the provided material. The context only documents the supported and current ways to call embeddings in Azure AI Foundry / Azure OpenAI, not live service incidents.

    From the documentation, the supported patterns for embeddings are:

    1. Azure OpenAI in Azure AI Foundry (preview v1-style path)
    POST {endpoint}/openai/v1/embeddings?api-version=preview
    
    {
      "model": "text-embedding-ada-002",
      "input": [
        "this is a test"
      ]
    }
    

    Key points:

    • Path: /openai/v1/embeddings
    • model is required in the body.
    • input is required and must respect model limits (for example, text-embedding-ada-002 supports up to 16 inputs and 8,191 tokens total per request).
    2. Azure OpenAI classic deployment-style path (non-Foundry models reference)
    POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21
    
    {
      "input": [
        "this is a test"
      ]
    }
    
    3. Foundry Models preview reference (deployment-style path)
    POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2025-04-01-preview
    
    {
      "input": [
        "this is a test"
      ]
    }
    

    And the preview Embeddings - Create contract:

    • Required body fields:
      • input: string or array (must not be empty; must not exceed model token limits).
    • Optional fields:
      • dimensions: only for text-embedding-3 and later.
      • encoding_format: float or base64 (defaults to float).
      • input_type, user.
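A client-side check of the contract listed above might look like the sketch below. The function `validate_embeddings_body` is hypothetical and only mirrors the rules quoted in this answer; the service remains the authority on what it accepts. The `dimensions` check uses a crude name-prefix heuristic and only runs when a model name is present in the body.

```python
from typing import List


def validate_embeddings_body(body: dict, v1_style: bool = False) -> List[str]:
    """Return a list of problems found in an embeddings request body (empty if none)."""
    problems: List[str] = []

    value = body.get("input")
    if isinstance(value, str):
        if not value:
            problems.append("input is an empty string")
    elif isinstance(value, list):
        if not value:
            problems.append("input is an empty array")
    else:
        problems.append("input must be a string or an array")

    # Only the v1-style route carries the model name in the body.
    model = body.get("model")
    if v1_style and not model:
        problems.append("model is required on the v1-style route")

    if body.get("encoding_format", "float") not in ("float", "base64"):
        problems.append("encoding_format must be float or base64")

    # Heuristic: treat names starting with text-embedding-3 as supporting dimensions.
    if "dimensions" in body and isinstance(model, str) and not model.startswith("text-embedding-3"):
        problems.append("dimensions is only documented for text-embedding-3 and later")

    return problems
```

Running this before each call gives a quick way to rule out a malformed body when the response comes back empty.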

    Given this documentation, the recommended actions are:

    1. Align the request path with documented endpoints
      • Instead of /models/embeddings, use one of the documented embeddings endpoints, for example:
        • {endpoint}/openai/v1/embeddings?api-version=preview (Foundry v1-style), or
        • {endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21 (or the preview version documented for Foundry models).
      • Ensure the endpoint is the Azure OpenAI / Foundry endpoint as documented (for example, https://{your-resource-name}.openai.azure.com or the Foundry endpoint shown in the portal for the model resource).
    2. Ensure the request body matches the documented schema
      • Include model when using the v1-style /openai/v1/embeddings endpoint.
      • Ensure input is non-empty and within token limits.
      • If using text-embedding-3-small or similar, only use dimensions if supported by that model.
    3. Validate that the model and version support batched inputs
      • For example, the FAQ notes that batching multiple inputs in a single array is only supported for text-embedding-ada-002 Version 2, up to 16 inputs and 8,191 tokens total. Exceeding these limits or using unsupported batching with other models can cause errors.
    4. Check for backend error details
      • The embeddings references describe a default AzureErrorResponse for unexpected errors. If the HTTP status is 200 but the body is empty, this suggests the request might not be reaching the correct embeddings handler (for example, due to an internal routing mismatch). Switching to the documented /openai/.../embeddings paths avoids relying on any non-documented /models/embeddings behavior.
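If batching limits are a concern, inputs can be split into smaller arrays before sending. The sketch below is a generic chunking helper with a hypothetical name, `chunk_inputs`; the default of 16 is taken from the batching note above and may not apply to every model or version. It does not count tokens, so the per-request token limit still has to be checked separately.

```python
from typing import Iterator, List


def chunk_inputs(inputs: List[str], max_batch: int = 16) -> Iterator[List[str]]:
    """Yield consecutive slices of inputs, each with at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(inputs), max_batch):
        yield inputs[start:start + max_batch]
```

Each yielded list can then be used as the `input` array of one embeddings request, with the results concatenated in order.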

    Because the context does not document /models/embeddings for embeddings nor any recent regression, the safest path forward is to:

    • Move the embedding calls to one of the documented embeddings endpoints above, and
    • Ensure the body and model usage follow the documented constraints.

    If the issue persists even when using the documented /openai/.../embeddings endpoints, it would need to be raised through Azure support as a potential service-side issue, since it is not covered by the available documentation.

