Cohere-embed-v3-multilingual model was working for a couple of weeks, then suddenly stopped working.

Bishal Upadhyaya 0 Reputation points
2025-04-29T16:59:55.62+00:00

My application had been successfully using the Azure AI Embeddings client with Cohere multilingual embedding models for the past couple of weeks. This week, without any code changes on our end, the same code started raising JSONDecodeError.

Our implementation closely mirrors the official Azure example:


    from typing import List, Union

    import numpy as np
    from azure.ai.inference import EmbeddingsClient
    from azure.core.credentials import AzureKeyCredential

    def get_embedding(
        text: Union[str, List[str]],
        model: str = "Cohere-embed-v3-multilingual",
    ) -> np.ndarray:
        """
        Generate embeddings for a single string or a list of strings.

        Returns:
            • np.ndarray shape (dim,)   if input was str
            • np.ndarray shape (N, dim) if input was list[str]
        """
        # ── 1. Normalize input ─────────────────────────────────────────────
        is_single = isinstance(text, str)
        inputs = [text] if is_single else text

        # ── 2. Shared key from secret manager ──────────────────────────────
        key = get_secret("vibeset/azure_ai_foundry")["api_key_o3"]

        # ── 3. Cohere path ─────────────────────────────────────────────────
        if model == "Cohere-embed-v3-multilingual":
            client = EmbeddingsClient(
                endpoint="https://kevin-m86fxp36-eastus2.services.ai.azure.com/models",
                credential=AzureKeyCredential(key),
            )
            resp = client.embed(input=inputs, model=model)
            embeddings = [item.embedding for item in resp.data]

        arr = np.asarray(embeddings, dtype=np.float32)
        return arr[0] if is_single else arr
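For reference, the shape contract in the docstring can be exercised without the service at all. This is a minimal sketch with made-up placeholder values; `shape_embeddings` is a hypothetical name mirroring only the return logic of our function:

```python
import numpy as np

def shape_embeddings(embeddings, is_single):
    # Same return logic as get_embedding: a single string collapses to a
    # (dim,) vector, a list of strings stays a (N, dim) matrix.
    arr = np.asarray(embeddings, dtype=np.float32)
    return arr[0] if is_single else arr

single = shape_embeddings([[0.1, 0.2, 0.3]], True)   # shape (3,)
batch = shape_embeddings([[0.1, 0.2, 0.3],
                          [0.4, 0.5, 0.6]], False)   # shape (2, 3)
```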

This is the exact example provided in the Azure documentation:

import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://kevin-m86fxp36-eastus2.services.ai.azure.com/models"
model_name = "Cohere-embed-v3-multilingual"
key = os.environ["AZURE_INFERENCE_CREDENTIAL"]

client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

response = client.embed(
    input=["大家好"],
    model=model_name
)

for item in response.data:
    length = len(item.embedding)
    print(
        f"data[{item.index}]: length={length}, "
        f"[{item.embedding[0]}, {item.embedding[1]}, "
        f"..., {item.embedding[length-2]}, {item.embedding[length-1]}]"
    )
print(response.usage)

When running our test code, we now receive:

Requesting embedding for: '大家好' using Cohere…
Traceback (most recent call last):
  File "/Users/bisnu/Projects/vibeset_2/utils/gai_utils.py", line 280, in <module>
    embedding = gai.get_embedding(
  File "/Users/bisnu/Projects/vibeset_2/utils/gai_utils.py", line 197, in get_embedding
    resp = client.embed(input=inputs, model=model)
  File "/Users/bisnu/Projects/vibeset_2/vibe-venv/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/Users/bisnu/Projects/vibeset_2/vibe-venv/lib/python3.9/site-packages/azure/ai/inference/_patch.py", line 1041, in embed
    _models._patch.EmbeddingsResult, response.json()  # pylint: disable=protected-access
  File "/Users/bisnu/Projects/vibeset_2/vibe-venv/lib/python3.9/site-packages/azure/core/rest/_http_response_impl.py", line 331, in json
    self._json = loads(self.text())
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
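Notably, the final error in that traceback is exactly what Python's json module raises for a completely empty string, which suggests the service is returning an empty response body rather than malformed JSON:

```python
import json

# json.loads("") fails at character 0 because there is no value to parse --
# the same "Expecting value: line 1 column 1 (char 0)" as in the traceback.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    message = str(exc)
```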

Things We've Tried

  1. Using the exact endpoint format with "/models" suffix as shown in the Azure example
  2. Trying different model name formats
  3. Passing the model parameter in different ways

None of these attempts resolved the issue, even though our original code had worked for weeks before this sudden failure.

Questions

  1. Have there been any recent changes to the Azure AI Foundry API or endpoint structure for Cohere models?
  2. Is the "/models" suffix still correct for the endpoint URL?
  3. Is the model name "Cohere-embed-v3-multilingual" still correct?
  4. Are there any additional configuration requirements that have been introduced recently?
Azure Machine Learning

1 answer

  1. Jose Ignacio Lorenzo 0 Reputation points
    2025-05-12T19:39:22.44+00:00

    So, I've been encountering the same issue with the Cohere embed v3 English model. It seems support for the AI Inference API (the /embeddings endpoint) has been dropped.
    Basically, you are getting a JSON decode error because the endpoint returns a response with a 200 status code but an empty body.
    I found out (by trial and error) that the Cohere V1 API still works!
    So instead of the AI Inference library, use Cohere's own library:

    import cohere

    model_name = "Cohere-embed-v3-multilingual"
    text = "Hello world!"

    cohere_client = cohere.Client(api_key="<your api key>", base_url="<your base url with /models>")

    response = cohere_client.embed(model=model_name, texts=[text], input_type="search_document")

    embeddings = response.embeddings[0]
    

    When using the cohere library, input_type is required; use "search_document" for texts you index and "search_query" for queries. See Cohere's embed API documentation for details.
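    To confirm the failure mode described above (a 200 status with an empty body) before switching libraries, a small guard can surface a clearer error than the bare JSONDecodeError. This is an untested sketch; `parse_embed_body` is a made-up helper name:

```python
import json

def parse_embed_body(body: str) -> dict:
    # A 200 response with an empty body is what triggers the
    # "Expecting value: line 1 column 1 (char 0)" error above.
    if not body.strip():
        raise RuntimeError(
            "embeddings endpoint returned an empty body - "
            "the deployment may no longer serve this route"
        )
    return json.loads(body)
```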

