API Request Timeout at 2 Minutes for deepseek-r1 Model (azure-ai-inference-1.0.0b8)

Yl 15 Reputation points
2025-02-04T07:24:25.7133333+00:00

Hi, I am experiencing an issue where Azure API requests to the DeepSeek-R1 model are cut off at exactly 2 minutes. I couldn't find any timeout-related configuration on my end that resolves the issue. Is there any way to extend this limit for the model?

Setup:

Python 3.10

azure-ai-inference-1.0.0b8

Code:


import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ.get("AZURE_ENDPOINT"),
    credential=AzureKeyCredential(os.environ.get("AZURE_API_KEY")),
)
completion = client.complete(
    model=model,
    messages=messages_azure,
    temperature=temperature,
    max_tokens=max_tokens,
)

Error message:

HttpResponseError: (Timeout) The operation was timeout.
Code: Timeout
Message: The operation was timeout.
Azure AI services

1 answer

  1. Vikram Singh 1,805 Reputation points Microsoft Employee
    2025-02-05T07:15:23.0333333+00:00

    Hi Yl,

    Azure AI Inference enforces a hard timeout of 120 seconds: if the model takes longer than 2 minutes to generate a response, the request is automatically terminated by Azure. This limit cannot be extended from the client side.

    To resolve this, I recommend:

    • Reducing max_tokens (e.g., set max_tokens=800).
    • Enabling response streaming (stream=True) to avoid waiting for the full response.
    • Shortening the input prompt to minimize processing time.
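
    The streaming suggestion can be sketched as follows. This is a minimal sketch, assuming the AZURE_ENDPOint / AZURE_API_KEY environment variables from the question and a deployment named "deepseek-r1"; the helper names collect_stream and stream_deepseek are hypothetical, not part of the SDK:

    ```python
    def collect_stream(updates):
        """Concatenate incremental text deltas from a streaming chat response.

        Accepts any iterable whose items expose .choices[0].delta.content
        (final bookkeeping chunks may carry no choices or a None delta).
        """
        parts = []
        for update in updates:
            if update.choices and update.choices[0].delta.content:
                parts.append(update.choices[0].delta.content)
        return "".join(parts)


    def stream_deepseek(prompt):
        """Issue a streaming request so tokens arrive before the 120 s cutoff."""
        import os
        from azure.ai.inference import ChatCompletionsClient
        from azure.ai.inference.models import UserMessage
        from azure.core.credentials import AzureKeyCredential

        client = ChatCompletionsClient(
            endpoint=os.environ["AZURE_ENDPOINT"],
            credential=AzureKeyCredential(os.environ["AZURE_API_KEY"]),
        )
        # stream=True tells the service to emit tokens as they are generated,
        # so the connection stays active instead of idling until the timeout.
        response = client.complete(
            model="deepseek-r1",  # assumed deployment name; use your own
            messages=[UserMessage(content=prompt)],
            max_tokens=800,
            stream=True,
        )
        return collect_stream(response)
    ```

    Because tokens flow back continuously, a long generation no longer spends 2 minutes with no bytes on the wire, which is what triggers the cutoff for non-streaming calls.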

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

