Hi Yl,
Azure AI Inference enforces a hard timeout of 120 seconds: if a model takes longer than 2 minutes to generate a response, Azure automatically terminates the request.
To resolve this, I recommend:
- Reducing `max_tokens` (e.g., set `max_tokens=800`).
- Enabling response streaming (`stream=True`) to avoid waiting for the full response (see the sketch below).
- Shortening the input prompt to minimize processing time.
I hope this helps! Don't hesitate to let me know if you have any other questions.