Phi-4-mini-instruct deployment hangs indefinitely with 0 token generation in Azure AI Foundry

Faiz Delvi 0 Reputation points
2026-05-26T19:38:07.0666667+00:00

We are experiencing an issue with Phi-4-mini-instruct deployments in Azure AI Foundry.

Observed behavior:

  • Deployment succeeds
  • Requests reach the endpoint
  • Playground stays on "Thinking..." indefinitely
  • No completion is ever returned

Metrics show:

  • Requests increasing
  • Total token count = 0
  • Completion token count = 0

Regions tested:

  • East US 2
  • Sweden Central

Additional findings:

  • Phi-4-mini-reasoning works correctly in the same subscription/resource
  • GPT models work correctly
  • Multiple redeployments tested
  • API integration is working for other models

This appears to be specific to Phi-4-mini-instruct preview deployments.

Has anyone else experienced this issue, or is there a known backend/runtime problem with Phi-4-mini-instruct currently?

Thank you.

Microsoft Foundry

A unified Azure platform for creating and managing AI models, agents, and applications with built‑in enterprise security, monitoring, and governance


Answer accepted by question author

Karnam Venkata Rajeswari 3,070 Reputation points Microsoft External Staff Moderator
2026-05-26T20:23:06.0066667+00:00

Hello @Faiz Delvi,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

The observed pattern is consistent with a potential model-specific inference or runtime issue affecting the Phi-4-mini-instruct deployment path: the request is accepted but never proceeds to token generation.

Based on the consistent cross-region reproduction and the fact that other models operate correctly within the same subscription, the behavior is unlikely to be related to configuration, authentication, networking or quota limitations.

Quota or throttling scenarios typically result in explicit error responses (such as 429 or 5xx codes), rather than silent execution with zero token generation.
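The distinction drawn here can be sketched as a small triage helper. This is an illustrative sketch, not part of any Azure SDK: `classify_outcome` and its labels are hypothetical names, and the caller is assumed to supply the HTTP status (or `None` if no response arrived), the reported completion token count, and whether a client-side timeout fired.

```python
from typing import Optional


def classify_outcome(
    status_code: Optional[int],
    completion_tokens: int,
    timed_out: bool,
) -> str:
    """Label one request outcome (hypothetical helper, labels are assumptions)."""
    # No response at all: matches the "Thinking..." hang described in the question.
    if timed_out or status_code is None:
        return "hang"
    # Quota or throttling normally surfaces as an explicit 429.
    if status_code == 429:
        return "throttled"
    if 500 <= status_code <= 599:
        return "server_error"
    if 400 <= status_code <= 499:
        return "client_error"
    # A successful status with nothing generated is the silent case.
    if completion_tokens == 0:
        return "empty_completion"
    return "ok"
```

Under these assumptions, the reported symptom would land in `hang` or `empty_completion` rather than `throttled`, which is why quota looks like an unlikely cause.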

To ensure service continuity, the following alternatives can be used temporarily:

  • Phi-4-mini-reasoning for similar workloads
  • GPT-based deployments as fallback options
  • Optional routing logic to switch models when no completion tokens are generated
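The optional routing logic in the last bullet could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `Completion`, `call_model`, and `complete_with_fallback` are hypothetical names, and `call_model` is assumed to wrap whatever client you already use, enforce its own request timeout, and raise `TimeoutError` when a request hangs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Completion:
    """Hypothetical result type: generated text plus reported token count."""
    text: str
    completion_tokens: int


# Hypothetical signature: (deployment_name, prompt) -> Completion.
CallModel = Callable[[str, str], Completion]


def complete_with_fallback(
    call_model: CallModel,
    prompt: str,
    deployments: Sequence[str],
) -> Optional[Tuple[str, Completion]]:
    """Try each deployment in order; skip any that hangs or returns 0 tokens."""
    for name in deployments:
        try:
            result = call_model(name, prompt)
        except TimeoutError:
            # Assumed to be raised by call_model when the request hangs.
            continue
        if result.completion_tokens > 0:
            return name, result
        # Zero completion tokens: treat as a failure and try the next one.
    return None
```

A caller might pass `["phi-4-mini-instruct", "phi-4-mini-reasoning"]` as `deployments` so the primary model is still tried first and traffic returns to it automatically once it starts generating tokens again.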

The following reference might be helpful:

Azure OpenAI in Microsoft Foundry Models Quotas and Limits - Microsoft Foundry | Microsoft Learn

Please let us know if the response was helpful.

Thank you


1 additional answer

  1. kagiyama yutaka 3,415 Reputation points
    2026-05-27T00:12:45.4966667+00:00

    I think Azure does not list any client-side fix for Phi-4-mini-instruct returning 0 tokens. You can send a repro, including the request ID and timestamp, to Azure support.
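One way to capture the request ID and time for such a repro is sketched below. The header names in `CANDIDATE_ID_HEADERS` are assumptions, not a confirmed list for this endpoint, so check the headers your deployment actually returns. `build_repro_record` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional, Sequence

# Assumed header names; verify against the response headers you receive.
CANDIDATE_ID_HEADERS = ("x-request-id", "apim-request-id", "x-ms-request-id")


def build_repro_record(
    deployment: str,
    region: str,
    headers: Mapping[str, str],
    sent_at: datetime,
    id_headers: Sequence[str] = CANDIDATE_ID_HEADERS,
) -> Dict[str, Optional[str]]:
    """Collect the details a support ticket would need for one request."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}
    request_id = next((lowered[h] for h in id_headers if h in lowered), None)
    return {
        "deployment": deployment,
        "region": region,
        "request_id": request_id,
        # Normalize to UTC so the time lines up with service-side logs.
        "sent_at_utc": sent_at.astimezone(timezone.utc).isoformat(),
    }
```

If the request hangs before any headers come back, `request_id` stays `None`, and the deployment name, region, and UTC send time are what remain to identify the call.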
