Hello @Faiz Delvi,
Welcome to Microsoft Q&A. Thank you for reaching out to us.
The observed pattern is consistent with a model-specific inference or runtime issue affecting the Phi-4-mini-instruct deployment path, where the request is accepted but never proceeds to token generation.
Based on the consistent cross-region reproduction and the fact that other models operate correctly within the same subscription, the behavior is unlikely to be related to configuration, authentication, networking or quota limitations.
Quota or throttling scenarios typically result in explicit error responses (such as 429 or 5xx codes), rather than silent execution with zero token generation.
To ensure service continuity, the following alternatives can be used temporarily:
- Phi-4-mini-reasoning for similar workloads
- GPT-based deployments as fallback options
- Optional routing logic to switch models when no completion tokens are generated
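The routing idea in the last bullet could be sketched roughly as follows. This is a minimal illustration, not an official pattern: `call_model` is a hypothetical callable that you would implement around your own client, and the result shape (a dict with `content` and `completion_tokens`) is an assumption made for this sketch. The name `my-gpt-deployment` is a placeholder for whichever GPT-based deployment exists in your resource.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumed result shape for this sketch: {"content": str, "completion_tokens": int}
Result = Dict[str, object]


def complete_with_fallback(
    call_model: Callable[[str, str], Result],
    deployments: Sequence[str],
    prompt: str,
) -> Tuple[str, Result]:
    """Try each deployment in order and return the first one that generates tokens."""
    last_error: Optional[Exception] = None
    for name in deployments:
        try:
            result = call_model(name, prompt)
        except Exception as exc:  # explicit failures (e.g. 429/5xx surfaced by the client)
            last_error = exc
            continue
        # Treat an accepted request with zero completion tokens as a silent failure.
        tokens = result.get("completion_tokens") or 0
        if tokens > 0 and result.get("content"):
            return name, result
    raise RuntimeError("No deployment produced completion tokens") from last_error


# Stub standing in for a real client call, used only to show the control flow.
def fake_call(deployment: str, prompt: str) -> Result:
    if deployment == "Phi-4-mini-instruct":
        return {"content": "", "completion_tokens": 0}  # mimics the reported behavior
    return {"content": f"[{deployment}] reply", "completion_tokens": 5}


used, reply = complete_with_fallback(
    fake_call,
    ["Phi-4-mini-instruct", "Phi-4-mini-reasoning", "my-gpt-deployment"],
    "Hello",
)
```

In this stubbed run the first deployment returns no tokens, so the call falls through to Phi-4-mini-reasoning. In a real application, `call_model` would read the completion token count from the usage information returned by your client.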
The following reference might be helpful:
Azure OpenAI in Microsoft Foundry Models Quotas and Limits - Microsoft Foundry | Microsoft Learn
Please let us know whether this response was helpful.
Thank you.