Hi Wenjun Che,

You can call the Chat Completions API in Azure OpenAI with the following format:
import { AzureOpenAI } from "openai";

// Reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION from the environment.
const client = new AzureOpenAI();

const result = await client.chat.completions.create({
  messages: [{ role: "user", content: "Why is the sky blue?" }],
  model: "gpt-4o-mini", // your Azure deployment name
  max_tokens: 100
});
Since you mentioned that client.chat.completions.create() works fine while client.responses.create() results in a rate-limit error, Azure is likely enforcing separate rate limits for the two APIs. The Responses API may consume tokens differently or be subject to stricter limits in Azure AI Foundry.
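For comparison, here is a hedged sketch of the equivalent request via the Responses API, assuming the same AzureOpenAI client as above. Note the parameter differences: "input" replaces "messages" and "max_output_tokens" replaces "max_tokens". The helper name askViaResponses is just for illustration.

```javascript
// Sketch only: equivalent call through the Responses API.
// "input" replaces "messages"; "max_output_tokens" replaces "max_tokens".
async function askViaResponses(client, question) {
  const response = await client.responses.create({
    model: "gpt-4o-mini",      // your Azure deployment name
    input: question,
    max_output_tokens: 100
  });
  return response.output_text; // SDK convenience accessor for the text output
}
```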
If possible, I recommend using the Chat Completions API, since it works without issues for you. If you must use the Responses API, try reducing the output-token cap (max_output_tokens in that API) and check your Azure AI Foundry quota and token usage to confirm you are not exceeding your rate limits.
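If the 429 errors are transient rather than a hard quota ceiling, a retry with exponential backoff often gets the request through. A minimal sketch, assuming the error object follows the openai SDK's shape (numeric status, optional headers); the helper name createWithRetry is just for illustration:

```javascript
// Sketch: retry client.responses.create() on 429 with exponential backoff.
async function createWithRetry(client, request, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await client.responses.create(request);
    } catch (err) {
      // Only retry rate-limit errors, and only up to maxRetries times.
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      // Honor Retry-After when the service sends it; otherwise back off 1s, 2s, 4s, ...
      const delaySeconds = Number(err.headers?.["retry-after"]) || 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000));
    }
  }
}
```

This does not raise your quota; it only smooths over short bursts, so pair it with checking the actual limits in the Azure AI Foundry portal.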
For more information: https://learn.microsoft.com/en-us/azure/ai-services/openai/supported-languages?tabs=dotnet-secure%2Csecure%2Cpython-secure%2Ccommand&pivots=programming-language-javascript#chat