Azure OpenAI is much slower than openai.com.
yang liu
When I request data through the chat completions (stream) interface on Azure, the tokens do come back one at a time, but the interface does not return them right away. It looks as if a large batch of tokens is accumulated in the background and then returned together (even though the stream method is used), so the first token takes a relatively long time to arrive.
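For reference, here is a minimal sketch of how the delay can be measured as time-to-first-token (TTFT). It assumes the openai Python package (v1.x); the endpoint, key, API version, and deployment name below are placeholders, not real values.

```python
# Minimal sketch: measure time-to-first-token (TTFT) on an Azure OpenAI
# streaming chat completion. Assumes openai Python SDK v1.x.
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2024-02-01",                                 # example version
)

start = time.perf_counter()
first_token_at = None
content_chunks = 0

stream = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # Azure uses the deployment name here
    messages=[{"role": "user", "content": "Write a short poem about the sea."}],
    stream=True,
)

for chunk in stream:
    # Azure can emit chunks with an empty choices list (e.g. content-filter
    # metadata), so guard before reading the delta.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        content_chunks += 1

if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
print(f"chunks with content: {content_chunks}")
```

Running the same measurement against api.openai.com with a plain `OpenAI` client shows the first content chunk arriving much sooner, while on Azure many chunks appear to land at almost the same timestamp.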
Looking forward to a reply.