I'm trying to use Azure OpenAI's latest model (1106-Preview) with openai Python library version 1.3.5, and I've found that the API is sometimes very slow (30–40 s). Do I need to set a parameter in the chat call, or is this an issue with the LLM? The older model version (0613) worked fine and didn't have this problem. Thanks.

Greetings, Frank Yuan, and welcome to the Microsoft Q&A forum!

Please note that we don't recommend using this model in production. We will upgrade all deployments of this model to a future stable version. Models designated "preview" do not follow the standard Azure OpenAI model lifecycle.
It's possible that the slowness you're experiencing is due to the amount of data the model needs to process. You can try the suggestions below to improve performance:
- The `max_tokens` parameter controls the maximum number of tokens the API will generate. If you reduce this number, the API generates less text and may respond faster.
- The `temperature` parameter controls the randomness of the generated text. Increasing it produces more varied text, which may also affect response time.
- If you're generating a large amount of text, try the `stream` parameter to receive the response in chunks. Streaming doesn't speed up generation itself, but tokens start arriving as soon as they are produced, which greatly reduces the perceived latency.
- Slow performance could also be due to network issues. Check your network connection to make sure it isn't the bottleneck.
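Putting the first and third suggestions together, here is a minimal sketch using the openai v1 Python SDK's `AzureOpenAI` client. The function name, endpoint, deployment name, and `api_version` string are placeholders to replace with your own values:

```python
def ask_fast(deployment: str, prompt: str, endpoint: str, api_key: str) -> str:
    """Stream a chat completion with a reduced max_tokens cap.

    Hypothetical helper: deployment/endpoint values are placeholders.
    """
    from openai import AzureOpenAI  # openai>=1.x SDK

    client = AzureOpenAI(
        azure_endpoint=endpoint,           # e.g. https://<resource>.openai.azure.com
        api_key=api_key,
        api_version="2023-12-01-preview",  # a version that supports 1106-Preview
    )
    pieces = []
    stream = client.chat.completions.create(
        model=deployment,                  # your 1106-Preview deployment name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,                    # smaller cap -> less text to generate
        stream=True,                       # receive tokens as they are produced
    )
    for chunk in stream:
        # Each chunk carries an incremental delta of the assistant's reply;
        # some chunks (e.g. content-filter events) may have no choices.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            pieces.append(chunk.choices[0].delta.content)
    return "".join(pieces)
```

With streaming, the time to the first visible token is usually much shorter than waiting for the full completion, even if total generation time is unchanged.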
Do let me know if that helps or if you have any further queries.