How to get a streaming response from LLAMA2 deployed on an online endpoint?
Marcel Bosse
Dear Community,
I have deployed LLAMA2 on an online endpoint in Azure ML. With the sample code provided for consuming the endpoint, I only receive the complete response at once, meaning I always have to wait until generation has finished.
So my question is: how do I get a streamed response back from my endpoint, so that the answer appears word by word as it is generated, like on the ChatGPT page?
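For context, the blocking call I am currently using looks roughly like this. This is only a sketch: the endpoint URL, the API key, and the exact request schema (`input_data` / `input_string`) are placeholders based on the Llama 2 sample, and may differ for other deployments:

```python
import json
import urllib.request

# Placeholder values -- the real scoring URI and key come from the
# endpoint's "Consume" tab in Azure ML studio.
ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<api-key>"

def build_payload(prompt: str) -> dict:
    # Chat-style request body as used in the Llama 2 sample; the exact
    # schema may differ per deployment, so treat this as an assumption.
    return {
        "input_data": {
            "input_string": [{"role": "user", "content": prompt}],
            "parameters": {"max_new_tokens": 256},
        }
    }

def query_endpoint(prompt: str) -> dict:
    req = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    # urlopen(...).read() blocks until the whole response body has
    # arrived, which is why the answer only shows up once generation
    # is fully finished -- there is no token-by-token delivery here.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

So the response only becomes available after `read()` returns the full body, and I would like to replace this with something that yields tokens as they are produced.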
Thanks for your help
Azure Machine Learning