How to get a streaming response from LLAMA2 deployed on an online endpoint?

Marcel Bosse 10 Reputation points
2023-10-10T11:27:21.23+00:00

Dear Community,

I have deployed LLAMA2 on an online endpoint in Azure ML. Using the sample code provided for invoking the endpoint, I don't get a streamed response; I always have to wait for the full response to be generated.

So my question is: how do I get a streamed response back from my endpoint?

I'd like the answer to be generated word by word as it arrives, as it is on the ChatGPT page, for example.
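For context, a common client-side pattern for this is to call the scoring URI with an HTTP request that keeps the connection open and reads the body incrementally. The sketch below is only illustrative, not the Azure ML sample: the request body shape, the `stream` flag, and the SSE-style `data:` framing with a `text` field are all assumptions, and they only work if the deployment's scoring script actually emits chunked output in that form.

```python
import json
import requests  # assumes the requests package is installed


def parse_sse_chunk(line: bytes):
    """Extract the text payload from one SSE-style line like b'data: {...}'.

    The 'data:' framing and the JSON field name 'text' are assumptions --
    adapt them to whatever your scoring script actually emits."""
    if not line.startswith(b"data:"):
        return None
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":  # common end-of-stream sentinel (assumed)
        return None
    return json.loads(payload).get("text")


def stream_completion(scoring_uri: str, api_key: str, prompt: str):
    """Print response tokens as they arrive, assuming the endpoint
    supports chunked (streaming) HTTP responses."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Hypothetical request body; match it to your deployment's schema.
    body = {"input_data": {"input_string": [prompt]}, "stream": True}
    with requests.post(scoring_uri, headers=headers, json=body, stream=True) as resp:
        resp.raise_for_status()
        # iter_lines yields each line of the body as soon as it is received,
        # instead of buffering the whole response.
        for line in resp.iter_lines():
            token = parse_sse_chunk(line)
            if token is not None:
                print(token, end="", flush=True)
```

If the deployed scoring script returns only a single JSON blob at the end, no client-side trick will make it stream; the server side has to emit the output incrementally as well.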

Thanks for your help

Azure Machine Learning
An Azure machine learning service for building and deploying models.