How to get a streaming response from LLAMA2 deployed on an online endpoint?

Marcel Bosse 10 Reputation points
2023-10-10T11:27:21.23+00:00

Dear Community,

I have deployed LLAMA2 on an online endpoint in Azure ML. Using the sample code provided for invoking the endpoint, I don't get a streamed response; I always have to wait for the full response to be generated.

So my question is: how do I get a streamed response back from my endpoint?

I'd like the answer to be generated word by word as it arrives, as it is on the ChatGPT page, for example.
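For context, a common client-side pattern for this is to call the scoring URI with an HTTP request that keeps the connection open and reads the body incrementally. The sketch below is only illustrative, not the Azure ML sample: the request body shape, the `stream` flag, and the SSE-style `data:` framing with a `text` field are all assumptions, and they only work if the deployment's scoring script actually emits chunked output in that form.

```python
import json
import requests  # assumes the requests package is installed


def parse_sse_chunk(line: bytes):
    """Extract the text payload from one SSE-style line like b'data: {...}'.

    The 'data:' framing and the JSON field name 'text' are assumptions --
    adapt them to whatever your scoring script actually emits."""
    if not line.startswith(b"data:"):
        return None
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":  # common end-of-stream sentinel (assumed)
        return None
    return json.loads(payload).get("text")


def stream_completion(scoring_uri: str, api_key: str, prompt: str):
    """Print response tokens as they arrive, assuming the endpoint
    supports chunked (streaming) HTTP responses."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Hypothetical request body; match it to your deployment's schema.
    body = {"input_data": {"input_string": [prompt]}, "stream": True}
    with requests.post(scoring_uri, headers=headers, json=body, stream=True) as resp:
        resp.raise_for_status()
        # iter_lines yields each line of the body as soon as it is received,
        # instead of buffering the whole response.
        for line in resp.iter_lines():
            token = parse_sse_chunk(line)
            if token is not None:
                print(token, end="", flush=True)
```

If the deployed scoring script returns only a single JSON blob at the end, no client-side trick will make it stream; the server side has to emit the output incrementally as well.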

Thanks for your help

Azure Machine Learning
An Azure machine learning service for building and deploying models.