Host LLM Webapp on Azure - What is the way to go?

Question

Maximilian Weißenbacher (DE) 30

Hi,

I am new to Azure but I want to host a Webapp on Azure. The Webapp is a RAG application and I am using a quantized model from "TheBloke" (Mixtral 8x7B, so I need some GPU power) at the moment and Streamlit as a UI.

Now I am not sure what the best way to host such a web app is. I saw in Azure Machine Learning that I can use model endpoints for Mixtral. However, in the model catalog I wasn't able to find all the Hugging Face models I have used.

So would it be better to switch to a Virtual Machine and upload the quantized models there? I am not sure whether the computing power would be enough, though. Also, does anyone have experience with the cost of a similar application? For now, I only want to use the application for demo purposes, so only a couple of people (<5) will use the app.

Thanks for suggestions!

Maximilian Weißenbacher (DE) 30 Reputation points

2024-01-11T13:29:07.87+00:00

Thanks for your detailed answer. Based on it, I think an Azure Virtual Machine makes the most sense for my use case. Which VM would you suggest for hosting LLMs, e.g. TheBloke/mixtral-8x7B-instruct?

YutongTie-MSFT 53,971 Reputation points Moderator

2024-01-16T18:51:22.57+00:00

@Maximilian Weißenbacher (DE) Thanks for your response and sorry for the delay due to the weekend. Some factors to consider are the size of the LLMs, the number of concurrent requests, and the expected workload.

Based on the information you provided, it is difficult to make a specific recommendation for a VM size. However, I can provide some general guidance on selecting a VM for hosting LLMs.

For hosting LLMs, you will typically want a VM with a high amount of memory and CPU resources to handle the computational demands of the models. You may also want to consider a VM with a GPU if your LLMs require GPU acceleration.

Some VM sizes that may be suitable for hosting LLMs include:

Standard_DS_v2 (DSv2-series): This series is optimized for general-purpose workloads and offers a good balance of CPU and memory resources. For example, Standard_DS3_v2 has 4 vCPUs and 14 GiB of memory, and Standard_DS4_v2 has 8 vCPUs and 28 GiB of memory. It has no GPU.

Standard_NC_v3 (NCv3-series): This series is optimized for GPU workloads and offers NVIDIA Tesla V100 GPUs for GPU-accelerated computing. The smallest size, Standard_NC6s_v3, has 6 vCPUs, 112 GiB of memory, and one GPU with 16 GB of GPU memory.

Standard_E64_v3: This VM size is optimized for memory-intensive workloads and offers a high amount of memory and CPU resources. It has 64 vCPUs and 432 GiB of memory, but no GPU.

Ultimately, the best VM size for hosting LLMs will depend on the specific requirements of your workload. Please review your scenario to see the best fit : )
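
To make the "size of the LLMs" factor more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that Mixtral 8x7B has about 46.7 billion parameters in total and that the quantized weights take roughly 4 bits per parameter. The 20% overhead for the KV cache and runtime buffers is an assumed placeholder, not a measured value, and the function name is made up for illustration.

```python
def estimate_model_memory_gb(num_params: float,
                             bits_per_param: float,
                             overhead: float = 0.2) -> float:
    """Rough memory estimate in GB (1 GB = 1e9 bytes) for loading a model.

    overhead is an assumed fraction added on top of the raw weights
    (KV cache, activations, runtime buffers); adjust it for your setup.
    """
    weight_bytes = num_params * bits_per_param / 8  # bits -> bytes
    return weight_bytes * (1 + overhead) / 1e9


# Assumption: approximate total parameter count of Mixtral 8x7B.
MIXTRAL_PARAMS = 46.7e9

if __name__ == "__main__":
    # Compare a few common precisions for the same model.
    for bits in (4, 8, 16):
        gb = estimate_model_memory_gb(MIXTRAL_PARAMS, bits)
        print(f"{bits}-bit weights: roughly {gb:.0f} GB")
```

Under these assumptions, a 4-bit Mixtral 8x7B needs somewhere around 25 to 30 GB, which is more than a single 16 GB GPU provides. You would then need a GPU with more memory, several GPUs, or partial offloading to CPU memory (which is slower).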

I hope this helps. Thanks a lot.

Regards,

Yutong

Accepted answer

Answer 1

@Maximilian Weißenbacher (DE) Thanks for reaching out to us. There are a few ways to do this; please take a look at them and let us know which one you are interested in.

Azure Machine Learning: You can deploy your quantized model as a web service using Azure Machine Learning. This will allow you to use the model endpoint in your web app. However, as you mentioned, not all Hugging Face models may be available in the Azure Machine Learning model catalog. Additionally, deploying a model as a web service in Azure Machine Learning can be more complex than other options.

Azure Virtual Machines: You can create a virtual machine in Azure and upload your quantized model to the virtual machine. This will give you more control over the environment and allow you to use GPU power if needed. However, you will need to manage the virtual machine yourself, which can be more time-consuming.

Azure App Service: You can use Azure App Service to host your web app. This will allow you to deploy your web app quickly and easily, without having to manage the underlying infrastructure. However, you may need to use a different approach to use your quantized model with GPU power, such as using a separate API or service to handle the model.
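
As a minimal sketch of the "separate API or service" idea, the web app could send the question and the retrieved context to a model server running on a GPU machine. Everything specific here is an assumption for illustration: the URL, the `/generate` path, and the JSON fields `prompt`, `max_tokens`, and `text` are hypothetical and depend on whichever inference server you actually run.

```python
import json
import urllib.request

# Hypothetical address of a model server running on a GPU machine.
MODEL_API_URL = "http://localhost:8000/generate"


def build_request(question: str, context_chunks: list[str],
                  url: str = MODEL_API_URL) -> urllib.request.Request:
    """Build a POST request carrying the RAG prompt as JSON."""
    context = "\n".join(context_chunks)
    payload = {
        # Hypothetical request schema; adapt it to your server.
        "prompt": f"Context:\n{context}\n\nQuestion: {question}",
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_model(question: str, context_chunks: list[str],
              timeout: float = 60.0) -> str:
    """Send the request and return the generated text."""
    request = build_request(question, context_chunks)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["text"]  # hypothetical response field
```

With this split, the Streamlit UI only needs to call `ask_model(...)`, so it can run on a small instance without a GPU while the model runs elsewhere.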

In terms of cost, the cost of hosting your web app on Azure will depend on a variety of factors, including the size and complexity of your app, the amount of traffic it receives, and the resources it requires. For a demo app with only a few users, the cost should be relatively low. You can use the Azure pricing calculator to estimate the cost of hosting your app on Azure.
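
For a rough feel of how usage time drives the compute bill, here is a small Python sketch. The hourly rate is a made-up placeholder, not a real Azure price; look up the actual rate for your VM size and region in the pricing calculator. It also assumes that compute is billed only while the VM is running and that the VM is stopped (deallocated) outside demo sessions; disk storage is billed separately and is not included.

```python
def monthly_compute_cost(hourly_rate: float,
                         hours_per_day: float,
                         days_per_month: int = 30) -> float:
    """Estimate monthly compute cost for a VM that runs part of each day."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days_per_month


# Placeholder rate in your billing currency per hour (NOT a real price).
HYPOTHETICAL_HOURLY_RATE = 3.00

# VM left running all month vs. started only for short demo sessions.
always_on = monthly_compute_cost(HYPOTHETICAL_HOURLY_RATE, 24)
demo_only = monthly_compute_cost(HYPOTHETICAL_HOURLY_RATE, 2, days_per_month=10)

if __name__ == "__main__":
    print(f"Always on: {always_on:.2f} per month")
    print(f"Demo only: {demo_only:.2f} per month")
```

The point of the comparison is that for a demo with fewer than five users, the running hours matter far more than the traffic, so stopping the VM between sessions is the main cost lever.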

I hope this helps, let me know if you have further questions.

Regards, Yutong

Please accept the answer if you found it helpful, to support the community. Thanks a lot.
