@Maximilian Weißenbacher (DE) Thanks for reaching out to us. There are a few ways to do this; please take a look at them and let us know which one you are interested in.

Azure Machine Learning: As you mentioned, you can deploy your quantized model as a web service using Azure Machine Learning, which lets you call the model endpoint from your web app. However, as you noted, not all Hugging Face models may be available in the Azure Machine Learning model catalog. Deploying a model as a web service in Azure Machine Learning can also be more complex than the other options.

Azure Virtual Machines: You can create a virtual machine in Azure and upload your quantized model to it. This gives you more control over the environment and lets you use GPU power if needed. However, you will need to manage the virtual machine yourself, which can be more time-consuming.

Azure App Service: You can use Azure App Service to host your web app. This lets you deploy the web app quickly and easily, without having to manage the underlying infrastructure. However, to run your quantized model with GPU power you may need a different approach, such as a separate API or service that handles the model.
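To illustrate the "separate API or service" approach, here is a minimal Python sketch of how a web app could call a model hosted elsewhere (for example, on an Azure Machine Learning endpoint or on your own VM). The function names, the `{"prompt": ...}` payload shape, and the Bearer-key header are assumptions for illustration; the real request schema depends on your scoring script, and the auth scheme on how you configure the endpoint.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for a separately hosted model endpoint."""
    # Hypothetical payload shape: use whatever JSON your scoring script expects.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumes key-based auth passed as a Bearer token; adjust to your endpoint.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


def query_model(endpoint_url: str, api_key: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the prompt to the model endpoint and return the parsed JSON reply."""
    request = build_scoring_request(endpoint_url, api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With this split, the web app itself stays lightweight and only the model host needs GPU resources. In your web app you would call `query_model(...)` with your own endpoint URL and key, ideally read from app settings rather than hard-coded.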
The cost of hosting your web app on Azure depends on several factors, including the size and complexity of your app, the amount of traffic it receives, and the resources it requires. For a demo app with only a few users, the cost should be relatively low. You can use the Azure pricing calculator to estimate it.
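As a rough way to reason about the compute part of the bill before opening the pricing calculator, the sketch below multiplies an hourly rate by the hours the resource actually runs. The function name and the rate of 1.00 per hour are placeholders for illustration, not real Azure prices; storage, networking, and other charges are not included.

```python
def estimate_monthly_compute_cost(hourly_rate: float, hours_per_day: float,
                                  days_per_month: int = 30) -> float:
    """Rough compute-only estimate: rate x hours per day x days per month."""
    if hourly_rate < 0 or days_per_month < 0 or not 0 <= hours_per_day <= 24:
        raise ValueError("rate and days must be non-negative, hours_per_day within 0..24")
    return hourly_rate * hours_per_day * days_per_month


# Placeholder rate of 1.00 per hour (NOT a real Azure price).
always_on = estimate_monthly_compute_cost(1.00, hours_per_day=24)   # runs around the clock
demo_hours = estimate_monthly_compute_cost(1.00, hours_per_day=2)   # only runs during demos
```

The comparison shows why, for a demo with few users, running GPU compute only when it is needed matters more than most other factors. Take the actual hourly rate for your chosen size and region from the pricing calculator.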
I hope this helps. Let me know if you have further questions.
Regards, Yutong
- Please accept the answer if you found it helpful, as this supports the community. Thanks a lot.