Thank you for reaching out to the Microsoft Q&A forum!
Is serverless deployment currently supported for fine-tuned models in Azure ML?
Currently, serverless deployment is not supported for custom fine-tuned models in Azure ML. Serverless API deployment is available only for specific models in the model catalog, such as the Meta Llama and Phi-3.5 models. (Note that non-Microsoft models such as Meta Llama require subscribing to the model offering in Azure Marketplace, while Microsoft first-party models such as Phi-3.5 do not.)
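For reference, deploying one of those supported catalog models as a serverless API looks roughly like the sketch below, using the Azure ML Python SDK v2. It assumes a recent azure-ai-ml version that includes ServerlessEndpoint, and the subscription, workspace, endpoint, and model IDs are placeholders:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ServerlessEndpoint
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholders below).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Serverless deployment works only for supported catalog models,
# referenced by their model catalog ID (placeholder version below).
# Non-Microsoft models also require a Marketplace subscription first.
endpoint = ServerlessEndpoint(
    name="phi35-serverless",  # placeholder endpoint name
    model_id="azureml://registries/azureml/models/Phi-3.5-mini-instruct/versions/1",
)
ml_client.serverless_endpoints.begin_create_or_update(endpoint).result()
```

A fine-tuned model registered in your own workspace cannot be deployed this way, which is why managed compute is the route for your scenario.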
If not, what type of compute system is recommended for deploying real-time endpoints?
For a fine-tuned model like Phi-3.5-MoE, you would need to use managed compute, i.e., deploy it to a managed online endpoint, for a real-time API. Managed compute is well suited to real-time predictions, as it gives you the flexibility to run custom models on a range of VM configurations.
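As an illustration, here is a minimal sketch of such a deployment with the Azure ML Python SDK v2 (azure-ai-ml). The endpoint name, model reference, and instance type are placeholders, and it assumes your fine-tuned model is already registered in the workspace in MLflow format (otherwise you would also pass an environment and a code_configuration with a scoring script):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholders below).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Create the real-time endpoint.
endpoint = ManagedOnlineEndpoint(
    name="phi35-finetuned-endpoint",  # placeholder endpoint name
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the registered fine-tuned model onto managed compute.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml:phi35-moe-finetuned:1",  # placeholder model name:version
    instance_type="Standard_NC24ads_A100_v4",  # placeholder; pick a SKU you have quota for
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Send all traffic to the new deployment.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Note that instance_type must be a SKU you have quota for in the workspace region; quota limits and out-of-memory crashes are two common reasons a deployment fails.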
I’ve tried various CPU configurations, but deployments have failed. Is there a list of compatible compute options for endpoints that I can refer to?
If your workload involves holding large datasets in memory (for example, in-memory analytics, databases, or a large model loaded for inference), it's essential to choose VM sizes with a higher memory-to-vCPU ratio. This prevents the performance bottlenecks, and failed deployments, that occur when the application exhausts available memory during execution.
For memory-heavy applications, Azure offers memory-optimized VM series (such as the E-series and M-series) designed for large-memory workloads, ensuring smooth performance even for demanding tasks such as high-performance databases.
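If it helps with troubleshooting, you can enumerate the VM sizes your workspace region exposes with the SDK's compute.list_sizes call, as sketched below (workspace details are placeholders again). The authoritative list of SKUs supported specifically for managed online endpoints is the "Managed online endpoints SKU list" page in the Azure ML docs, and quota still applies on top of regional availability:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholders below).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Enumerate VM sizes available in the workspace's region. Not every
# size listed here is enabled for managed online endpoints or covered
# by your quota, so cross-check against the docs and your usage limits.
for size in ml_client.compute.list_sizes():
    print(size.name)
```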
For more info, please refer to:
- Deploy models as serverless API endpoints
- Models in serverless API endpoints
I hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".