The model catalog in Azure AI Studio is the hub for discovering and using a wide range of models to build generative AI applications. A model must be deployed before it can receive inference requests. The process of interacting with a deployed model is called inferencing. Azure AI Studio offers a comprehensive suite of deployment options for those models, depending on your needs and the model's requirements.
Deploying models
Deployment options vary depending on the model type:
Azure OpenAI models: The latest OpenAI models that have enterprise features from Azure.
Models as a Service (MaaS) models: These models don't require compute quota from your subscription. You deploy them through a serverless API deployment and are billed per token in a pay-as-you-go fashion.
Open and custom models: The model catalog offers access to a large variety of open-access models across modalities. You can host open models in your own subscription on managed infrastructure, choosing the virtual machines and the number of instances for capacity management. There's a wide range of models from Azure OpenAI, Hugging Face, and NVIDIA.
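Azure AI Studio offers four different deployment options: Azure OpenAI Service, Azure AI model inference service, serverless API endpoints, and managed compute.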
1 A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in pay-as-you-go. After you delete the endpoint, no further charges accrue.
2 Billing is on a per-minute basis, depending on the product tier and the number of instances used in the deployment since the moment of creation. After you delete the endpoint, no further charges accrue.
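The two billing notes above can be illustrated with a rough cost sketch. It assumes the first note describes serverless API endpoints (token charges plus a small per-minute endpoint charge) and the second describes managed compute (per minute, per instance). The function names and every price used here are hypothetical placeholders, not actual Azure rates.

```python
def serverless_cost(tokens: int, price_per_1k_tokens: float,
                    minutes_alive: float, endpoint_price_per_minute: float) -> float:
    """Token-based charge plus the minimal per-minute endpoint charge."""
    token_charge = tokens / 1000 * price_per_1k_tokens
    endpoint_charge = minutes_alive * endpoint_price_per_minute
    return token_charge + endpoint_charge


def managed_compute_cost(minutes_alive: float, instances: int,
                         price_per_instance_minute: float) -> float:
    """Per-minute charge for every instance since the deployment was created."""
    return minutes_alive * instances * price_per_instance_minute


if __name__ == "__main__":
    # All prices below are made up for illustration only.
    print(serverless_cost(tokens=2000, price_per_1k_tokens=0.5,
                          minutes_alive=10, endpoint_price_per_minute=0.01))
    print(managed_compute_cost(minutes_alive=60, instances=2,
                               price_per_instance_minute=0.05))
```

In both cases, the charges stop growing once the endpoint is deleted, because `minutes_alive` no longer increases.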
Azure AI Studio encourages customers to explore the deployment options and pick the one that best suits their business and technical needs. In general, you can use the following decision process:
Start with the deployment option that has the broadest scope. This allows you to iterate and prototype faster in your application without having to rebuild your architecture each time you decide to change something. Azure AI model inference service is a deployment target that supports all the flagship models in the Azure AI catalog, including the latest innovations from Azure OpenAI.
When you are looking to use a specific model:
When you are interested in Azure OpenAI models, use the Azure OpenAI Service, which is designed for them and offers a wide range of capabilities.
When you are interested in a particular model from Models as a Service, and you don't expect to use any other type of model, use serverless API endpoints. They allow deployment of a single model under its own endpoint URL and set of keys.
When your model is not available in Models as a Service and you have compute quota available in your subscription, use managed compute, which supports deployment of open and custom models. It also allows a high level of customization of the deployment, including the inference server, protocols, and detailed configuration.
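The decision process above can be sketched as a small function. This is only an illustration of the guidance: the `Requirements` fields, the function name, and the fallback message are hypothetical and not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of what a project needs."""
    specific_model_needed: bool
    is_azure_openai_model: bool = False
    available_as_maas: bool = False
    has_compute_quota: bool = False


def suggest_deployment_option(req: Requirements) -> str:
    # No specific model in mind: start with the broadest-scope option.
    if not req.specific_model_needed:
        return "Azure AI model inference service"
    # Azure OpenAI models have a service designed for them.
    if req.is_azure_openai_model:
        return "Azure OpenAI Service"
    # A single Models as a Service model fits a serverless API endpoint.
    if req.available_as_maas:
        return "Serverless API endpoint"
    # Otherwise, hosting in your own subscription needs compute quota.
    if req.has_compute_quota:
        return "Managed compute"
    # Assumption: the guidance does not cover this case explicitly.
    return "No matching option without compute quota"


if __name__ == "__main__":
    print(suggest_deployment_option(Requirements(specific_model_needed=False)))
    print(suggest_deployment_option(
        Requirements(specific_model_needed=True, has_compute_quota=True)))
```

The order of the checks mirrors the order of the guidance: the most general option first, then the model-specific ones.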
Tip
Each deployment option may offer different capabilities in terms of networking, security, and additional features like content safety. Review the documentation for each of them to understand their limitations.
Explore the various language models that are available through the Azure AI Studio model catalog. Learn how to select, deploy, and test a model, and how to improve its performance.