This article describes Mosaic AI Model Serving, the Databricks solution for deploying AI and ML models for real-time serving and batch inference.
Mosaic AI Model Serving provides a unified interface to deploy, govern, and query AI models for real-time and batch inference. Each model you serve is available as a REST API that you can integrate into your web or client application.
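For example, a deployed endpoint can be queried over REST from any client. The following Python sketch illustrates the pattern; the workspace URL, endpoint name, and input schema are placeholders, not values from this article.

```python
import os

import requests

# Placeholders: substitute your workspace URL and serving endpoint name.
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
ENDPOINT_NAME = "my-endpoint"  # hypothetical endpoint name

# Each serving endpoint exposes an /invocations REST route.
response = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={
        "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
        "Content-Type": "application/json",
    },
    # Hypothetical tabular input in MLflow's dataframe_split format.
    json={"dataframe_split": {"columns": ["feature1", "feature2"],
                              "data": [[1.0, 2.0]]}},
)
response.raise_for_status()
print(response.json())
```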
Model Serving provides a highly available and low-latency service for deploying models. The service automatically scales up or down to meet demand changes, saving infrastructure costs while optimizing latency performance. This functionality uses serverless compute. See the Model Serving pricing page for more details.
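Scaling behavior is configured when an endpoint is created or updated. Here is a minimal sketch using the MLflow Deployments SDK, assuming a hypothetical Unity Catalog model `catalog.schema.my_model`; `scale_to_zero_enabled` lets an idle endpoint scale down to zero to save costs.

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Create a serving endpoint for a registered model version.
# Model name and version are hypothetical; workload_size and
# scale_to_zero_enabled control how the endpoint autoscales.
client.create_endpoint(
    name="my-endpoint",
    config={
        "served_entities": [
            {
                "entity_name": "catalog.schema.my_model",  # hypothetical model
                "entity_version": "1",
                "workload_size": "Small",        # concurrency band
                "scale_to_zero_enabled": True,   # scale down when idle
            }
        ]
    },
)
```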
Model Serving offers a unified REST API and MLflow Deployment API for CRUD and querying tasks. In addition, it provides a single UI to manage all your models and their respective serving endpoints. You can also access models directly from SQL using AI functions for easy integration into analytics workflows.
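The following sketch shows both paths: querying an endpoint through the MLflow Deployments SDK, and calling the same endpoint from SQL with the `ai_query` function. The endpoint, table, and column names are hypothetical.

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Query the endpoint programmatically (hypothetical endpoint and schema).
response = client.predict(
    endpoint="my-endpoint",
    inputs={"dataframe_split": {"columns": ["feature1", "feature2"],
                                "data": [[1.0, 2.0]]}},
)
print(response)

# From a Databricks notebook (where `spark` is predefined), the same
# endpoint can be called from SQL via ai_query for analytics workflows.
spark.sql("""
    SELECT review, ai_query('my-endpoint', review) AS sentiment
    FROM main.reviews.customer_reviews  -- hypothetical table
    LIMIT 10
""").show()
```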
To get started, see the Databricks tutorials for deploying and querying a custom model and for querying foundation models.
Model Serving supports real-time and batch inference for the following model types:

- Custom models: Python models packaged in the MLflow format, such as scikit-learn, XGBoost, PyTorch, and Hugging Face transformer models.
- Foundation models: curated foundation models made available through Foundation Model APIs (a query sketch follows this list).
- External models: models hosted outside Databricks, such as OpenAI's GPT-4, whose endpoints can be governed centrally from Databricks.
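As a sketch of the foundation model path, a pay-per-token endpoint can be queried with the same MLflow Deployments SDK using a chat-style payload. The endpoint name below is an example; the endpoints actually available vary by region and over time.

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Example pay-per-token Foundation Model APIs endpoint name (may differ
# in your workspace); the payload follows the chat completions format.
response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [{"role": "user", "content": "What is Model Serving?"}],
        "max_tokens": 128,
    },
)
print(response["choices"][0]["message"]["content"])
```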
Note
You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs. This functionality is available in your Azure Databricks workspace.
Note
For workloads that are latency sensitive or involve a high number of queries per second, Databricks recommends using route optimization on custom model serving endpoints. Reach out to your Databricks account team to ensure your workspace is enabled for high scalability.
Note
Model Serving does not provide security patches to existing model images because of the risk of destabilization to production deployments. A new model image created from a new model version will contain the latest patches. Reach out to your Databricks account team for more information.
No additional steps are required to enable Model Serving in your workspace.
Mosaic AI Model Serving imposes default limits to ensure reliable performance. See Model Serving limits and regions. If you have feedback on these limits or an endpoint in an unsupported region, reach out to your Databricks account team.
Databricks takes data security seriously and implements the following security controls to protect the data you analyze using Mosaic AI Model Serving.
For all paid accounts, Mosaic AI Model Serving does not use user inputs submitted to the service or outputs from the service to train any models or improve any Databricks services.
For Databricks Foundation Model APIs, as part of providing the service, Databricks may temporarily process and store inputs and outputs for the purposes of preventing, detecting, and mitigating abuse or harmful uses. Your inputs and outputs are isolated from those of other customers, stored in the same region as your workspace for up to thirty (30) days, and only accessible for detecting and responding to security or abuse concerns. Foundation Model APIs is a Databricks Designated Service, meaning it adheres to data residency boundaries as implemented by Databricks Geos.