Azure Databricks architecture overview
This article provides a high-level overview of the Azure Databricks architecture, including its enterprise architecture in combination with Azure.
Control plane and compute plane
Azure Databricks is structured to enable secure collaboration across cross-functional teams. Azure Databricks manages many of the backend services, so you can stay focused on your data science, data analytics, and data engineering tasks.
Azure Databricks operates out of a control plane and a compute plane.
The control plane includes the backend services that Azure Databricks manages in your Azure Databricks account. Notebook commands and many other workspace configurations are stored in the control plane and encrypted at rest.
The compute plane is where your data is processed.
- For most Azure Databricks computation, the compute resources are in your Azure subscription in what is called the classic compute plane. This refers to the network in your Azure subscription and its resources. Azure Databricks uses the classic compute plane for your notebooks and jobs, and for pro and classic Databricks SQL warehouses.
- For serverless SQL warehouses or Model Serving, the serverless compute resources run in a serverless compute plane in your Azure Databricks account. For additional architecture information, see Serverless compute.
Previously, Azure Databricks referred to the compute plane as the data plane.
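The split between the two compute planes can be summarized as a small lookup. The following Python sketch is illustrative only: the workload labels and the `compute_plane_for` helper are hypothetical names chosen for this example, not identifiers from any Azure Databricks API.

```python
from enum import Enum


class ComputePlane(Enum):
    # Compute resources run in the network in your Azure subscription.
    CLASSIC = "classic compute plane (your Azure subscription)"
    # Compute resources run in your Azure Databricks account.
    SERVERLESS = "serverless compute plane (your Azure Databricks account)"


# Hypothetical workload labels, mapped as described in the list above.
WORKLOAD_PLANE = {
    "notebook": ComputePlane.CLASSIC,
    "job": ComputePlane.CLASSIC,
    "pro_sql_warehouse": ComputePlane.CLASSIC,
    "classic_sql_warehouse": ComputePlane.CLASSIC,
    "serverless_sql_warehouse": ComputePlane.SERVERLESS,
    "model_serving": ComputePlane.SERVERLESS,
}


def compute_plane_for(workload: str) -> ComputePlane:
    """Return the compute plane that runs the given workload label."""
    try:
        return WORKLOAD_PLANE[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None


if __name__ == "__main__":
    for name in WORKLOAD_PLANE:
        print(f"{name}: {compute_plane_for(name).value}")
```

A lookup table keeps the mapping easy to audit against the text; it says nothing about how Azure Databricks routes workloads internally.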
Use Azure Databricks connectors to connect clusters to external data sources outside of your Azure subscription, either to ingest data or to store it. You can also ingest data from external streaming sources, such as event data and IoT data.
To configure the networks for your classic compute plane, see Manage virtual networks.
Your data lake is stored at rest in your Azure subscription and in your own data sources, so you maintain control and ownership of your data.
Job results reside in storage in your Azure subscription. For interactive notebook results, storage is in a combination of the control plane (partial results for presentation in the UI) and your Azure storage. If you want interactive notebook results stored only in your Azure subscription, you can configure the storage location for interactive notebook results. See Configure the storage location for interactive notebook results. Note that some metadata about results, such as chart column names, continues to be stored in the control plane.
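The storage rules for results can be restated as a short decision function. This is a minimal sketch of the rules in the paragraph above, not a Databricks API: the function name, the result-kind labels, and the `notebook_results_in_subscription_only` flag are hypothetical.

```python
CONTROL_PLANE = "control plane"
CUSTOMER_STORAGE = "storage in your Azure subscription"


def result_locations(kind, notebook_results_in_subscription_only=False):
    """Return where result content is stored for a given result kind.

    `kind` is a hypothetical label: "job" or "interactive_notebook".
    """
    if kind == "job":
        return [CUSTOMER_STORAGE]
    if kind == "interactive_notebook":
        if notebook_results_in_subscription_only:
            # Even with this setting, some metadata about results (such as
            # chart column names) continues to be stored in the control plane.
            return [CUSTOMER_STORAGE]
        # Default: partial results for UI presentation are kept in the
        # control plane, with the rest in your Azure storage.
        return [CONTROL_PLANE, CUSTOMER_STORAGE]
    raise ValueError(f"unknown result kind: {kind!r}")


if __name__ == "__main__":
    print(result_locations("job"))
    print(result_locations("interactive_notebook"))
    print(result_locations("interactive_notebook", True))
```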
Although architectures can vary depending on custom configurations (such as when you’ve deployed an Azure Databricks workspace to your own virtual network, also known as VNet injection), the following diagram represents the most common structure and flow of data for Azure Databricks.
For details about the serverless compute plane that is used for serverless SQL warehouses, see Serverless compute.