Azure Databricks architecture overview
This article provides a high-level overview of the Azure Databricks architecture, including its enterprise architecture in combination with Azure.
Azure Databricks is structured to enable secure collaboration across cross-functional teams. Azure Databricks manages a significant share of the backend services, so you can stay focused on your data science, data analytics, and data engineering tasks.
Azure Databricks operates out of a control plane and a data plane.
Although architectures can vary depending on custom configurations (such as when you’ve deployed an Azure Databricks workspace to your own virtual network, also known as VNet injection), the following architecture diagram represents the most common structure and flow of data for Azure Databricks.
Control plane and data plane
- The control plane includes the backend services that Azure Databricks manages in its own Azure account. Notebook commands and many other workspace configurations are stored in the control plane and encrypted at rest.
- The data plane is managed by your Azure account. It is where your data resides and where it is processed. Use Azure Databricks connectors to connect clusters to external data sources outside of your Azure account, either to ingest data or to use those sources for storage. You can also ingest data from external streaming data sources, such as events data, IoT data, and more.
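The split described above can be summarized as a small lookup table. The following is a minimal sketch for illustration only: `Plane`, `PlaneInfo`, `PLANES`, and `plane_for` are hypothetical names, not part of any Azure Databricks API, and the item labels simply paraphrase the two bullets.

```python
from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    CONTROL = "control plane"
    DATA = "data plane"


@dataclass(frozen=True)
class PlaneInfo:
    account: str   # whose Azure account hosts the plane
    holds: tuple   # kinds of items this article places in the plane


# Hypothetical summary table; the labels are illustrative, not API names.
PLANES = {
    Plane.CONTROL: PlaneInfo(
        account="Azure Databricks",
        holds=("backend services", "notebook commands", "workspace configurations"),
    ),
    Plane.DATA: PlaneInfo(
        account="customer",
        holds=("customer data", "data processing"),
    ),
}


def plane_for(item: str) -> Plane:
    """Return the plane that the summary table lists for an item."""
    for plane, info in PLANES.items():
        if item in info.holds:
            return plane
    raise KeyError(f"no plane recorded for {item!r}")
```

For example, `plane_for("notebook commands")` returns `Plane.CONTROL`, while `plane_for("customer data")` returns `Plane.DATA`.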
For more architecture information, see Manage virtual networks.
Your data is stored at rest in your Azure account in the data plane and in your own data sources, not in the control plane, so you maintain control and ownership of it.
Job results reside in storage in your account.
Interactive notebook results are stored in a combination of the control plane (partial results for presentation in the UI) and your Azure storage. If you want interactive notebook results stored only in your cloud account storage, you can ask your Databricks representative to enable interactive notebook results in the customer account for your workspace. This feature is in Public Preview. Note that even with this feature enabled, some metadata about results, such as chart column names, continues to be stored in the control plane.
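The storage rules for results can be expressed as a short decision function. This is a sketch under stated assumptions: `result_locations`, its `kind` values, and the `results_in_customer_account` flag are hypothetical names that model the workspace option described above. They do not correspond to a real setting name or API.

```python
def result_locations(kind: str, results_in_customer_account: bool = False) -> set:
    """Return where a kind of result is stored, per the rules in this article.

    kind is one of the hypothetical labels "job", "interactive_notebook",
    or "result_metadata". results_in_customer_account models the Public
    Preview option that a Databricks representative can enable.
    """
    if kind == "job":
        # Job results reside in storage in the customer's account.
        return {"customer storage"}
    if kind == "interactive_notebook":
        if results_in_customer_account:
            # With the option enabled, results stay in customer storage only.
            return {"customer storage"}
        # By default, partial results for the UI also sit in the control plane.
        return {"control plane", "customer storage"}
    if kind == "result_metadata":
        # Metadata such as chart column names remains in the control plane.
        return {"control plane"}
    raise ValueError(f"unknown result kind: {kind!r}")
```

Calling `result_locations("interactive_notebook")` returns both locations, and passing `results_in_customer_account=True` narrows the result to customer storage only. The metadata case is unaffected by the flag.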