Azure Databricks operates with two key planes, the Control Plane and the Compute Plane, each serving a distinct role. As requested, here's a breakdown of the high-level, modern enterprise architecture of Azure Databricks:
1. Control Plane
- Purpose: Handles backend services and management operations.
- Scope: Azure Databricks manages this plane within its own cloud infrastructure.
- Components:
  - Web application interface: used to manage and interact with Databricks resources.
  - Metadata storage: holds workspace configurations, job schedules, and cluster metadata.
  - Cluster management services: responsible for starting, stopping, and scaling compute clusters.
2. Compute Plane
This is where data processing happens. There are two types of compute planes based on how compute resources are deployed:
a. Classic Compute Plane
- Location: Runs inside the customer’s Azure subscription and virtual network (VNet).
- Isolation: Each customer’s environment is naturally isolated because all resources are confined to their subscription and VNet.
- Use Case: Provides full control over network and security configurations. Ideal for organizations needing strict compliance or custom networking.
- Networking: Requires configuration of private endpoints, firewalls, and virtual networks.
- Examples:
  - Standard Databricks clusters running in customer-managed networks.
  - Integration with other Azure services, such as Azure Data Lake Storage and Azure Synapse Analytics, within the same VNet.
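A concrete classic-plane task is carving the customer's VNet address space into the two dedicated subnets that Databricks VNet injection requires (commonly referred to as the host/public and container/private subnets). Below is a minimal planning sketch using Python's standard `ipaddress` module; the subnet names and the /24 sizing are illustrative assumptions, so check the VNet-injection requirements for your workspace size before committing to an address plan:

```python
import ipaddress

def plan_vnet_injection_subnets(vnet_cidr: str, subnet_prefix: int = 24) -> dict:
    """Split a VNet address space into the two subnets Azure Databricks
    VNet injection needs (host/public and container/private), plus the
    leftover ranges for other services. Sizing is an assumption here.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = list(vnet.subnets(new_prefix=subnet_prefix))
    if len(subnets) < 2:
        raise ValueError("VNet too small for the two required Databricks subnets")
    return {
        "databricks-host": str(subnets[0]),       # public subnet (illustrative name)
        "databricks-container": str(subnets[1]),  # private subnet (illustrative name)
        "remaining": [str(s) for s in subnets[2:]],
    }

plan = plan_vnet_injection_subnets("10.1.0.0/22")
```

The two Databricks subnets must be dedicated to the workspace and delegated to the Databricks resource provider; the helper only plans the CIDR layout, not the delegation itself.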
b. Serverless Compute Plane
- Location: Resources run in a shared, managed compute layer within Databricks’ environment.
- Security: Provides isolation at the cluster and workspace levels, ensuring that customer data remains secure.
- Use Case: Recommended for workloads that benefit from faster setup, lower operational overhead, and elasticity.
- Networking: Simplifies network configuration by offloading management to Azure Databricks.
- Examples:
  - Ad-hoc analytics with minimal configuration requirements.
  - Lightweight experimentation or proof-of-concept workloads.
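To illustrate how little configuration a serverless SQL workload needs, here is a sketch of the three connection settings the open-source `databricks-sql-connector` package expects when targeting a SQL warehouse. All values shown are placeholders, not real endpoints or credentials:

```python
# The three settings needed to reach a Databricks SQL warehouse.
REQUIRED_KEYS = {"server_hostname", "http_path", "access_token"}

def missing_warehouse_settings(cfg: dict) -> list:
    """Return a sorted list of required connection settings absent from cfg."""
    return sorted(REQUIRED_KEYS - cfg.keys())

cfg = {
    "server_hostname": "adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    "http_path": "/sql/1.0/warehouses/abc123",                        # placeholder
    "access_token": "<personal-access-token>",                        # placeholder
}
missing = missing_warehouse_settings(cfg)  # empty list means the config is complete
```

With the connector installed, these keys map directly onto `databricks.sql.connect(server_hostname=..., http_path=..., access_token=...)`; no VNet, firewall, or cluster setup is required on the customer side for serverless warehouses.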
3. Workspace Storage Account
- Location: Created in the customer’s Azure subscription during workspace setup.
- Purpose: Stores system data and files used within Databricks.
- Content:
  - Workspace system data: logs, command results, job run history, and notebook versions.
  - DBFS (Databricks File System) root: legacy workspace-local storage; deprecated in favor of Unity Catalog volumes for new workloads.
  - Unity Catalog workspace catalog: metadata catalog for data governance and access control.
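Unity Catalog addresses governed tables through a three-level namespace, `catalog.schema.table`. The small helper below is purely illustrative; the `main` and `default` names are common workspace defaults used here as assumed examples:

```python
def uc_table_name(catalog: str, schema: str, table: str) -> str:
    """Compose the three-level name Unity Catalog uses to address a
    governed table, e.g. for spark.table(...) or SQL FROM clauses."""
    for part in (catalog, schema, table):
        if not part or "." in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"

name = uc_table_name("main", "default", "sales")  # -> "main.default.sales"
```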
4. Differences Between Classic and Serverless Compute Planes

| Feature | Classic Compute Plane | Serverless Compute Plane |
|---|---|---|
| Location | Customer’s Azure subscription | Databricks-managed shared layer |
| Control | Full network and security control | Minimal management overhead |
| Network Config | Requires private endpoints and firewalls | Simplified networking |
| Use Case | Long-term, complex workloads | Fast, elastic workloads |
| Isolation | By customer VNet | By workspace and cluster boundaries |
5. Use Cases for Classic Compute Plane
The Classic Compute Plane is ideal when:
- You need strict network control, such as using private endpoints, NSGs, or custom VNets.
- You must comply with security regulations requiring data to stay in a customer-controlled environment.
- There are dependencies on other Azure services running in the same VNet, like Azure Data Lake or Synapse Analytics.
This modern architecture allows organizations to choose between serverless (convenience and elasticity) and classic compute planes (control and compliance) based on their specific workload requirements.
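The trade-offs above can be condensed into a toy decision rule. This is illustrative only; a real decision would also weigh cost, workload shape, and regional feature availability:

```python
def choose_compute_plane(needs_private_networking: bool,
                         strict_data_residency: bool,
                         depends_on_vnet_services: bool) -> str:
    """Toy decision rule mirroring the sections above: any requirement
    for customer-controlled networking, residency, or same-VNet service
    access points to the classic plane; otherwise serverless minimizes
    setup time and operational overhead."""
    if needs_private_networking or strict_data_residency or depends_on_vnet_services:
        return "classic"
    return "serverless"
```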
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin