Databricks architecture

Vineet S 1,390 Reputation points
2024-10-26T06:56:44.45+00:00

How should the Databricks architecture be described: in terms of RDDs, or in an advanced architecture format like the one below?

https://learn.microsoft.com/en-us/azure/databricks/getting-started/overview

Also, the modern architecture mentions a classic compute plane. What is the classic compute plane?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Marcin Policht 50,495 Reputation points MVP Volunteer Moderator
    2024-10-26T11:34:02.1833333+00:00

    Azure Databricks operates with two key planes: Control Plane and Compute Plane, each serving distinct roles. As requested, here's a breakdown of the high-level and modern enterprise architecture of Azure Databricks:


    1. Control Plane

    • Purpose: Handles backend services and management operations.
    • Scope: Azure Databricks manages this plane within its own cloud infrastructure.
    • Components:
      • Web application interface: Used to manage and interact with Databricks resources.
      • Metadata storage: Manages workspace configurations, job schedules, and cluster metadata.
      • Cluster management services: Responsible for starting, stopping, and scaling compute clusters.

    2. Compute Plane

    This is where data processing happens. There are two types of compute planes based on how compute resources are deployed:

    a. Classic Compute Plane

    • Location: Runs inside the customer’s Azure subscription and virtual network (VNet).
    • Isolation: Each customer’s environment is naturally isolated because all resources are confined to their subscription and VNet.
    • Use Case: Provides full control over network and security configurations. Ideal for organizations needing strict compliance or custom networking.
    • Networking: Network configuration is the customer's responsibility. Depending on requirements, this can include private endpoints, firewalls, and custom virtual networks.
    • Examples:
      • Standard Databricks clusters running in customer-managed networks.
      • Integrating with other Azure services like Data Lake and Synapse Analytics within the same VNet.
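
As a rough illustration of the "customer-managed network" case above, the sketch below builds an ARM-style resource definition for a workspace deployed into an existing VNet (VNet injection). The function name and all argument values are hypothetical placeholders. The property names (`customVirtualNetworkId`, `customPublicSubnetName`, `customPrivateSubnetName`, `enableNoPublicIp`) and the API version follow the `Microsoft.Databricks/workspaces` template schema as I understand it, so check them against the current ARM reference before deploying anything.

```python
import json


def build_vnet_injected_workspace(name, location, managed_rg_id,
                                  vnet_id, public_subnet, private_subnet):
    """Return an ARM-style resource dict for a workspace in a customer VNet.

    This only assembles a dictionary; it does not call Azure.
    All inputs are caller-supplied placeholders.
    """
    def param(value):
        # Workspace parameters are wrapped as {"value": ...} objects.
        return {"value": value}

    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",  # assumption: verify the version you need
        "name": name,
        "location": location,
        "sku": {"name": "premium"},
        "properties": {
            "managedResourceGroupId": managed_rg_id,
            "parameters": {
                # Existing VNet and the two subnets used by classic compute.
                "customVirtualNetworkId": param(vnet_id),
                "customPublicSubnetName": param(public_subnet),
                "customPrivateSubnetName": param(private_subnet),
                # Keep cluster nodes without public IPs.
                "enableNoPublicIp": param(True),
            },
        },
    }


if __name__ == "__main__":
    resource = build_vnet_injected_workspace(
        name="example-workspace",            # hypothetical
        location="westeurope",               # hypothetical
        managed_rg_id="<managed-rg-resource-id>",
        vnet_id="<vnet-resource-id>",
        public_subnet="public-subnet",       # hypothetical
        private_subnet="private-subnet",     # hypothetical
    )
    print(json.dumps(resource, indent=2))
```

The resulting dictionary can be dropped into the `resources` array of an ARM template. The subnets themselves, their delegation, and any NSG rules still have to be set up separately.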

    b. Serverless Compute Plane

    • Location: Resources run in a shared, managed compute layer within Databricks’ environment.
    • Security: Provides isolation at the cluster and workspace levels, ensuring that customer data remains secure.
    • Use Case: Recommended for workloads that benefit from faster setup, lower operational overhead, and elasticity.
    • Networking: Simplifies network configuration by offloading management to Azure Databricks.
    • Examples:
      • Ad-hoc analytics with minimal configuration requirements.
      • Lightweight experimentation or proof-of-concept workloads.

    Workspace Storage Account

    • Location: Created in the customer’s Azure subscription during workspace setup.
    • Purpose: Stores system data and files used within Databricks.
    • Content:
      1. Workspace system data: Logs, command results, job run history, and notebook versions.
      2. DBFS (Databricks File System): The DBFS root and mounts, accessible under the dbfs:/ namespace. Storing and accessing data this way is a deprecated pattern.
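      3. Unity Catalog workspace catalog: The default workspace catalog, present when the workspace was enabled for Unity Catalog automatically. Unity Catalog provides data governance and access control.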

    Differences Between Classic and Serverless Compute Planes

    | Feature | Classic Compute Plane | Serverless Compute Plane |
    | --- | --- | --- |
    | Location | Customer’s Azure subscription | Databricks-managed shared layer |
    | Control | Full network and security control | Minimal management overhead |
    | Network Config | Customer-configured (private endpoints, firewalls, VNets as needed) | Simplified networking |
    | Use Case | Long-term, complex workloads | Fast, elastic workloads |
    | Isolation | By customer VNet | By workspace and cluster boundaries |

    Use Cases for Classic Compute Plane

    The Classic Compute Plane is ideal when:

    • You need strict network control, such as using private endpoints, NSGs, or custom VNets.
    • You must comply with security regulations requiring data to stay in a customer-controlled environment.
    • There are dependencies on other Azure services running in the same VNet, like Azure Data Lake or Synapse Analytics.
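
The three criteria above can be expressed as a small decision helper. This is only a sketch of the reasoning in this answer, not an official selection rule: `WorkloadNeeds` and `suggest_compute_plane` are hypothetical names, and real decisions will usually weigh cost, feature availability, and region support as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadNeeds:
    """Hypothetical checklist mirroring the three bullets above."""
    strict_network_control: bool = False           # private endpoints, NSGs, custom VNets
    customer_controlled_environment: bool = False  # compliance requires customer-owned environment
    same_vnet_dependencies: bool = False           # other Azure services in the same VNet


def suggest_compute_plane(needs: WorkloadNeeds) -> str:
    """Return "classic" if any listed criterion applies, else "serverless"."""
    if (needs.strict_network_control
            or needs.customer_controlled_environment
            or needs.same_vnet_dependencies):
        return "classic"
    # None of the classic-plane criteria apply, so favor convenience and elasticity.
    return "serverless"


if __name__ == "__main__":
    adhoc = WorkloadNeeds()
    regulated = WorkloadNeeds(customer_controlled_environment=True)
    print(suggest_compute_plane(adhoc))
    print(suggest_compute_plane(regulated))
```

Treating any single criterion as sufficient reflects how the list reads: each bullet alone is presented as a reason to prefer the classic compute plane.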

    This modern architecture allows organizations to choose between serverless (convenience and elasticity) and classic compute planes (control and compliance) based on their specific workload requirements.


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin
