Data exfiltration protection architecture

This page is a feature-by-feature reference architecture for network-level data exfiltration protection on Azure. Each section describes one control area, such as identity, Unity Catalog governance, workspace restrictions, monitoring, or cloud-specific network isolation, and links to its implementation guide. For the concepts and security layer priorities behind these controls, see Data exfiltration protection.

Identity and access controls

Identity-based controls are the first line of defense against data exfiltration. Without strong authentication and trusted access, network-level controls are easily undermined.

Unified login with SSO

Apply single sign-on (SSO) across all workspaces in the Azure Databricks account using unified login. This ensures users authenticate through your corporate identity provider rather than using personal accounts or non-SSO methods.

Enable multifactor authentication (MFA) within your identity provider for an additional layer of verification.

See Authentication and access control.

Automated identity management

Implement SCIM provisioning to automate user lifecycle management. This ensures that former employees are automatically de-provisioned and cannot access workspaces after departure.

See Sync users and groups from Microsoft Entra ID using SCIM.

Network access controls

Restrict workspace and account console access to trusted networks.
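As a rough illustration of what "trusted networks" means in practice, the sketch below checks a source IP address against an allow list of CIDR ranges. The ranges and the `is_trusted` helper are hypothetical and use documentation-reserved addresses; this models the allow-list idea only and is not an Azure Databricks API.

```python
import ipaddress

# Hypothetical trusted ranges (documentation-reserved addresses, not real networks).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]

def is_trusted(source_ip: str) -> bool:
    """Return True if the source IP falls inside any trusted CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)

print(is_trusted("203.0.113.7"))     # inside the first range
print(is_trusted("198.51.100.40"))   # outside both ranges
```

Requests from addresses outside every listed range would be rejected before authentication is even attempted.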

Data governance controls

Network controls prevent unauthorized egress paths, but data governance controls ensure that even authorized compute resources can only access approved data destinations. Apply these controls regardless of which network security architecture you deploy.

Standard access control

Use Unity Catalog privileges to restrict who can read, write, or modify each catalog, schema, table, and volume. Grant the minimum privileges required for each role and group.

Privileges flow hierarchically: a grant on a catalog applies to all schemas and tables within it. Use this to enforce broad defaults, then narrow access at lower levels for sensitive data.

See Manage privileges in Unity Catalog.
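The hierarchical behavior described above can be sketched with a small toy model. This is not the Unity Catalog API: the `PrivilegeModel` class, the principals, and the three-level names are hypothetical, and the model ignores prerequisites such as USE CATALOG and USE SCHEMA. It only shows how a grant on a parent object is found when checking a child.

```python
from dataclasses import dataclass, field

@dataclass
class PrivilegeModel:
    """Toy model of hierarchical grants; not the Unity Catalog API."""
    # Maps (principal, securable path) -> privileges granted directly on that object.
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, securable: str, privilege: str) -> None:
        self.grants.setdefault((principal, securable), set()).add(privilege)

    def has_privilege(self, principal: str, securable: str, privilege: str) -> bool:
        # Check the object itself, then each parent: table -> schema -> catalog.
        parts = securable.split(".")
        for depth in range(len(parts), 0, -1):
            ancestor = ".".join(parts[:depth])
            if privilege in self.grants.get((principal, ancestor), set()):
                return True
        return False

model = PrivilegeModel()
model.grant("analysts", "sales", "SELECT")               # catalog-level grant
model.grant("finance", "hr.payroll.salaries", "SELECT")  # table-level grant only

print(model.has_privilege("analysts", "sales.emea.orders", "SELECT"))   # inherited from catalog
print(model.has_privilege("finance", "hr.payroll.bonuses", "SELECT"))   # no grant on this table
```

Granting at the table level for sensitive data, as with the hypothetical `finance` group here, keeps access from spreading to sibling tables.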

Attribute-based access control (ABAC)

ABAC governs data access based on tags attached to data objects, not just object identity. Use ABAC to enforce policies like "users can only query tables tagged pii=false" or "users in the EU group cannot read tables tagged region=US."

ABAC scales better than per-object GRANTs in large environments where tagging conventions are already in place. It also pairs well with row filters and column masks (below).

See Attribute-based access control in Unity Catalog.
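The two example policies above can be expressed as a simple tag check. The sketch below is a plain Python model of the decision logic, assuming tags are string key-value pairs; the `can_query` function is hypothetical and does not reflect how Unity Catalog defines or evaluates ABAC policies.

```python
def can_query(user_groups: set, table_tags: dict) -> bool:
    """Return True if both example policies from the text allow the query."""
    # Policy 1: only tables explicitly tagged pii=false are queryable.
    if table_tags.get("pii") != "false":
        return False
    # Policy 2: members of the EU group cannot read tables tagged region=US.
    if "EU" in user_groups and table_tags.get("region") == "US":
        return False
    return True

print(can_query({"EU"}, {"pii": "false", "region": "EU"}))  # allowed
print(can_query({"EU"}, {"pii": "false", "region": "US"}))  # blocked by policy 2
print(can_query({"US"}, {"pii": "true", "region": "US"}))   # blocked by policy 1
```

Because the decision depends only on tags and group membership, newly created tables are covered as soon as they are tagged, with no per-object grant required.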

Row filters and column masks

Restrict what users see within a table:

  • Row filters: Apply a SQL function that determines which rows a user can query. For example, restrict a sales table so each regional manager only sees rows for their region.
  • Column masks: Apply a SQL function that transforms a column's value before it returns to the user. For example, mask credit card numbers to XXXX-XXXX-XXXX-1234 for non-finance users.

Row filters and column masks are evaluated at query time, so users can't bypass them with SELECT *.

See Row filters and column masks.
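To make the two mechanisms concrete, the sketch below models them in plain Python. In Unity Catalog the filter and mask are SQL functions attached to a table; the function names, sample rows, and the `query_as` helper here are hypothetical and only illustrate the logic of the two examples above.

```python
def region_row_filter(row_region: str, manager_region: str) -> bool:
    """Row filter: keep a row only if it belongs to the manager's region."""
    return row_region == manager_region

def mask_card_number(card_number: str, is_finance_user: bool) -> str:
    """Column mask: show only the last four digits to non-finance users."""
    if is_finance_user:
        return card_number
    digits = [c for c in card_number if c.isdigit()]
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])

# Hypothetical sample data using well-known test card numbers.
sales = [
    {"region": "EMEA", "card": "4111-1111-1111-1234"},
    {"region": "APAC", "card": "5500-0000-0000-5678"},
]

def query_as(manager_region: str, is_finance_user: bool) -> list:
    """Simulate a SELECT * where the filter and mask are applied to every row."""
    return [
        {"region": r["region"], "card": mask_card_number(r["card"], is_finance_user)}
        for r in sales
        if region_row_filter(r["region"], manager_region)
    ]

print(query_as("EMEA", False))
```

The caller never sees the unfiltered rows or the unmasked column, which is why selecting every column does not bypass the controls.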

Unity Catalog administrative restrictions

Restrict the creation of data access securables to administrators only:

  • Storage credentials: Only allow admins to create storage credentials. Apply least-privilege cloud access policies (IAM roles, managed identities) for each credential. See Manage storage credentials.
  • External locations: Only allow admins to create external locations that map to cloud storage paths. See Manage external locations.
  • Database connections: Only allow admins to create connections to external databases through Lakehouse Federation. See Manage connections for Lakehouse Federation.
  • Service credentials: Only allow admins to create service credentials for external cloud services. See Create service credentials.

Grant users permissions to use approved securables rather than create new ones. This prevents users from pointing compute at untrusted storage or endpoints.

Workspace bindings for catalogs

Bind Unity Catalog catalogs to specific workspaces to prevent cross-environment data access. For example, prevent development workspaces from reading production data.

See Workspace-catalog binding.

Storage account policies

Implement firewalls or bucket policies on storage accounts to accept traffic only from approved sources:

  • Configure Azure Storage firewalls to allow access only from approved VNets, private endpoints, or service endpoints.
  • Use managed identities with least-privilege role assignments.

Workspace restrictions

Workspace admin settings control data download and export paths through the Azure Databricks UI. Disable these settings to prevent users from extracting data through the workspace interface.

Setting | Risk mitigated
Disable notebook results download | Users downloading query results to local machines
Disable volume files download | Users downloading volume files to local machines
Disable notebook and file exporting | Users exporting notebooks or files from the workspace
Disable SQL results download | Users downloading SQL query results
Disable MLflow run artifact download | Users downloading MLflow experiment artifacts
Disable results table clipboard | Users copying tabular data to the clipboard

Configure these settings in the workspace admin console under security settings. See Manage your workspace.

Monitoring and detection

Preventive controls reduce the risk of data exfiltration, but monitoring detects when controls fail or when attackers bypass them.

System tables for audit monitoring

Use Azure Databricks system tables to monitor data access patterns (see Monitor costs using system tables). The audit log system table (see Audit log system table reference) captures workspace events including:

  • User authentication and access attempts.
  • Data read and write operations.
  • Administrative configuration changes.
  • Credential usage and external location access.

Set up alerts for suspicious activity, such as unusual data volumes, access from unexpected locations, or attempts to access unauthorized resources.
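One simple form of such an alert is a per-user count against a threshold. The sketch below assumes audit events have already been loaded as dictionaries; the `user` and `action` field names, the `downloadResults` action, and the threshold are hypothetical and do not mirror the audit log schema.

```python
from collections import Counter

def users_over_threshold(events: list, action: str, threshold: int) -> list:
    """Return users who performed the given action more than `threshold` times."""
    counts = Counter(e["user"] for e in events if e["action"] == action)
    return sorted(user for user, n in counts.items() if n > threshold)

# Hypothetical events for one monitoring window.
events = (
    [{"user": "alice@example.com", "action": "downloadResults"}] * 12
    + [{"user": "bob@example.com", "action": "downloadResults"}] * 2
    + [{"user": "bob@example.com", "action": "login"}] * 30
)

print(users_over_threshold(events, "downloadResults", 10))
```

A fixed threshold is the simplest choice; comparing each user against their own historical baseline would reduce false positives but needs more data.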

Cloud-native log integration

Ingest cloud-specific logs to supplement Azure Databricks system tables:

  • Configure Azure Monitor and Activity Log to capture storage access events, managed identity usage, and network flow logs.

Correlate cloud-native logs with Azure Databricks audit logs for complete visibility into data movement across your environment.

Azure architecture

The Azure architecture uses VNet injection, Private Link, and Azure Firewall to create a secure network perimeter around Azure Databricks workloads.

Prerequisites

Component | Details
Virtual network | Customer-managed VNet for the Azure Databricks data plane, deployed using VNet injection. See Deploy Azure Databricks in your Azure virtual network (VNet injection).
Subnets | Three subnets: host (public), container (private), and private endpoint.
Firewall or NVA | Network virtual appliance (Azure Firewall or third-party) for egress inspection and policy enforcement.
Private DNS zones | DNS resolution for private endpoints within the virtual network.
Azure Key Vault | Stores customer-managed keys for DBFS, managed disks, and managed services encryption.
Firewall allow list | Required Azure Databricks endpoints. See Configure domain name firewall rules.

Architecture components

The architecture has four main areas: network isolation, private connectivity, egress control, and serverless security.

Network isolation

Deploy Azure Databricks with secure cluster connectivity (SCC) enabled in a virtual network using VNet injection. See Enable secure cluster connectivity and Deploy Azure Databricks in your Azure virtual network (VNet injection). You can deploy using a hub-and-spoke topology with a centralized firewall, or an isolated (island) network topology without a hub. This configuration:

  • Eliminates public IP addresses on cluster nodes.
  • Requires dedicated subnet pairs per workspace (one private, one public).
  • Routes control plane traffic through private endpoints.

Private connectivity

Set up Private Link endpoints for customer-managed Azure storage accounts in a dedicated subnet.

Note

Private endpoints and service endpoint policies apply only to customer-managed Azure storage accounts. Azure Databricks-managed resources (artifact storage, log storage, and Event Hubs) cannot be placed behind private endpoints.

Configure inbound Private Link for user access and browser authentication (SSO). See Configure Inbound Private Link.

Egress control

Deploy Azure Firewall (or a third-party network virtual appliance) in a hub virtual network:

  • Application rules: Define FQDNs accessible through the firewall (control plane, web app, and SCC relay if classic compute plane Private Link is not configured).
  • Network rules: Define IP address, port, and protocol for endpoints that can't use FQDNs.
  • User-defined routes (UDRs): Route non-local traffic from Azure Databricks subnets through the firewall using a default route (0.0.0.0/0).
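The effect of the default route can be sketched as a longest-prefix-match lookup, which is how Azure selects among matching routes. The address ranges and next-hop labels below are hypothetical; the point is that traffic staying inside the virtual network matches the more specific route, while everything else falls through to 0.0.0.0/0 and is sent to the firewall.

```python
import ipaddress

# Hypothetical route table for an Azure Databricks subnet.
ROUTES = [
    (ipaddress.ip_network("10.10.0.0/16"), "VnetLocal"),  # traffic inside the VNet
    (ipaddress.ip_network("0.0.0.0/0"), "Firewall"),      # default route (UDR)
]

def next_hop(destination: str) -> str:
    """Pick the next hop using longest-prefix match over the route table."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("10.10.4.25"))    # stays local to the VNet
print(next_hop("203.0.113.10"))  # non-local, forced through the firewall
```

Any destination that the firewall's application and network rules do not explicitly allow is then dropped at the hub.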

Note

When using service endpoint policies, no firewall network rules are needed for Azure Databricks service storage accounts (artifact, logging, system tables).

Service endpoints bypass the firewall for Azure Databricks system storage, reducing data transfer costs and avoiding throttling. Artifact storage alone can account for up to 11 GB downloaded per cluster node.
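As a rough sizing aid, the per-node figure above gives an upper bound on artifact traffic that would otherwise pass through the firewall. The helper below is a hypothetical back-of-the-envelope calculation, assuming every node downloads the full 11 GB.

```python
ARTIFACT_GB_PER_NODE = 11  # upper bound per cluster node, from the note above

def max_artifact_download_gb(node_count: int) -> int:
    """Upper bound on artifact storage downloads for a cluster of this size."""
    return ARTIFACT_GB_PER_NODE * node_count

print(max_artifact_download_gb(50))  # 550 GB for a 50-node cluster
```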

Serverless security

Configure serverless egress control to govern egress traffic from serverless compute (see What is serverless egress control?). Use serverless compute plane networking to establish private connections between serverless compute and Azure storage accounts (ADLS Gen2). See Serverless compute plane networking.

Cost optimization strategies:

  • Use service endpoints instead of Private Link where security requirements allow.
  • Configure service endpoint policies to bypass firewall for Azure Databricks system storage (reduces data transfer costs and avoids throttling).
  • Right-size Azure Firewall or NVA throughput based on actual requirements.
  • Monitor data transfer costs through firewall appliances.

See Understand Databricks networking costs for detailed guidance.

See also

Resource | Description
Network reference architectures | Network security architectures (managed, hardened, isolated).
Security and compliance | Security and compliance controls beyond networking.