This page is a feature-by-feature reference architecture for network-level data exfiltration protection on Azure. Each section describes one control, like identity, Unity Catalog governance, workspace restrictions, monitoring, and cloud-specific network isolation, and links to its implementation guide. For the concepts and security layer priorities behind these controls, see Data exfiltration protection.
- To deploy the full set of controls as a single bundle, use the Azure Databricks Security Reference Architecture Terraform module, which implements the Isolated environment architecture end-to-end. See the Azure Security Reference Architecture Terraform module.
- To configure controls individually, use the sections below.
Identity and access controls
Identity-based controls are the first line of defense against data exfiltration. Without strong authentication and access restricted to trusted networks, network-level controls are easy to undermine.
Unified login with SSO
Apply single sign-on (SSO) across all workspaces in the Azure Databricks account using unified login. This ensures users authenticate through your corporate identity provider rather than using personal accounts or non-SSO methods.
Enable multifactor authentication (MFA) within your identity provider for an additional layer of verification.
Automated identity management
Implement SCIM provisioning to automate user lifecycle management. This ensures that former employees are automatically de-provisioned and cannot access workspaces after departure.
See Sync users and groups from Microsoft Entra ID using SCIM.
Network access controls
Restrict workspace and account console access to trusted networks:
- Account-level IP access lists: Control access to the account console. See Configure IP access lists for the account console.
- Workspace-level IP access lists: Control access to individual workspaces. See Configure IP access lists for workspaces.
- Private connectivity: Use inbound Private Link to eliminate public workspace access entirely. See Configure Inbound Private Link.
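As a rough illustration of how an allow-type IP access list behaves, the following sketch checks a source address against a set of CIDR ranges. It is a simplified model, not the Azure Databricks implementation; the function name `is_allowed` and the ranges in `CORPORATE_RANGES` are hypothetical and use documentation-only address space.

```python
import ipaddress


def is_allowed(source_ip: str, allow_list: list[str]) -> bool:
    """Return True if source_ip falls inside any CIDR block in allow_list."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allow_list
    )


# Hypothetical corporate egress ranges (assumption, not from the article).
CORPORATE_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]

print(is_allowed("203.0.113.42", CORPORATE_RANGES))  # inside the /24 range
print(is_allowed("192.0.2.10", CORPORATE_RANGES))    # outside every range
```

A request from an address outside every listed range is rejected, which is why the list should contain only the egress addresses of networks you trust.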
Data governance controls
Network controls prevent unauthorized egress paths, but data governance controls ensure that even authorized compute resources can only access approved data destinations. Apply these controls regardless of which network security architecture you deploy.
Standard access control
Use Unity Catalog privileges to restrict who can read, write, or modify each catalog, schema, table, and volume. Grant the minimum privileges required for each role and group.
Privileges flow hierarchically: a grant on a catalog applies to all schemas and tables within it. Use this to enforce broad defaults, then narrow access at lower levels for sensitive data.
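The hierarchical flow of privileges can be pictured with a small model. This is a deliberately simplified sketch, assuming three-level names of the form catalog.schema.table and a single principal; the function `effective_privileges` and the sample grants are hypothetical and do not reproduce how Unity Catalog resolves privileges internally.

```python
def effective_privileges(grants: dict[str, set[str]], securable: str) -> set[str]:
    """Union of privileges granted on the securable and on each of its ancestors."""
    parts = securable.split(".")
    privileges: set[str] = set()
    # Walk from the catalog down to the object, collecting grants at each level.
    for depth in range(1, len(parts) + 1):
        ancestor = ".".join(parts[:depth])
        privileges |= grants.get(ancestor, set())
    return privileges


# Hypothetical grants for one group: a broad default on the catalog,
# plus an extra privilege on a single table.
grants = {
    "sales": {"SELECT"},
    "sales.emea.orders": {"MODIFY"},
}

print(sorted(effective_privileges(grants, "sales.emea.orders")))
print(sorted(effective_privileges(grants, "sales.apac.invoices")))
```

In this model, the catalog-level SELECT reaches every table in the catalog, while MODIFY applies only to the one table where it was granted.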
Attribute-based access control (ABAC)
ABAC governs data access based on tags attached to data objects, not just object identity. Use ABAC to enforce policies like "users can only query tables tagged pii=false" or "users in the EU group cannot read tables tagged region=US."
ABAC scales better than per-object GRANTs in large environments where tagging conventions are already in place. It also pairs well with row filters and column masks (below).
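The two example policies above can be sketched as a tag-based check. This is an illustrative model only: the `Table` class, the `can_query` function, and the choice to deny access to untagged tables are assumptions made for the example, not Unity Catalog behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def can_query(user_groups: set[str], table: Table) -> bool:
    """Evaluate two tag-based policies against a table's tags."""
    # Policy 1: only tables tagged pii=false are queryable.
    # A missing tag is treated as a denial (assumption: fail closed).
    if table.tags.get("pii") != "false":
        return False
    # Policy 2: members of the EU group cannot read tables tagged region=US.
    if "EU" in user_groups and table.tags.get("region") == "US":
        return False
    return True


us_orders = Table("orders_us", {"pii": "false", "region": "US"})
customers = Table("customers", {"pii": "true", "region": "EU"})

print(can_query({"analysts"}, us_orders))  # tagged pii=false, no region rule applies
print(can_query({"EU"}, us_orders))        # blocked by the region policy
print(can_query({"analysts"}, customers))  # blocked by the pii policy
```

The policy logic references tags rather than table names, so a newly created table is covered as soon as it is tagged.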
Row filters and column masks
Restrict what users see within a table:
- Row filters: Apply a SQL function that determines which rows a user can query. For example, restrict a sales table so each regional manager only sees rows for their region.
- Column masks: Apply a SQL function that transforms a column's value before it returns to the user. For example, mask credit card numbers to XXXX-XXXX-XXXX-1234 for non-finance users.
Row filters and column masks are evaluated at query time, so users can't bypass them with SELECT *.
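To make the two mechanisms concrete, here is a minimal Python model of a row filter and a column mask applied at query time. In Azure Databricks these are SQL functions attached to a table; the function names, the sample rows, and the `is_finance` flag below are hypothetical stand-ins for illustration.

```python
def mask_card(card_number: str, is_finance: bool) -> str:
    """Column mask: non-finance users see only the last four digits."""
    if is_finance:
        return card_number
    return "XXXX-XXXX-XXXX-" + card_number[-4:]


def visible_rows(rows: list[dict], manager_region: str) -> list[dict]:
    """Row filter: a regional manager sees only rows for their region."""
    return [row for row in rows if row["region"] == manager_region]


def query(rows: list[dict], manager_region: str, is_finance: bool) -> list[dict]:
    """Apply the row filter first, then the column mask, on every read."""
    return [
        {**row, "card": mask_card(row["card"], is_finance)}
        for row in visible_rows(rows, manager_region)
    ]


# Hypothetical sample data.
sales = [
    {"region": "EMEA", "card": "4111-1111-1111-1234"},
    {"region": "APAC", "card": "5500-0000-0000-5678"},
]

print(query(sales, "EMEA", is_finance=False))
```

Because both functions run inside `query`, the caller never receives unfiltered rows or unmasked values, regardless of which columns it asks for.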
Unity Catalog administrative restrictions
Restrict the creation of data access securables to administrators only:
- Storage credentials: Only allow admins to create storage credentials. Apply least-privilege cloud access policies (IAM roles, managed identities) for each credential. See Manage storage credentials.
- External locations: Only allow admins to create external locations that map to cloud storage paths. See Manage external locations.
- Database connections: Only allow admins to create connections to external databases through Lakehouse Federation. See Manage connections for Lakehouse Federation.
- Service credentials: Only allow admins to create service credentials for external cloud services. See Create service credentials.
Grant users permissions to use approved securables rather than create new ones. This prevents users from pointing compute at untrusted storage or endpoints.
Workspace bindings for catalogs
Bind Unity Catalog catalogs to specific workspaces to prevent cross-environment data access. For example, prevent development workspaces from reading production data.
Storage account policies
Configure firewalls and access policies on storage accounts so that they accept traffic only from approved sources:
- Configure Azure Storage firewalls to allow access only from approved VNets, private endpoints, or service endpoints.
- Use managed identities with least-privilege role assignments.
Workspace restrictions
Workspace admin settings control data download and export paths through the Azure Databricks UI. Disable these settings to prevent users from extracting data through the workspace interface.
| Setting | Risk mitigated |
|---|---|
| Disable notebook results download | Users downloading query results to local machines |
| Disable volume files download | Users downloading volume files to local machines |
| Disable notebook and file exporting | Users exporting notebooks or files from the workspace |
| Disable SQL results download | Users downloading SQL query results |
| Disable MLflow run artifact download | Users downloading MLflow experiment artifacts |
| Disable results table clipboard | Users copying tabular data to the clipboard |
Configure these settings in the workspace admin console under security settings. See Manage your workspace.
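One way to keep these settings from drifting is to compare a workspace's exported configuration against the table above. The sketch below assumes you already have the settings as a dictionary of booleans; the key names are hypothetical labels invented for this example, not the identifiers used by the Azure Databricks admin console or API.

```python
# Hypothetical setting names, one per row of the table above (assumption).
DOWNLOAD_SETTINGS = [
    "notebook_results_download",
    "volume_files_download",
    "notebook_and_file_export",
    "sql_results_download",
    "mlflow_run_artifact_download",
    "results_table_clipboard",
]


def still_enabled(settings: dict[str, bool]) -> list[str]:
    """Return the download and export paths that are not disabled.

    A setting missing from the input is reported as enabled, so gaps in
    the export show up as findings instead of being silently ignored.
    """
    return [name for name in DOWNLOAD_SETTINGS if settings.get(name, True)]


# Hypothetical export from one workspace.
workspace = {
    "notebook_results_download": False,
    "volume_files_download": False,
    "notebook_and_file_export": True,
    "sql_results_download": False,
    "mlflow_run_artifact_download": False,
}

print(still_enabled(workspace))
```

An empty result means every path in the table is disabled for that workspace.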
Monitoring and detection
Preventive controls reduce the risk of data exfiltration, but monitoring detects when controls fail or when attackers bypass them.
System tables for audit monitoring
Use Azure Databricks system tables to monitor data access patterns (see Monitor costs using system tables). The audit log system table captures workspace events, including the following (see Audit log system table reference):
- User authentication and access attempts.
- Data read and write operations.
- Administrative configuration changes.
- Credential usage and external location access.
Set up alerts for suspicious activity, such as unusual data volumes, access from unexpected locations, or attempts to access unauthorized resources.
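As a sketch of the "unusual data volumes" alert, the following function flags users whose total read volume far exceeds the median across users. The record shape (`user`, `bytes_read`) and the threshold factor are assumptions made for illustration; they are not columns of the audit log system table, and a production rule would need a baseline suited to your workload.

```python
from collections import defaultdict
from statistics import median


def flag_unusual_volume(events: list[dict], factor: float = 5.0) -> list[str]:
    """Return users whose total bytes read exceed factor times the median user total."""
    totals: dict[str, int] = defaultdict(int)
    for event in events:
        totals[event["user"]] += event["bytes_read"]
    if not totals:
        return []
    baseline = median(totals.values())
    return sorted(user for user, total in totals.items() if total > factor * baseline)


# Hypothetical events derived from access logs.
events = [
    {"user": "alice", "bytes_read": 100},
    {"user": "bob", "bytes_read": 120},
    {"user": "carol", "bytes_read": 90},
    {"user": "mallory", "bytes_read": 5000},
    {"user": "mallory", "bytes_read": 4000},
]

print(flag_unusual_volume(events))
```

The median is used as the baseline here because a single heavy reader shifts it far less than it would shift a mean.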
Cloud-native log integration
Ingest cloud-specific logs to supplement Azure Databricks system tables:
- Configure Azure Monitor and Activity Log to capture storage access events, managed identity usage, and network flow logs.
Correlate cloud-native logs with Azure Databricks audit logs for complete visibility into data movement across your environment.
Azure architecture
The Azure architecture uses VNet injection, Private Link, and Azure Firewall to create a secure network perimeter around Azure Databricks workloads.
Prerequisites
| Component | Details |
|---|---|
| Virtual network | Customer-managed VNet for the Azure Databricks data plane, deployed with VNet injection. See Deploy Azure Databricks in your Azure virtual network (VNet injection). |
| Subnets | Three subnets: host (public), container (private), and private endpoint subnet. |
| Firewall or NVA | Network virtual appliance (Azure Firewall or third-party) for egress inspection and policy enforcement. |
| Private DNS zones | DNS resolution for private endpoints within the virtual network. |
| Azure Key Vault | Stores customer-managed keys for DBFS, managed disks, and managed services encryption. |
| Firewall allow list | Required Azure Databricks endpoints. See Configure domain name firewall rules. |
Architecture components
The architecture has four main areas: network isolation, private connectivity, egress control, and serverless security.
Network isolation
Deploy Azure Databricks with secure cluster connectivity (SCC) enabled in a customer-managed virtual network (VNet injection). See Enable secure cluster connectivity and Deploy Azure Databricks in your Azure virtual network (VNet injection). You can deploy using a hub-and-spoke topology with a centralized firewall, or an isolated (island) network topology without a hub. This configuration:
- Eliminates public IP addresses on cluster nodes.
- Requires dedicated subnet pairs per workspace (one private, one public).
- Routes control plane traffic through private endpoints.
Tip
Don't store application data in DBFS root storage. Disable access to DBFS root and mounts in your existing Azure Databricks workspace and use Unity Catalog volumes instead. See What are Unity Catalog volumes?
Private connectivity
Set up Private Link endpoints for customer-managed Azure storage accounts in a dedicated subnet:
- All data access occurs over the Azure network backbone.
- Private endpoints can be deployed in the Azure Databricks VNet or a peered VNet.
- As an alternative for customer-managed storage accounts, use service endpoint policies, which carry no additional cost. See Configure Azure virtual network service endpoint policies for storage access from classic compute.
Note
Private endpoints and service endpoint policies apply only to customer-managed Azure storage accounts. Azure Databricks-managed resources (artifact storage, log storage, and Event Hubs) cannot be placed behind private endpoints.
Configure inbound Private Link for user access and browser authentication (SSO). See Configure Inbound Private Link.
Egress control
Deploy Azure Firewall (or a third-party network virtual appliance) in a hub virtual network:
- Application rules: Define FQDNs accessible through the firewall (control plane, web app, and SCC relay if classic compute plane Private Link is not configured).
- Network rules: Define IP address, port, and protocol for endpoints that can't use FQDNs.
- User-defined routes (UDRs): Route non-local traffic from Azure Databricks subnets through the firewall using a default route (0.0.0.0/0).
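The effect of a 0.0.0.0/0 user-defined route can be illustrated with longest-prefix matching, which is how a route table selects the next hop for a destination. This is a simplified sketch: the VNet address space 10.0.0.0/16, the `next_hop` function, and the two-entry route table are hypothetical, and a real Azure route table also contains system routes that are not modeled here.

```python
import ipaddress


def next_hop(dest_ip: str, routes: list[tuple[str, str]]) -> str:
    """Pick the next hop of the most specific route that contains dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {dest_ip}")
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical route table for an Azure Databricks subnet.
ROUTES = [
    ("10.0.0.0/16", "VnetLocal"),       # traffic that stays inside the VNet
    ("0.0.0.0/0", "VirtualAppliance"),  # default route pointing at the firewall
]

print(next_hop("10.0.1.5", ROUTES))     # local destination
print(next_hop("93.184.216.34", ROUTES))  # internet destination
```

Any destination outside the VNet range matches only the default route, so it is forwarded to the firewall for inspection before it can leave the network.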
Note
When using service endpoint policies, no firewall network rules are needed for Azure Databricks service storage accounts (artifact, logging, system tables).
Service endpoints bypass the firewall for Azure Databricks system storage, reducing data transfer costs and avoiding throttling. Artifact storage alone can account for up to 11 GB downloaded per cluster node.
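To put the 11 GB figure in perspective, a quick upper-bound calculation shows how artifact downloads scale with cluster size. The per-node value comes from the text above and is a maximum, so the result is a ceiling and not an expected volume; the function name and the 20-node example are illustrative.

```python
MAX_ARTIFACT_GB_PER_NODE = 11  # "up to 11 GB" per cluster node, from the text above


def max_artifact_download_gb(node_count: int) -> int:
    """Upper bound on artifact storage downloaded when a cluster of this size starts."""
    return node_count * MAX_ARTIFACT_GB_PER_NODE


# A hypothetical 20-node cluster.
print(max_artifact_download_gb(20))
```

For a 20-node cluster the ceiling is 220 GB, all of which would cross the firewall if system storage traffic were not routed through service endpoints.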
Serverless security
To govern egress traffic from serverless compute, configure serverless egress control (see What is serverless egress control?). To establish private connections between serverless compute and Azure storage accounts (ADLS Gen2), follow Serverless compute plane networking.
Optimization strategies:
- Use service endpoints instead of Private Link where security requirements allow.
- Configure service endpoint policies to bypass firewall for Azure Databricks system storage (reduces data transfer costs and avoids throttling).
- Right-size Azure Firewall or NVA throughput based on actual requirements.
- Monitor data transfer costs through firewall appliances.
See Understand Databricks networking costs for detailed guidance.
See also
| Resource | Description |
|---|---|
| Network reference architectures | Network security architectures (managed, hardened, isolated). |
| Security and compliance | Security and compliance controls beyond networking. |