Azure security baseline for Azure Databricks

This security baseline applies guidance from the Microsoft cloud security benchmark version 1.0 to Azure Databricks. The Microsoft cloud security benchmark provides recommendations on how you can secure your cloud solutions on Azure. The content is grouped by the security controls defined by the Microsoft cloud security benchmark and the related guidance applicable to Azure Databricks.

You can monitor this security baseline and its recommendations using Microsoft Defender for Cloud. Azure Policy definitions will be listed in the Regulatory Compliance section of the Microsoft Defender for Cloud portal page.

When a feature has relevant Azure Policy Definitions, they are listed in this baseline to help you measure compliance with the Microsoft cloud security benchmark controls and recommendations. Some recommendations may require a paid Microsoft Defender plan to enable certain security scenarios.

Note

Features not applicable to Azure Databricks have been excluded. To see how Azure Databricks completely maps to the Microsoft cloud security benchmark, see the full Azure Databricks security baseline mapping file.

Security profile

The security profile summarizes high-impact behaviors of Azure Databricks, which may result in increased security considerations.

Service Behavior Attribute Value
Product Category Analytics, Storage
Customer can access HOST / OS No Access
Service can be deployed into customer's virtual network True
Stores customer content at rest True

Network security

For more information, see the Microsoft cloud security benchmark: Network security.

NS-1: Establish network segmentation boundaries

Features

Virtual Network Integration

Description: Service supports deployment into customer's private Virtual Network (VNet). Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: The default deployment of Azure Databricks is a fully managed service on Azure: all data plane resources, including a VNet that all clusters will be associated with, are deployed to a locked resource group. If you require network customization, however, you can deploy Azure Databricks data plane resources in your own virtual network (VNet injection), enabling you to implement custom network configurations. You can apply your own network security group (NSG) with custom rules to specific egress traffic restrictions.

Reference: Databricks VNET Integration

Network Security Group Support

Description: Service network traffic respects Network Security Groups rule assignment on its subnets. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: Use network security groups (NSG) to restrict or monitor traffic by port, protocol, source IP address, or destination IP address. Create NSG rules to restrict your service's open ports (such as preventing management ports from being accessed from untrusted networks). Be aware that by default, NSGs deny all inbound traffic but allow traffic from virtual network and Azure Load Balancers.

Reference: Network Security Group

NS-2: Secure cloud services with network controls

Features

Description: Service native IP filtering capability for filtering network traffic (not to be confused with NSG or Azure Firewall). Learn more.

Supported Enabled By Default Configuration Responsibility
False Not Applicable Not Applicable

Configuration Guidance: This feature is not supported to secure this service.

Disable Public Network Access

Description: Service supports disabling public network access either through using service-level IP ACL filtering rule (not NSG or Azure Firewall) or using a 'Disable Public Network Access' toggle switch. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: Azure Databricks customers can use the IP access lists feature to define a set of approved IP addresses to prevent access from public IP or unapproved IP addresses.

Reference: IP Access list in Databricks

Identity management

For more information, see the Microsoft cloud security benchmark: Identity management.

IM-1: Use centralized identity and authentication system

Features

Azure AD Authentication Required for Data Plane Access

Description: Service supports using Azure AD authentication for data plane access. Learn more.

Supported Enabled By Default Configuration Responsibility
True True Microsoft

Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.

IM-3: Manage application identities securely and automatically

Features

Managed Identities

Description: Data plane actions support authentication using managed identities. Learn more.

Supported Enabled By Default Configuration Responsibility
False Not Applicable Not Applicable

Feature notes: Azure Databricks is automatically set up to use Azure Active Directory (Azure AD) single sign-on to authenticate users. Users outside of your organization must complete the invitation process and be added to your Active Directory tenant before they are able to log in to Azure Databricks via single sign-on. You can implement SCIM to automate provisioning and de-provisioning users from workspaces.

Understand single sign-on for Azure Databricks

How to use the SCIM APIs for Azure Databricks

Configuration Guidance: This feature is not supported to secure this service.

Service Principals

Description: Data plane supports authentication using service principals. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: For services that don't support managed identities, use Azure Active Directory (Azure AD) to create a service principal with restricted permissions at the resource level. Configure service principals with certificate credentials and fall back to client secrets for authentication.

Reference: Service principal in Databricks

IM-7: Restrict resource access based on conditions

Features

Conditional Access for Data Plane

Description: Data plane access can be controlled using Azure AD Conditional Access Policies. Learn more.

Supported Enabled By Default Configuration Responsibility
True True Microsoft

Feature notes: Additionally Azure Databricks supports IP access lists to make accessing the web application and the REST API more secure.

IP access lists in Databricks

Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.

Reference: Conditional Access in Databricks

IM-8: Restrict the exposure of credential and secrets

Features

Service Credential and Secrets Support Integration and Storage in Azure Key Vault

Description: Data plane supports native use of Azure Key Vault for credential and secrets store. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Feature notes: Azure Databricks also supports a secret scope stored in (backed by) an encrypted database owned and managed by Azure Databricks.

Databricks-backed scopes

Configuration Guidance: Ensure that secrets and credentials are stored in secure locations such as Azure Key Vault, instead of embedding them into code or configuration files.

Reference: Key Vault Integration in Databricks

Privileged access

For more information, see the Microsoft cloud security benchmark: Privileged access.

PA-7: Follow just enough administration (least privilege) principle

Features

Azure RBAC for Data Plane

Description: Azure Role-Based Access Control (Azure RBAC) can be used to managed access to service's data plane actions. Learn more.

Supported Enabled By Default Configuration Responsibility
True True Microsoft

Feature notes: You can use Azure Databricks SCIM APIs to manage users in an Azure Databricks workspace and grant administrative privileges to designated users.

How to use the SCIM APIs

In Azure Databricks, you can use access control lists (ACLs) to configure permission to access different workspace objects.

Access control in Databricks

Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.

Reference: How to manage access control in Azure Databricks

PA-8: Determine access process for cloud provider support

Features

Customer Lockbox

Description: Customer Lockbox can be used for Microsoft support access. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: In support scenarios where Microsoft needs to access your data, use Customer Lockbox to review, then approve or reject each of Microsoft's data access requests.

Reference: Customer Lockbox

Data protection

For more information, see the Microsoft cloud security benchmark: Data protection.

DP-3: Encrypt sensitive data in transit

Features

Data in Transit Encryption

Description: Service supports data in-transit encryption for data plane. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Feature notes: By default, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, you can create an init script that configures your clusters to encrypt traffic between worker nodes.

Configuration Guidance: Enable secure transfer in services where there is a native data in transit encryption feature built in. Enforce HTTPS on any web applications and services and ensure TLS v1.2 or later is used. Legacy versions such as SSL 3.0, TLS v1.0 should be disabled. For remote management of Virtual Machines, use SSH (for Linux) or RDP/TLS (for Windows) instead of an unencrypted protocol.

Reference: Data in transit Encryption for Databricks

DP-4: Enable data at rest encryption by default

Features

Data at Rest Encryption Using Platform Keys

Description: Data at-rest encryption using platform keys is supported, any customer content at rest is encrypted with these Microsoft managed keys. Learn more.

Supported Enabled By Default Configuration Responsibility
True True Microsoft

Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.

Reference: Data at rest encryption using platform managed keys in Databricks

DP-5: Use customer-managed key option in data at rest encryption when required

Features

Data at Rest Encryption Using CMK

Description: Data at-rest encryption using customer-managed keys is supported for customer content stored by the service. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Feature notes: Azure Databricks has two customer-managed key features for different types of data.

Customer-managed keys for encryption

Configuration Guidance: If required for regulatory compliance, define the use case and service scope where encryption using customer-managed keys are needed. Enable and implement data at rest encryption using customer-managed key for those services.

Reference: Data at Rest Encryption Using CMK in Databricks

DP-6: Use a secure key management process

Features

Key Management in Azure Key Vault

Description: The service supports Azure Key Vault integration for any customer keys, secrets, or certificates. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Feature notes: Note, you cannot use an Azure Databricks personal access token or an Azure AD application token that belongs to a service principal.

Avoid personal access token

Configuration Guidance: Use Azure Key Vault to create and control the life cycle of your encryption keys, including key generation, distribution, and storage. Rotate and revoke your keys in Azure Key Vault and your service based on a defined schedule or when there is a key retirement or compromise. When there is a need to use customer-managed key (CMK) in the workload, service, or application level, ensure you follow the best practices for key management: Use a key hierarchy to generate a separate data encryption key (DEK) with your key encryption key (KEK) in your key vault. Ensure keys are registered with Azure Key Vault and referenced via key IDs from the service or application. If you need to bring your own key (BYOK) to the service (such as importing HSM-protected keys from your on-premises HSMs into Azure Key Vault), follow recommended guidelines to perform initial key generation and key transfer.

Reference: Key management in Databricks

Asset management

For more information, see the Microsoft cloud security benchmark: Asset management.

AM-2: Use only approved services

Features

Azure Policy Support

Description: Service configurations can be monitored and enforced via Azure Policy. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: Use Microsoft Defender for Cloud to configure Azure Policy to audit and enforce configurations of your Azure resources. Use Azure Monitor to create alerts when there is a configuration deviation detected on the resources. Use Azure Policy [deny] and [deploy if not exists] effects to enforce secure configuration across Azure resources.

Reference: Databricks Azure Policy

Logging and threat detection

For more information, see the Microsoft cloud security benchmark: Logging and threat detection.

LT-1: Enable threat detection capabilities

Features

Microsoft Defender for Service / Product Offering

Description: Service has an offering-specific Microsoft Defender solution to monitor and alert on security issues. Learn more.

Supported Enabled By Default Configuration Responsibility
False Not Applicable Not Applicable

Configuration Guidance: This feature is not supported to secure this service.

LT-4: Enable logging for security investigation

Features

Azure Resource Logs

Description: Service produces resource logs that can provide enhanced service-specific metrics and logging. The customer can configure these resource logs and send them to their own data sink like a storage account or log analytics workspace. Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Configuration Guidance: For audit logging, Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.

Note: that Azure Databricks diagnostic logs require the Azure Databricks Premium Plan.

How to enable Diagnostic Settings for Azure Activity Log

How to enable Diagnostic Settings for Azure Databricks

Reference: Resource logs in Databricks

Posture and vulnerability management

For more information, see the Microsoft cloud security benchmark: Posture and vulnerability management.

PV-3: Define and establish secure configurations for compute resources

Features

Other guidance for PV-3

When you create an Azure Databricks cluster, it spins up base VM images. User code is run within containers that are deployed on the VMs. Implement a third-party vulnerability management solution. If you have a vulnerability management platform subscription, you may use Azure Databricks initialization scripts, running in the containers on each of the nodes, to install vulnerability assessment agents on your Azure Databricks cluster nodes, and manage the nodes through the respective portal. Note that every third-party solution works differently.

Databricks cluster node initialization scripts

Backup and recovery

For more information, see the Microsoft cloud security benchmark: Backup and recovery.

BR-1: Ensure regular automated backups

Features

Azure Backup

Description: The service can be backed up by the Azure Backup service. Learn more.

Supported Enabled By Default Configuration Responsibility
False Not Applicable Not Applicable

Configuration Guidance: This feature is not supported to secure this service.

Service Native Backup Capability

Description: Service supports its own native backup capability (if not using Azure Backup). Learn more.

Supported Enabled By Default Configuration Responsibility
True False Customer

Feature notes: For your Azure Databricks data sources, ensure you have configured an appropriate level of data redundancy for your use case. For example, if using an Azure Storage account for your Azure Databricks data store, choose the appropriate redundancy option (LRS, ZRS, GRS, RA-GRS).

Data sources for Azure Databricks

Configuration Guidance: There is no current Microsoft guidance for this feature configuration. Please review and determine if your organization wants to configure this security feature.

Reference: Regional disaster recovery for Azure Databricks clusters

Next steps