Azure security baseline for Azure Databricks
This security baseline applies guidance from the Microsoft cloud security benchmark version 1.0 to Azure Databricks. The Microsoft cloud security benchmark provides recommendations on how you can secure your cloud solutions on Azure. The content is grouped by the security controls defined by the Microsoft cloud security benchmark and the related guidance applicable to Azure Databricks.
You can monitor this security baseline and its recommendations using Microsoft Defender for Cloud. Azure Policy definitions will be listed in the Regulatory Compliance section of the Microsoft Defender for Cloud portal page.
When a feature has relevant Azure Policy Definitions, they are listed in this baseline to help you measure compliance with the Microsoft cloud security benchmark controls and recommendations. Some recommendations may require a paid Microsoft Defender plan to enable certain security scenarios.
Note
Features not applicable to Azure Databricks have been excluded. To see how Azure Databricks completely maps to the Microsoft cloud security benchmark, see the full Azure Databricks security baseline mapping file.
Security profile
The security profile summarizes high-impact behaviors of Azure Databricks, which may result in increased security considerations.
Service Behavior Attribute | Value |
---|---|
Product Category | Analytics, Storage |
Customer can access HOST / OS | No Access |
Service can be deployed into customer's virtual network | True |
Stores customer content at rest | True |
Network security
For more information, see the Microsoft cloud security benchmark: Network security.
NS-1: Establish network segmentation boundaries
Features
Virtual Network Integration
Description: Service supports deployment into customer's private Virtual Network (VNet). Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: The default deployment of Azure Databricks is a fully managed service on Azure: all data plane resources, including a VNet that all clusters will be associated with, are deployed to a locked resource group. If you require network customization, however, you can deploy Azure Databricks data plane resources in your own virtual network (VNet injection), enabling you to implement custom network configurations. You can apply your own network security group (NSG) with custom rules to specific egress traffic restrictions.
Reference: Databricks VNET Integration
Network Security Group Support
Description: Service network traffic respects Network Security Groups rule assignment on its subnets. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: Use network security groups (NSG) to restrict or monitor traffic by port, protocol, source IP address, or destination IP address. Create NSG rules to restrict your service's open ports (such as preventing management ports from being accessed from untrusted networks). Be aware that by default, NSGs deny all inbound traffic but allow traffic from virtual network and Azure Load Balancers.
Reference: Network Security Group
NS-2: Secure cloud services with network controls
Features
Azure Private Link
Description: Service native IP filtering capability for filtering network traffic (not to be confused with NSG or Azure Firewall). Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
False | Not Applicable | Not Applicable |
Configuration Guidance: This feature is not supported to secure this service.
Disable Public Network Access
Description: Service supports disabling public network access either through using service-level IP ACL filtering rule (not NSG or Azure Firewall) or using a 'Disable Public Network Access' toggle switch. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: Azure Databricks customers can use the IP access lists feature to define a set of approved IP addresses to prevent access from public IP or unapproved IP addresses.
Reference: IP Access list in Databricks
Identity management
For more information, see the Microsoft cloud security benchmark: Identity management.
IM-1: Use centralized identity and authentication system
Features
Azure AD Authentication Required for Data Plane Access
Description: Service supports using Azure AD authentication for data plane access. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | True | Microsoft |
Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.
IM-3: Manage application identities securely and automatically
Features
Managed Identities
Description: Data plane actions support authentication using managed identities. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
False | Not Applicable | Not Applicable |
Feature notes: Azure Databricks is automatically set up to use Azure Active Directory (Azure AD) single sign-on to authenticate users. Users outside of your organization must complete the invitation process and be added to your Active Directory tenant before they are able to log in to Azure Databricks via single sign-on. You can implement SCIM to automate provisioning and de-provisioning users from workspaces.
Understand single sign-on for Azure Databricks
How to use the SCIM APIs for Azure Databricks
Configuration Guidance: This feature is not supported to secure this service.
Service Principals
Description: Data plane supports authentication using service principals. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: For services that don't support managed identities, use Azure Active Directory (Azure AD) to create a service principal with restricted permissions at the resource level. Configure service principals with certificate credentials and fall back to client secrets for authentication.
Reference: Service principal in Databricks
IM-7: Restrict resource access based on conditions
Features
Conditional Access for Data Plane
Description: Data plane access can be controlled using Azure AD Conditional Access Policies. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | True | Microsoft |
Feature notes: Additionally Azure Databricks supports IP access lists to make accessing the web application and the REST API more secure.
Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.
Reference: Conditional Access in Databricks
IM-8: Restrict the exposure of credential and secrets
Features
Service Credential and Secrets Support Integration and Storage in Azure Key Vault
Description: Data plane supports native use of Azure Key Vault for credential and secrets store. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Feature notes: Azure Databricks also supports a secret scope stored in (backed by) an encrypted database owned and managed by Azure Databricks.
Configuration Guidance: Ensure that secrets and credentials are stored in secure locations such as Azure Key Vault, instead of embedding them into code or configuration files.
Reference: Key Vault Integration in Databricks
Privileged access
For more information, see the Microsoft cloud security benchmark: Privileged access.
PA-7: Follow just enough administration (least privilege) principle
Features
Azure RBAC for Data Plane
Description: Azure Role-Based Access Control (Azure RBAC) can be used to managed access to service's data plane actions. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | True | Microsoft |
Feature notes: You can use Azure Databricks SCIM APIs to manage users in an Azure Databricks workspace and grant administrative privileges to designated users.
In Azure Databricks, you can use access control lists (ACLs) to configure permission to access different workspace objects.
Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.
Reference: How to manage access control in Azure Databricks
PA-8: Determine access process for cloud provider support
Features
Customer Lockbox
Description: Customer Lockbox can be used for Microsoft support access. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: In support scenarios where Microsoft needs to access your data, use Customer Lockbox to review, then approve or reject each of Microsoft's data access requests.
Reference: Customer Lockbox
Data protection
For more information, see the Microsoft cloud security benchmark: Data protection.
DP-3: Encrypt sensitive data in transit
Features
Data in Transit Encryption
Description: Service supports data in-transit encryption for data plane. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Feature notes: By default, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, you can create an init script that configures your clusters to encrypt traffic between worker nodes.
Configuration Guidance: Enable secure transfer in services where there is a native data in transit encryption feature built in. Enforce HTTPS on any web applications and services and ensure TLS v1.2 or later is used. Legacy versions such as SSL 3.0, TLS v1.0 should be disabled. For remote management of Virtual Machines, use SSH (for Linux) or RDP/TLS (for Windows) instead of an unencrypted protocol.
Reference: Data in transit Encryption for Databricks
DP-4: Enable data at rest encryption by default
Features
Data at Rest Encryption Using Platform Keys
Description: Data at-rest encryption using platform keys is supported, any customer content at rest is encrypted with these Microsoft managed keys. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | True | Microsoft |
Configuration Guidance: No additional configurations are required as this is enabled on a default deployment.
Reference: Data at rest encryption using platform managed keys in Databricks
DP-5: Use customer-managed key option in data at rest encryption when required
Features
Data at Rest Encryption Using CMK
Description: Data at-rest encryption using customer-managed keys is supported for customer content stored by the service. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Feature notes: Azure Databricks has two customer-managed key features for different types of data.
Customer-managed keys for encryption
Configuration Guidance: If required for regulatory compliance, define the use case and service scope where encryption using customer-managed keys are needed. Enable and implement data at rest encryption using customer-managed key for those services.
Reference: Data at Rest Encryption Using CMK in Databricks
DP-6: Use a secure key management process
Features
Key Management in Azure Key Vault
Description: The service supports Azure Key Vault integration for any customer keys, secrets, or certificates. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Feature notes: Note, you cannot use an Azure Databricks personal access token or an Azure AD application token that belongs to a service principal.
Configuration Guidance: Use Azure Key Vault to create and control the life cycle of your encryption keys, including key generation, distribution, and storage. Rotate and revoke your keys in Azure Key Vault and your service based on a defined schedule or when there is a key retirement or compromise. When there is a need to use customer-managed key (CMK) in the workload, service, or application level, ensure you follow the best practices for key management: Use a key hierarchy to generate a separate data encryption key (DEK) with your key encryption key (KEK) in your key vault. Ensure keys are registered with Azure Key Vault and referenced via key IDs from the service or application. If you need to bring your own key (BYOK) to the service (such as importing HSM-protected keys from your on-premises HSMs into Azure Key Vault), follow recommended guidelines to perform initial key generation and key transfer.
Reference: Key management in Databricks
Asset management
For more information, see the Microsoft cloud security benchmark: Asset management.
AM-2: Use only approved services
Features
Azure Policy Support
Description: Service configurations can be monitored and enforced via Azure Policy. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: Use Microsoft Defender for Cloud to configure Azure Policy to audit and enforce configurations of your Azure resources. Use Azure Monitor to create alerts when there is a configuration deviation detected on the resources. Use Azure Policy [deny] and [deploy if not exists] effects to enforce secure configuration across Azure resources.
Reference: Databricks Azure Policy
Logging and threat detection
For more information, see the Microsoft cloud security benchmark: Logging and threat detection.
LT-1: Enable threat detection capabilities
Features
Microsoft Defender for Service / Product Offering
Description: Service has an offering-specific Microsoft Defender solution to monitor and alert on security issues. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
False | Not Applicable | Not Applicable |
Configuration Guidance: This feature is not supported to secure this service.
LT-4: Enable logging for security investigation
Features
Azure Resource Logs
Description: Service produces resource logs that can provide enhanced service-specific metrics and logging. The customer can configure these resource logs and send them to their own data sink like a storage account or log analytics workspace. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Configuration Guidance: For audit logging, Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.
Note: that Azure Databricks diagnostic logs require the Azure Databricks Premium Plan.
How to enable Diagnostic Settings for Azure Activity Log
How to enable Diagnostic Settings for Azure Databricks
Reference: Resource logs in Databricks
Posture and vulnerability management
For more information, see the Microsoft cloud security benchmark: Posture and vulnerability management.
PV-3: Define and establish secure configurations for compute resources
Features
Other guidance for PV-3
When you create an Azure Databricks cluster, it spins up base VM images. User code is run within containers that are deployed on the VMs. Implement a third-party vulnerability management solution. If you have a vulnerability management platform subscription, you may use Azure Databricks initialization scripts, running in the containers on each of the nodes, to install vulnerability assessment agents on your Azure Databricks cluster nodes, and manage the nodes through the respective portal. Note that every third-party solution works differently.
Databricks cluster node initialization scripts
Backup and recovery
For more information, see the Microsoft cloud security benchmark: Backup and recovery.
BR-1: Ensure regular automated backups
Features
Azure Backup
Description: The service can be backed up by the Azure Backup service. Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
False | Not Applicable | Not Applicable |
Configuration Guidance: This feature is not supported to secure this service.
Service Native Backup Capability
Description: Service supports its own native backup capability (if not using Azure Backup). Learn more.
Supported | Enabled By Default | Configuration Responsibility |
---|---|---|
True | False | Customer |
Feature notes: For your Azure Databricks data sources, ensure you have configured an appropriate level of data redundancy for your use case. For example, if using an Azure Storage account for your Azure Databricks data store, choose the appropriate redundancy option (LRS, ZRS, GRS, RA-GRS).
Data sources for Azure Databricks
Configuration Guidance: There is no current Microsoft guidance for this feature configuration. Please review and determine if your organization wants to configure this security feature.
Reference: Regional disaster recovery for Azure Databricks clusters
Next steps
- See the Microsoft cloud security benchmark overview
- Learn more about Azure security baselines