Data security and encryption

This article introduces data security configurations to help protect your data.

For information about securing access to your data, see Data governance with Unity Catalog.

Overview of data security and encryption

Azure Databricks provides encryption features to help protect your data. Not all security features are available on all pricing tiers. The following table lists each feature and the pricing tier it requires.

Feature                                              Pricing tier
Customer-managed keys for encryption                 Premium
Encrypt traffic between cluster worker nodes         Premium
Double encryption for DBFS root                      Premium
Encrypt queries, query history, and query results    Premium

Enable customer-managed keys for encryption

Azure Databricks supports adding a customer-managed key to help protect and control access to data. Azure Databricks supports customer-managed keys from Azure Key Vault vaults and Azure Key Vault Managed Hardware Security Modules (HSMs). There are three customer-managed key features for different types of data:

  • Customer-managed keys for managed disks: Azure Databricks compute workloads in the compute plane store temporary data on Azure managed disks. By default, data stored on managed disks is encrypted at rest using server-side encryption with Microsoft-managed keys. You can configure your own key for your Azure Databricks workspace to use for managed disk encryption. See Customer-managed keys for Azure managed disks.

  • Customer-managed keys for managed services: Managed services data in the Azure Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:

    • Notebook source files that are stored in the control plane.
    • Notebook results for notebooks that are stored in the control plane.
    • Secrets stored by the secret manager APIs.
    • Databricks SQL queries and query history.
    • Personal access tokens or other credentials used to set up Git integration with Databricks Git folders.

    See Customer-managed keys for managed services.

  • Customer-managed keys for DBFS root: By default, the workspace’s root storage account is encrypted with Microsoft-managed keys. You can configure your own key to encrypt all the data in that storage account. For more information, see Customer-managed keys for DBFS root.

For more details about which customer-managed key features in Azure Databricks protect which kinds of data, see Customer-managed keys for encryption.
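
As a reading aid, the list of customer-managed key features in this section can be restated as a small lookup. This is only an illustrative Python sketch: `CMK_FEATURES` and `cmk_feature_for` are hypothetical names, not part of any Azure Databricks API, and the entries only paraphrase the bullets in this section.

```python
# Illustrative only: restates the three customer-managed key (CMK) features
# from this section as a lookup table. All names here are hypothetical.
CMK_FEATURES = {
    "managed disks": [
        "temporary data on Azure managed disks in the compute plane",
    ],
    "managed services": [
        "notebook source files in the control plane",
        "notebook results in the control plane",
        "secrets stored by the secret manager APIs",
        "Databricks SQL queries and query history",
        "Git integration credentials for Databricks Git folders",
    ],
    "DBFS root": [
        "data in the workspace's root storage account",
    ],
}


def cmk_feature_for(keyword: str) -> list:
    """Return the CMK features whose protected data mentions the keyword."""
    needle = keyword.lower()
    return [
        feature
        for feature, data_kinds in CMK_FEATURES.items()
        if any(needle in kind.lower() for kind in data_kinds)
    ]


if __name__ == "__main__":
    # Look up which feature covers secrets and which covers root storage.
    print(cmk_feature_for("secrets"))
    print(cmk_feature_for("root storage"))
```

A lookup like this can help when deciding which of the three keys to configure for a given compliance requirement; the linked pages remain the authoritative description of what each key protects.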

Enable double encryption for DBFS

Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. DBFS is implemented as a storage account in your Azure Databricks workspace’s managed resource group. The default storage location in DBFS is known as the DBFS root.

Azure Storage automatically encrypts all data in a storage account, including DBFS root storage. You can optionally enable encryption at the Azure Storage infrastructure level. When infrastructure encryption is enabled, data in a storage account is encrypted twice, once at the service level and once at the infrastructure level, with two different encryption algorithms and two different keys. To learn more about deploying a workspace with infrastructure encryption, see Configure double encryption for DBFS root.
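
Because infrastructure encryption is chosen when the workspace is deployed, one way to picture the setting is as a fragment of a deployment template. The Python sketch below builds such a fragment as a dictionary. It is an assumption-laden illustration, not the documented procedure: the `requireInfrastructureEncryption` parameter name and the `apiVersion` value are assumed from the Azure Resource Manager schema for `Microsoft.Databricks/workspaces` and should be checked against Configure double encryption for DBFS root, and `workspace_resource` and the placeholder managed resource group ID are hypothetical.

```python
import json


def workspace_resource(name: str, location: str, infrastructure_encryption: bool) -> dict:
    """Build a workspace resource fragment with the double-encryption flag.

    Assumption: property names follow the ARM schema for
    Microsoft.Databricks/workspaces; verify them before use.
    """
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",  # assumed API version; confirm before deploying
        "name": name,
        "location": location,
        # Double encryption for DBFS root is listed as a Premium feature.
        "sku": {"name": "premium"},
        "properties": {
            # Placeholder, to be replaced with a real resource group ID.
            "managedResourceGroupId": "<managed-resource-group-id>",
            "parameters": {
                "requireInfrastructureEncryption": {
                    "value": infrastructure_encryption
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be compared with a real template.
    print(json.dumps(workspace_resource("example-ws", "westus2", True), indent=2))
```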

Encrypt queries, query history, and query results

You can use your own key from Azure Key Vault to encrypt the Databricks SQL queries and query history stored in the Azure Databricks control plane. For more details, see Encrypt queries, query history, and query results.

Encrypt traffic between cluster worker nodes

User queries and transformations are typically sent to your clusters over an encrypted channel. By default, however, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection. For more information, see Encrypt traffic between cluster worker nodes.
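
To give a rough idea of what such an init script has to configure, the Python sketch below assembles a set of standard Apache Spark security properties and renders them as configuration lines. It is not the init script from the linked page: the function names, the keystore path, and the password are hypothetical placeholders, the sketch does not set the cipher strength, and how the lines are written into the cluster's Spark configuration is left out. Treat the linked page as the authoritative source for the actual script.

```python
def worker_encryption_properties(keystore_path: str, keystore_password: str) -> dict:
    """Return Spark properties that turn on authenticated, encrypted traffic.

    The property names are standard Apache Spark settings; the values passed
    in are placeholders chosen by the caller.
    """
    return {
        # Require a shared secret between Spark processes.
        "spark.authenticate": "true",
        # Enable AES-based encryption of RPC traffic between nodes.
        "spark.network.crypto.enabled": "true",
        # Enable TLS and pin the protocol version named in this section.
        "spark.ssl.enabled": "true",
        "spark.ssl.protocol": "TLSv1.2",
        "spark.ssl.keyStore": keystore_path,
        "spark.ssl.keyStorePassword": keystore_password,
        "spark.ssl.keyPassword": keystore_password,
    }


def render_spark_conf(props: dict) -> str:
    """Render properties as 'key value' lines, one per property."""
    return "\n".join(f"{key} {value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical keystore location and password, for illustration only.
    props = worker_encryption_properties("/path/to/keystore.jks", "<keystore-password>")
    print(render_spark_conf(props))
```

In a real deployment the keystore password would come from a secret store rather than a literal string, and enabling these settings can reduce query performance because every exchange between workers is encrypted.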

Manage workspace settings

Azure Databricks workspace administrators can manage their workspace’s security settings, such as whether users can download notebooks and whether the user isolation cluster access mode is enforced. For more information, see Manage your workspace.