Customer-managed keys for encryption
This article provides an overview of customer-managed keys for encryption.
Note
This feature requires the Premium plan.
Customer-managed keys for encryption overview
Some services and data support adding a customer-managed key to help protect and control access to encrypted data. You can use the key management service in your cloud to maintain a customer-managed encryption key.
Azure Databricks supports customer-managed keys from Azure Key Vault vaults and Azure Key Vault Managed HSM (Hardware Security Modules).
Azure Databricks has three customer-managed key features for different types of data:
- Customer-managed keys for Azure managed disks
- Customer-managed keys for managed services
- Customer-managed keys for DBFS root
The following table lists which customer-managed key features are used for which types of data.
Type of data | Location | Customer-managed key feature |
---|---|---|
Notebook source and metadata | Control plane | Managed services |
Personal access tokens (PAT) or other credentials used for Git integration with Databricks Git folders | Control plane | Managed services |
Secrets stored by the secret manager APIs | Control plane | Managed services |
Databricks SQL queries and query history | Control plane | Managed services |
Vector Search indexes and metadata | Serverless compute plane | Managed services |
Customer-accessible DBFS root data | Your workspace’s DBFS root in your workspace storage account in your Azure subscription. This also includes the FileStore area. | DBFS root |
Job results | Workspace storage account in your Azure subscription | DBFS root |
Databricks SQL results | Workspace storage account in your Azure subscription | DBFS root |
MLflow Models | Workspace storage account in your Azure subscription | DBFS root |
Delta Live Tables | If you use a DBFS path in your DBFS root, this is stored in your workspace storage account in your Azure subscription. This does not apply to DBFS paths that represent mount points to other data sources. | DBFS root |
Interactive notebook results | By default, when you run a notebook interactively (rather than as a job), results are stored in the control plane for performance, and some large results are stored in your workspace storage account in your Azure subscription. You can configure Azure Databricks to store all interactive notebook results in your workspace storage account. See Configure the storage location for interactive notebook results. | For partial results in the control plane, use a customer-managed key for managed services. For results in the workspace storage account, which you can configure for all result storage, use a customer-managed key for DBFS root. |
Other workspace system data in the workspace storage account that is inaccessible through DBFS, such as notebook revisions. | Workspace storage account in your Azure subscription | DBFS root |
Managed disks | Temporary disk storage of VMs in compute resources such as clusters. Applies only to compute resources in the classic compute plane in your Azure subscription. See Serverless compute and customer-managed keys. | Managed disks |
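As a planning aid, the table above can be expressed as a simple lookup. The following is a minimal sketch only: the dictionary keys, constant names, and the `features_for` helper are hypothetical labels chosen for illustration and are not part of any Azure Databricks API or SDK.

```python
# Illustrative lookup derived from the table above.
# All names here are hypothetical; nothing in this block calls Azure Databricks.
MANAGED_SERVICES = "managed services"
DBFS_ROOT = "DBFS root"
MANAGED_DISKS = "managed disks"

CMK_FEATURES_BY_DATA_TYPE = {
    "notebook source and metadata": {MANAGED_SERVICES},
    "git integration credentials": {MANAGED_SERVICES},
    "secrets": {MANAGED_SERVICES},
    "databricks sql queries and query history": {MANAGED_SERVICES},
    "vector search indexes and metadata": {MANAGED_SERVICES},
    "customer-accessible dbfs root data": {DBFS_ROOT},
    "job results": {DBFS_ROOT},
    "databricks sql results": {DBFS_ROOT},
    "mlflow models": {DBFS_ROOT},
    # Applies only when the pipeline uses a DBFS path in the DBFS root.
    "delta live tables": {DBFS_ROOT},
    # Partial results may sit in the control plane, the rest in workspace storage.
    "interactive notebook results": {MANAGED_SERVICES, DBFS_ROOT},
    "other workspace system data": {DBFS_ROOT},
    # Classic compute plane only.
    "managed disks": {MANAGED_DISKS},
}


def features_for(data_types):
    """Return the set of customer-managed key features that cover the given data types."""
    needed = set()
    for data_type in data_types:
        key = data_type.strip().lower()
        if key not in CMK_FEATURES_BY_DATA_TYPE:
            raise KeyError(f"unknown data type: {data_type!r}")
        needed |= CMK_FEATURES_BY_DATA_TYPE[key]
    return needed
```

For example, `features_for(["Job results", "Secrets"])` returns both the DBFS root and managed services features, which indicates that protecting those two kinds of data requires configuring two separate customer-managed key features.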
For additional security for your workspace storage account instance in your Azure subscription, you can enable double encryption and firewall support. See Configure double encryption for DBFS root and Enable firewall support for your workspace storage account.
Serverless compute and customer-managed keys
Databricks SQL Serverless supports:
- Customer-managed keys for managed services for Databricks SQL queries and query history.
- Customer-managed keys for DBFS root storage for Databricks SQL results.
Customer-managed keys for managed disk storage do not apply to serverless compute resources. Disks for serverless compute resources are short-lived and tied to the lifecycle of the serverless workload. When compute resources are stopped or scaled down, the VMs and their storage are destroyed.
Model Serving
Resources for Model Serving, a serverless compute feature, are generally in two categories:
- Resources that you create for the model are stored in your workspace’s DBFS root in your workspace storage account, in Azure Data Lake Storage Gen2 (ADLS Gen2) or, for older workspaces, Blob storage. This includes the model’s artifacts and version metadata. Both the workspace model registry and MLflow use this storage. You can configure this storage to use customer-managed keys.
- Resources that Azure Databricks creates directly on your behalf include the model image and ephemeral serverless compute storage. These are encrypted with Databricks-managed keys and do not support customer-managed keys.