HDFS encryption zones usage guide in SQL Server Big Data Clusters
Applies to: SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
This article shows how to use the encryption at rest capabilities of SQL Server Big Data Clusters to encrypt HDFS folders using Encryption Zones. It also describes HDFS key management tasks.
A default encryption zone, at /securelake, is ready to be used. It was created with a system-generated 256-bit key named securelakekey. This key can be used to create other encryption zones.
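Once the prerequisites below are in place, a quick way to try the default zone is to copy a file into /securelake and inspect its encryption information. This is a minimal sketch only: the local file name sample.csv is hypothetical, and the cp parameters shown (--from-path, --to-path) are assumptions that may vary by azdata version.
# Hypothetical file name; cp parameter names are assumptions, verify with azdata bdc hdfs cp --help
azdata bdc hdfs cp --from-path ./sample.csv --to-path /securelake/sample.csv
azdata bdc hdfs encryption-zone get-file-encryption-info --path /securelake/sample.csv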
Prerequisites
- SQL Server Big Data Clusters CU8 or later with Active Directory integration.
- A SQL Server Big Data Clusters user with Kubernetes administrative privileges (a member of the clusterAdmins role). For more information, see Manage big data cluster access in Active Directory mode.
- Azure Data CLI (azdata) configured and logged in to the cluster in AD mode (see the login example after this list).
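A minimal sketch of the azdata login step for AD mode, assuming the --endpoint and --auth parameters of azdata login; the endpoint shown is a hypothetical controller DNS name and port, so substitute the values for your cluster.
# Hypothetical endpoint; --auth ad is assumed to select Active Directory authentication
azdata login --endpoint https://bdc-control.contoso.com:30080 --auth ad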
Create an encryption zone using the provided system managed key
Create your HDFS folder by using this azdata command:
azdata bdc hdfs mkdir --path /user/zone/folder
Issue the encryption zone create command to encrypt the folder using the securelakekey key:
azdata bdc hdfs encryption-zone create --path /user/zone/folder --keyname securelakekey
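To confirm that the folder is now an encryption zone, you can list all zones (the same command appears in the monitoring section below); the output should include /user/zone/folder together with the key that protects it.
azdata bdc hdfs encryption-zone list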
Manage encryption zones when using external providers
For more information about how key versions are used for SQL Server Big Data Clusters encryption at rest, and for an end-to-end example of how to manage encryption zones when using external key providers, see Main key rotation for HDFS.
Create a new custom key and encryption zone
Use the following pattern to create a 256-bit key.
azdata bdc hdfs key create --name mydatalakekey
Create and encrypt a new HDFS path using the user key.
azdata bdc hdfs encryption-zone create --path /user/mydatalake --keyname mydatalakekey
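Optionally, you may want to hand the new zone over to the Active Directory user or group that will write data into it. The following is a sketch only: the user and group names are hypothetical, and the chown and chmod subcommands with their --owner, --group, and --permission parameters are assumptions to verify against your azdata version.
# Hypothetical AD user and group; parameter names are assumptions, verify with --help
azdata bdc hdfs chown --path /user/mydatalake --owner datalakeuser --group datalakegroup
azdata bdc hdfs chmod --path /user/mydatalake --permission 750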
HDFS key rotation and encryption zone re-encryption
This approach creates a new version of the securelakekey key with new key material:
azdata bdc hdfs key roll --name securelakekey
Re-encrypt the encryption zone associated with the key above.
azdata bdc hdfs encryption-zone reencrypt --path /securelake --action start
HDFS key and encryption zone monitoring
To monitor the status of an encryption zone re-encryption, use this command:
azdata bdc hdfs encryption-zone status
To get the encryption information about a file in an encryption zone, use this command:
azdata bdc hdfs encryption-zone get-file-encryption-info --path /securelake/data.csv
To list all encryption zones, use this command:
azdata bdc hdfs encryption-zone list
To list all the available keys for HDFS, use this command:
azdata bdc hdfs key list
To create a custom key for HDFS encryption, use this command:
azdata bdc hdfs key create --name key1 --size 256
Possible sizes are 128, 192, and 256. The default is 256.
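For example, to create a 192-bit key and confirm that it is available, you can combine the create and list commands shown above (the key name key2 is hypothetical).
# Hypothetical key name
azdata bdc hdfs key create --name key2 --size 192
azdata bdc hdfs key list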
Next steps
To learn how to use azdata with Big Data Clusters, see Introducing SQL Server 2019 Big Data Clusters.
To use an external key provider for encryption at rest, see External Key Providers in SQL Server Big Data Clusters.