HDFS encryption zones usage guide in SQL Server Big Data Clusters

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

This article shows how to use the encryption at rest capabilities of SQL Server Big Data Clusters to encrypt HDFS folders using Encryption Zones. It also describes HDFS key management tasks.

A default encryption zone, at /securelake, is ready to use. It was created with a system-generated 256-bit key named securelakekey. This key can also be used to create other encryption zones.

Prerequisites

Create an encryption zone using the provided system-managed key

  1. Create your HDFS folder by using this azdata command:

    azdata bdc hdfs mkdir --path /user/zone/folder
    
  2. Issue the encryption zone create command to encrypt the folder using the securelakekey key.

    azdata bdc hdfs encryption-zone create --path /user/zone/folder --keyname securelakekey
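
The two steps above can be combined into one small script. The following is a minimal sketch, not an official tool: the `run` and `create_zone` helper names and the `DRY_RUN` variable are hypothetical and introduced only for illustration. With `DRY_RUN=1` (the default in this sketch), the script only prints the azdata commands it would run, so you can review them before executing anything against a cluster.

```shell
#!/bin/sh
# Sketch: create an HDFS folder and make it an encryption zone.
# Assumption: DRY_RUN, run, and create_zone are illustrative names,
# not part of azdata. Only the azdata commands come from this article.
set -eu

# Default to printing commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_zone PATH [KEYNAME]
# KEYNAME defaults to the system-managed key, securelakekey.
create_zone() {
  zone_path="$1"
  zone_key="${2:-securelakekey}"
  run azdata bdc hdfs mkdir --path "$zone_path"
  run azdata bdc hdfs encryption-zone create --path "$zone_path" --keyname "$zone_key"
}

create_zone /user/zone/folder securelakekey
```

To execute the commands instead of printing them, set `DRY_RUN=0` in the environment of a session that is already signed in with azdata.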
    

Manage encryption zones when using external providers

For more information on how key versions are used in SQL Server Big Data Clusters encryption at rest, see Main key rotation for HDFS. That article includes an end-to-end example of how to manage encryption zones when using external key providers.

Create a new custom key and encryption zone

  1. Create a 256-bit key with the following command, replacing mydatalakekey with your own key name.

    azdata bdc hdfs key create --name mydatalakekey
    
  2. Create and encrypt a new HDFS path using the user key.

    azdata bdc hdfs encryption-zone create --path /user/mydatalake --keyname mydatalakekey
    

HDFS key rotation and encryption zone re-encryption

  1. Roll the key. This creates a new version of securelakekey with new key material.

    azdata bdc hdfs key roll --name securelakekey
    
  2. Re-encrypt the encryption zone associated with the key above.

    azdata bdc hdfs encryption-zone reencrypt --path /securelake --action start
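
The rotation steps above can be chained, followed by a status check. The following is a minimal sketch under stated assumptions: the `run` and `rotate_and_reencrypt` helpers and the `DRY_RUN` variable are hypothetical names used for illustration, and the commands follow the `azdata bdc hdfs` command group used throughout this article. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch: roll a key, start re-encryption of its zone, then check status.
# Assumption: DRY_RUN, run, and rotate_and_reencrypt are illustrative
# names, not part of azdata.
set -eu

# Default to printing commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rotate_and_reencrypt KEYNAME ZONE_PATH
rotate_and_reencrypt() {
  key_name="$1"
  zone_path="$2"
  # New key version with new key material.
  run azdata bdc hdfs key roll --name "$key_name"
  # Start re-encrypting the zone that uses the key.
  run azdata bdc hdfs encryption-zone reencrypt --path "$zone_path" --action start
  # Report re-encryption status.
  run azdata bdc hdfs encryption-zone status
}

rotate_and_reencrypt securelakekey /securelake
```

Keeping the key roll and the re-encryption start in one function makes it harder to roll a key and forget to re-encrypt the zone that depends on it.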
    

HDFS key and encryption zone monitoring

  • To monitor the status of an encryption zone re-encryption, use this command:

    azdata bdc hdfs encryption-zone status
    
  • To get the encryption information about a file in an encryption zone, use this command:

    azdata bdc hdfs encryption-zone get-file-encryption-info --path /securelake/data.csv
    
  • To list all encryption zones, use this command:

    azdata bdc hdfs encryption-zone list
    
  • To list all the available keys for HDFS, use this command:

    azdata bdc hdfs key list
    
  • To create a custom key for HDFS encryption, use this command:

    azdata bdc hdfs key create --name key1 --size 256
    

    Possible sizes are 128, 192, and 256. The default is 256.
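
Because only three key sizes are accepted, a wrapper script can reject other values before calling azdata. The following is a minimal sketch: the `is_valid_key_size` and `key_create_command` helpers are hypothetical names introduced for illustration. The sketch only prints the resulting command and does not contact a cluster.

```shell
#!/bin/sh
# Sketch: validate a key size, then build the key create command.
# Assumption: is_valid_key_size and key_create_command are illustrative
# helper names, not part of azdata.
set -eu

# Succeeds only for the sizes listed in this article.
is_valid_key_size() {
  case "$1" in
    128|192|256) return 0 ;;
    *) return 1 ;;
  esac
}

# key_create_command NAME [SIZE]
# SIZE defaults to 256. Prints the command; does not run it.
key_create_command() {
  key_name="$1"
  key_size="${2:-256}"
  if ! is_valid_key_size "$key_size"; then
    echo "unsupported key size: $key_size (expected 128, 192, or 256)" >&2
    return 1
  fi
  printf 'azdata bdc hdfs key create --name %s --size %s\n' "$key_name" "$key_size"
}

key_create_command key1 256
```

Printing the command rather than running it lets you review the key name and size before creating the key.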

Next steps

For more information about using azdata with Big Data Clusters, see Introducing SQL Server 2019 Big Data Clusters.

To use an external key provider for encryption at rest, see External Key Providers in SQL Server Big Data Clusters.