Azure Databricks and Key Vault key versioning

Hemanth Kumar 0 Reputation points
2025-01-28T07:07:43.85+00:00

We are building a data encryption solution that uses the PySpark `aes_encrypt` and `aes_decrypt` functions to encrypt the PII columns of Databricks tables in Unity Catalog. We have created views on top of these tables that decrypt the data with the same key, and this decrypted data is accessible only to a limited set of users based on ACLs. The encryption key is stored in Azure Key Vault, and we access it through an Azure Key Vault-backed Databricks secret scope.

The client has a policy of rotating the secret after one year. If we rotate the secret and update its value, we will no longer be able to view historical data that was encrypted with the older secret. We use external tables because we have to join different tables on the PII columns, so managed tables, row-level masking, and column-level masking are not options for us. What is the recommended and best approach to secret rotation and management?

Azure Key Vault
An Azure service that is used to manage and protect cryptographic keys and other secrets used by cloud apps and services.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. Chiugo Okpala 75 Reputation points MVP
    2025-02-02T21:49:02.7166667+00:00

    @Hemanth Kumar Key rotation is indeed a critical part of keeping data secure, but it can be challenging when historical data is involved. Here are some recommended approaches and best practices for secret rotation and management:

    1. Key Versioning

    Azure Key Vault supports key versioning: rotating a key or updating a secret creates a new version while the old versions remain retrievable by their version IDs. If you record which version was used to encrypt each value, you can still decrypt historical data with the corresponding old version after rotation.

    2. Dual-Key Encryption

    Use a dual-key (envelope) encryption approach in which each piece of data is protected by two keys: a data encryption key (DEK) and a key encryption key (KEK). The DEK encrypts the data, and the KEK encrypts the DEK. When you rotate the KEK, you only re-encrypt the DEK with the new KEK; the data itself, encrypted with the unchanged DEK, never has to be rewritten, so historical data remains decryptable.

    3. Automated Key Rotation

    Leverage Azure Key Vault's automated key rotation feature, which lets you define a rotation policy so that a key is rotated automatically on a schedule or near expiry, with notifications as it approaches expiration. (Note that automated rotation applies to Key Vault keys; for secrets, you can set an expiry date and subscribe to near-expiry events via Event Grid.)

    4. Centralized Key Management

    Use a centralized key management system (KMS) to manage your keys. This system should be secure and allow easy management of keys across the organization. It should also support key versioning and dual-key encryption.

    5. Regular Audits and Monitoring

    Conduct regular audits and monitor key usage to ensure that keys are being used correctly and securely. This can help you identify any potential issues and address them promptly.

    6. Disaster Recovery Strategy

    Create a disaster recovery strategy that includes backing up your keys and ensuring that they can be restored in case of a disaster. This can help you recover your data if something goes wrong.

    7. Documentation and Policies

    Document your key management policies and enforce them consistently. This includes documenting key usage, rotation schedules, and any other relevant information.

    By implementing these practices, you can ensure that your data encryption solution remains secure while still allowing access to historical data. Does this help address your concerns?
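    The dual-key idea above can be sketched in a few lines. This is a minimal illustration using Fernet (the recipe discussed in the second answer) rather than a production implementation; in practice the KEK would be the secret held in Key Vault and the wrapped DEK would be stored alongside your data:

```python
from cryptography.fernet import Fernet

# Data-encryption key (DEK): encrypts the actual PII values once.
dek = Fernet.generate_key()
ciphertext = Fernet(dek).encrypt(b"123-45-6789")

# Key-encryption key (KEK): the secret held in Key Vault; it wraps the DEK.
kek_v1 = Fernet.generate_key()
wrapped_dek = Fernet(kek_v1).encrypt(dek)

# Rotation: unwrap the DEK with the old KEK, re-wrap it with the new one.
# The DEK, and therefore every stored ciphertext, is left untouched.
kek_v2 = Fernet.generate_key()
wrapped_dek = Fernet(kek_v2).encrypt(Fernet(kek_v1).decrypt(wrapped_dek))

# Historical data still decrypts after the rotation.
recovered = Fernet(Fernet(kek_v2).decrypt(wrapped_dek)).decrypt(ciphertext)
```

    Because only the small wrapped DEK is re-encrypted, annual rotation becomes a metadata update rather than a rewrite of every encrypted table.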

    See:

    https://learn.microsoft.com/en-us/azure/key-vault/general/versions

    https://www.encryptionconsulting.com/10-enterprise-encryption-key-management-best-practices/

    https://learn.microsoft.com/en-us/azure/key-vault/keys/how-to-configure-key-rotation

    https://www.liquidweb.com/blog/encryption-key-management-best-practices/


  2. Chiugo Okpala 75 Reputation points MVP
    2025-02-06T07:07:57.66+00:00

    @Hemanth Kumar

    OK, let's look at using Fernet in Databricks to encrypt and decrypt PII data.

    Understanding Fernet:

    Fernet is a symmetric encryption recipe provided by the cryptography package from the Python Cryptographic Authority (PyCA). It gives you a simple, authenticated way to encrypt and decrypt data with a secret key. Here's how you can use it in Databricks:
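    As a minimal round-trip illustration (assuming the cryptography package is installed):

```python
from cryptography.fernet import Fernet

# Generate a random key (32 url-safe base64-encoded bytes).
key = Fernet.generate_key()
f = Fernet(key)

# encrypt() returns an authenticated, ASCII-safe "token";
# decrypt() verifies it and returns the original bytes.
token = f.encrypt(b"123-45-6789")
plain = f.decrypt(token)
```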

    Step-by-Step Guide:

    1. Install the cryptography package:
      • Fernet ships with the cryptography package; in a Databricks notebook, install it if it isn't already available:
             %pip install cryptography
        
    2. Generate a Fernet Key:
      • Generate a Fernet key once and store it securely (treat the output as a secret):
             from cryptography.fernet import Fernet
             # Generate a random key; store it in Key Vault / Databricks Secrets.
             key = Fernet.generate_key()
             print(key.decode())
        
      • Store this key in Databricks Secrets for secure access.
    3. Create Encryption and Decryption Functions:
      • Define UDFs (User-Defined Functions) for encryption and decryption:
             from cryptography.fernet import Fernet
             from pyspark.sql.functions import udf, lit
             from pyspark.sql.types import StringType

             def encrypt_val(clear_text, MASTER_KEY):
                 # Encrypt a single value; Fernet tokens are ASCII-safe strings.
                 f = Fernet(MASTER_KEY)
                 return f.encrypt(clear_text.encode('utf-8')).decode('ascii')

             def decrypt_val(cipher_text, MASTER_KEY):
                 # Reverse of encrypt_val: verify the token and recover the clear text.
                 f = Fernet(MASTER_KEY)
                 return f.decrypt(cipher_text.encode()).decode()

             encrypt = udf(encrypt_val, StringType())
             decrypt = udf(decrypt_val, StringType())

    4. Fetch the Key from Databricks Secrets:
      • Retrieve the key from Databricks Secrets:
             encryptionKey = dbutils.secrets.get(scope="encrypt", key="fernetkey")
        
    5. Encrypt Data:
      • Use the encryption UDF to encrypt the PII columns in your DataFrame:
             df = spark.table("Test_Encryption")
             encrypted = df.withColumn("ssn", encrypt("ssn", lit(encryptionKey)))
             encrypted.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("Test_Encryption_Table")
        
    6. Decrypt Data:
      • Use the decryption UDF to decrypt the data when needed:
             decrypted = encrypted.withColumn("ssn", decrypt("ssn", lit(encryptionKey)))
             decrypted.show()
        
      Best Practices:
      • Key Rotation: Regularly rotate your encryption keys and update the key in Databricks Secrets, so a compromised key exposes only a limited amount of data. With Fernet specifically, MultiFernet can decrypt data written under previous keys while encrypting new data under the current one, so rotation does not cut off access to historical data.
      • Access Control: Use Databricks Secrets to manage access to the encryption keys, ensuring only authorized users can retrieve and use them.
      • Audit Logs: Enable audit logs to monitor access and usage of the encryption keys and encrypted data.
      By following these steps and best practices, you can effectively secure your PII data using the Databricks Fernet approach. Does this help clarify things for you?
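    The rotation concern raised in the question can be handled with MultiFernet from the same cryptography package. A minimal sketch (in practice the old and new keys would come from two versions of the Key Vault secret rather than being generated inline):

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()  # key used before rotation
new_key = Fernet.generate_key()  # key fetched after rotation

# A value encrypted under the old key, before rotation.
old_token = Fernet(old_key).encrypt(b"123-45-6789")

# MultiFernet encrypts with the first key but decrypts with any listed key,
# so historical data stays readable after the secret is rotated.
mf = MultiFernet([Fernet(new_key), Fernet(old_key)])
plain = mf.decrypt(old_token)

# rotate() re-encrypts an old token under the current (first) key,
# letting you migrate stored ciphertext gradually.
new_token = mf.rotate(old_token)
```

    Used inside the decrypt UDF above, this lets views keep serving back-dated data while new writes pick up the rotated key.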

    See:

    https://www.comparitech.com/blog/information-security/what-is-fernet/

    https://www.databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

    https://www.databricks.com/notebooks/enforcing-column-level-encryption.html

