Hello Sourav, from what you explained, you want to secure the transfer of files from ADLS to an AWS S3 bucket for your SAS application, both at rest and in transit.
You need to set up encryption at rest on both sides: ADLS (using Azure Storage Service Encryption) and AWS S3 (using either Amazon S3-managed keys (SSE-S3) or AWS KMS-managed keys (SSE-KMS)).
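On the S3 side, a minimal sketch of setting a bucket-wide encryption default with boto3 could look like this (the bucket name and KMS key alias are placeholders):

```python
import boto3

# Set SSE-KMS as the bucket default so every new object is encrypted at
# rest even if the uploader sends no encryption headers
s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="your-s3-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",           # use "AES256" for SSE-S3
                "KMSMasterKeyID": "alias/your-kms-key",
            }
        }]
    },
)
```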
Since ADF does not support AWS S3 as a sink directly, we can use the following method:
- Create a linked service in ADF for your ADLS account
- Use Databricks to temporarily stage the data before transferring it to S3 (see the mount sketch after this list)
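If the ADLS account is not already mounted in the Databricks workspace, a minimal mount sketch looks like this (the service principal, secret scope, container, and storage account names are all placeholders):

```python
# Mount ADLS Gen2 in Databricks via a service principal; credentials come
# from a Databricks secret scope (scope/key names are placeholders)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="azure", key="sp_client_id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="azure", key="sp_client_secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/adls",
    extra_configs=configs,
)
```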
If you are opting for Databricks for batch processing and encryption:
- Create a GPG key pair for encryption and decryption; store the public key securely on the Databricks side (for example, in a secret scope) and the private key securely in AWS
- Use a Databricks notebook to read data from ADLS, encrypt it with the GPG public key, and save the encrypted output to a staging area in Databricks
```python
from cryptography.fernet import Fernet

# Generate a symmetric key for encryption (Fernet is shown here as a
# simplified stand-in for GPG; see the python-gnupg sketch below)
key = Fernet.generate_key()
cipher_suite = Fernet(key)

# Example function to encrypt raw bytes
def encrypt_data(data: bytes) -> bytes:
    return cipher_suite.encrypt(data)

# Read the full file from ADLS through the DBFS mount
# (dbutils.fs.head only returns the first ~64 KB, so open the /dbfs path instead)
with open("/dbfs/mnt/adls/path/to/data.csv", "rb") as f:
    data = f.read()

# Encrypt the data
encrypted_data = encrypt_data(data)

# Save the encrypted data to the staging area
with open("/dbfs/mnt/databricks/staging/encrypted_data.csv", "wb") as f:
    f.write(encrypted_data)
```
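Note that the snippet above uses Fernet, a symmetric cipher, as a simplified stand-in. For the GPG public-key flow described earlier, a minimal sketch with python-gnupg could look like this (assumptions: `pip install python-gnupg` on the cluster, the recipient's public key already imported into the keyring, and the recipient identity and paths are placeholders):

```python
import gnupg

# Sketch only: assumes GnuPG is installed on the cluster and the
# recipient's public key has been imported into the keyring
gpg = gnupg.GPG()

with open("/dbfs/mnt/adls/path/to/data.csv", "rb") as f:
    result = gpg.encrypt_file(
        f,
        recipients=["recipient@example.com"],  # hypothetical key identity
        output="/dbfs/mnt/databricks/staging/encrypted_data.csv.gpg",
    )

# encrypt_file returns a status object; fail loudly if encryption failed
if not result.ok:
    raise RuntimeError(f"GPG encryption failed: {result.status}")
```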
Then you need to make AWS credentials available to the Databricks cluster and use Databricks to upload the encrypted files to S3:
```python
import os

# Make AWS credentials available to the S3A connector on the driver; pull
# them from a Databricks secret scope (the "aws" scope and key names are
# placeholders) instead of hard-coding them in the notebook
os.environ['AWS_ACCESS_KEY_ID'] = dbutils.secrets.get(scope="aws", key="access_key_id")
os.environ['AWS_SECRET_ACCESS_KEY'] = dbutils.secrets.get(scope="aws", key="secret_access_key")

# Copy the encrypted file from the DBFS staging area to S3
dbutils.fs.cp("dbfs:/mnt/databricks/staging/encrypted_data.csv",
              "s3a://your-s3-bucket/encrypted_data.csv")
```
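Once the copy completes, you can optionally confirm that the object picked up the bucket's default encryption at rest (boto3 is an assumption here; it can be pip-installed on the cluster):

```python
import boto3

# Inspect the uploaded object's metadata; 'ServerSideEncryption' reports
# the at-rest encryption applied (e.g. "aws:kms" or "AES256")
s3 = boto3.client("s3")
meta = s3.head_object(Bucket="your-s3-bucket", Key="encrypted_data.csv")
print(meta.get("ServerSideEncryption"))
```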
You can use ADF to create a pipeline that triggers the Databricks notebook on your desired schedule. Once the data lands in S3, use the GPG private key to decrypt it:
```bash
# Download the encrypted file from S3 first (gpg cannot read s3:// URLs),
# then decrypt it with the GPG private key
aws s3 cp s3://your-s3-bucket/encrypted_data.csv encrypted_data.csv
gpg --decrypt --output decrypted_data.csv encrypted_data.csv
```
More links:
- https://www.geeksforgeeks.org/how-do-i-add-a-s3-bucket-to-databricks/
- https://community.boomi.com/s/article/Inserting-Data-into-Databricks-with-AWS-S3
- https://learn.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest
- https://learn.microsoft.com/en-us/azure/storage/common/storage-service-encryption
- https://learn.microsoft.com/en-us/azure/security/fundamentals/encryption-overview