Hello @Naga
A simple approach to masking data while reading a set of CSV files from a storage account is to:
- Create a masking function (Python / Scala)
- Register the function as a Spark UDF
- Use spark.read (or spark.readStream) with a selectExpr that calls the UDF to load the data into a DataFrame
- Save the data to a table
The sample code below reads all CSV files under a storage account path into a Spark database table.
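import hashlib
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

class Mask:
    def __init__(self, salt: str):
        self.salt = salt

    def sha512(self, value):
        # Leave nulls as null instead of hashing the string 'None'
        if value is None:
            return None
        return hashlib.sha512(f'{value}{self.salt}'.encode()).hexdigest()

    def shake_128(self, value):
        if value is None:
            return None
        # 32-byte SHAKE-128 digest -> 64 hex characters
        return hashlib.shake_128(f'{value}{self.salt}'.encode()).hexdigest(32)

    def register(self, spark: SparkSession):
        # Expose the methods to SQL / selectExpr as sha512() and shake128()
        spark.udf.register('sha512', self.sha512, StringType())
        spark.udf.register('shake128', self.shake_128, StringType())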
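Because the UDFs are plain Python methods, the masking logic can be checked locally before wiring it into Spark. The sketch below is a standalone, Spark-free copy of the Mask hashing methods (the None guard is part of this sketch); it shows that the token is deterministic for a given salt, so masked columns can still be joined or grouped, while a different salt changes every token.

```python
import hashlib
from typing import Optional


class Mask:
    """Spark-free copy of the salted hashing methods, for local checks only."""

    def __init__(self, salt: str):
        self.salt = salt

    def sha512(self, value) -> Optional[str]:
        # Keep nulls as None instead of hashing the string 'None'
        if value is None:
            return None
        return hashlib.sha512(f'{value}{self.salt}'.encode()).hexdigest()

    def shake_128(self, value) -> Optional[str]:
        if value is None:
            return None
        # hexdigest(32) -> 32 bytes -> 64 hex characters
        return hashlib.shake_128(f'{value}{self.salt}'.encode()).hexdigest(32)


m = Mask('123456789')
a = m.shake_128('secret')
b = m.shake_128('secret')                     # same value, same salt
c = Mask('another-salt').shake_128('secret')  # same value, different salt

print(a == b)                   # True: deterministic for a fixed salt
print(a == c)                   # False: the salt changes the token
print(len(a))                   # 64
print(len(m.sha512('secret')))  # 128
print(m.shake_128(None))        # None
```

In practice the salt should be a secret kept out of the notebook source; otherwise anyone who can read the code can hash candidate values and compare them against the table.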
Create the Spark session, set the config needed to read from storage, and register the UDFs:
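spark = SparkSession.builder.getOrCreate()

# Replace the <...> placeholders with your own values
storage_account = '<my_storage>'
container = '<my_container>'
spark.conf.set(f'fs.azure.account.key.{storage_account}.blob.core.windows.net', '<my_storage_key>')
path = f'wasbs://{container}@{storage_account}.blob.core.windows.net/*.csv'

m = Mask('123456789')  # the salt; use your own secret value
m.register(spark)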
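The Spark config key and the wasbs:// path are both built from the same storage account and container names, so a small helper can keep them consistent. This is a minimal sketch; blob_settings and the example names mystorage / mycontainer are hypothetical and not real resources.

```python
def blob_settings(storage_account: str, container: str, pattern: str = '*.csv'):
    """Return (config key, wasbs path) for an Azure Blob container.

    Hypothetical helper for illustration; it only formats strings.
    """
    # Config key under which Spark looks up the storage account key
    conf_key = f'fs.azure.account.key.{storage_account}.blob.core.windows.net'
    # Glob over the container root, matching the path used in the notebook
    path = f'wasbs://{container}@{storage_account}.blob.core.windows.net/{pattern}'
    return conf_key, path


conf_key, path = blob_settings('mystorage', 'mycontainer')
print(conf_key)  # fs.azure.account.key.mystorage.blob.core.windows.net
print(path)      # wasbs://mycontainer@mystorage.blob.core.windows.net/*.csv
```

In the notebook, the two values would then be used as spark.conf.set(conf_key, '<my_storage_key>') and .load(path).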
Now use the following code to read the source files and save them to a database table:
# Alias the hashed column: table column names cannot contain '(' or ')'
spark.read \
    .format('csv') \
    .option('inferSchema', True) \
    .option('header', True) \
    .load(path) \
    .selectExpr('user_name', 'shake128(password) AS password') \
    .write \
    .mode('append') \
    .saveAsTable('my_table')
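To see what the projection does to each row, the same transformation can be mimicked on a small in-memory CSV with Python's csv module. This is a Spark-free illustration only: mask_csv and the sample rows are hypothetical, and the salt matches the '123456789' used above. Columns not listed in the projection (here email) are dropped, and the password column is replaced by its salted SHAKE-128 token.

```python
import csv
import hashlib
import io


def shake128(value, salt='123456789'):
    # Same salted SHAKE-128 token as the shake128 UDF (64 hex characters)
    return hashlib.shake_128(f'{value}{salt}'.encode()).hexdigest(32)


def mask_csv(text: str):
    """Local stand-in for selecting user_name and shake128(password)."""
    reader = csv.DictReader(io.StringIO(text))  # first line is the header
    return [
        {'user_name': row['user_name'], 'password': shake128(row['password'])}
        for row in reader
    ]


sample = (
    'user_name,password,email\n'
    'alice,pa55,alice@example.com\n'
    'bob,hunter2,bob@example.com\n'
)
for row in mask_csv(sample):
    print(row)
```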
To run the code above and see it working:
- Go to the storage account <my_storage>
- Open the container <my_container>
- Upload some CSV files to the root of the container, with the columns user_name and password and a few rows of values
- Copy the code above into a Databricks Python notebook (the code blocks can go in the same cell or in separate cells)
- Replace the <> placeholders above with the correct storage account name, container name and account key
- Run all cells in order
- In a new cell, run the following code:
%sql
SELECT * FROM my_table
- You should see the user names alongside the hashed passwords
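If there are no CSV files at hand for the test, a few can be generated locally and then uploaded to the container (for example through the Azure portal or Azure Storage Explorer). A minimal sketch; the file names and values are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample data: two small files with the user_name and password
# columns that the notebook code expects
rows_by_file = {
    'users_1.csv': [('alice', 'pa55'), ('bob', 'hunter2')],
    'users_2.csv': [('carol', 'letmein')],
}

out_dir = tempfile.mkdtemp()
for name, rows in rows_by_file.items():
    with open(os.path.join(out_dir, name), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['user_name', 'password'])  # header row, read with header=True
        writer.writerows(rows)

print(out_dir)
print(sorted(os.listdir(out_dir)))  # ['users_1.csv', 'users_2.csv']
```

After one run of the notebook against an empty my_table, the query should return three rows, with each password replaced by a 64-character hex string. Because the write uses mode('append'), running the notebook again adds the same rows a second time.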
Thanks,
Shalvin