You can enforce NOT NULL constraints when reading the data back into Spark by explicitly specifying a schema with nullable = false on the required columns. This won't prevent nulls at the storage level, but Spark will raise errors if null values are encountered in columns marked as non-nullable.
// Define the schema with NOT NULL constraints where applicable
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val enforcedSchema = StructType(List(
  StructField("CurrencyCode", StringType, nullable = false),
  StructField("CurrencyName", StringType, nullable = true),
  StructField("CurrencyPrecision", IntegerType, nullable = true),
  StructField("FractionalPart", IntegerType, nullable = false)
))
// Read the data with the enforced schema
val dfEnforced = spark.read.schema(enforcedSchema).format("parquet").load("path_to_your_data_lake")
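To check what Spark actually applied, you can inspect the schema of the resulting DataFrame. Depending on the Spark version, file-based sources may still report the columns as nullable even when the supplied schema says otherwise, so it is worth verifying on your environment (dfEnforced is the DataFrame from the snippet above):
// Verify the nullability Spark actually applied to each column
dfEnforced.printSchema()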
Before saving your data, validate that there are no null values in columns that should be non-nullable. This can be done with DataFrame operations that filter out or fix records violating the schema constraints (a sketch follows the DDL below). Alternatively, you can declare the constraints on an external table:
CREATE EXTERNAL TABLE your_table_name (
    CurrencyCode STRING NOT NULL,
    CurrencyName STRING,
    CurrencyPrecision INT,
    FractionalPart INT NOT NULL
)
WITH (
    LOCATION = 'path/to/your/data/in/adls',
    DATA_SOURCE = your_external_data_source,
    FILE_FORMAT = your_file_format
);
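As a minimal sketch of the pre-save validation mentioned above, the following assumes a DataFrame named df with the four currency columns; the column list and the output path are placeholders you would adjust to your data:
import org.apache.spark.sql.functions.col

// Columns that must never be null (assumption: matches the schema above)
val requiredCols = Seq("CurrencyCode", "FractionalPart")

// Rows that violate the intended NOT NULL constraints
val invalidRows = df.filter(requiredCols.map(c => col(c).isNull).reduce(_ || _))

// Either fail fast if any violations exist ...
if (!invalidRows.isEmpty) {
  throw new IllegalStateException(s"Null values found in required columns: ${requiredCols.mkString(", ")}")
}

// ... or drop the offending rows before writing (placeholder path)
val cleaned = df.na.drop("any", requiredCols)
cleaned.write.mode("overwrite").parquet("path_to_your_data_lake")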