Hi Relay, thanks for the clarification. Since you've already created the error_logs Delta table, here are some practical recommendations for folder structure and implementation within your Databricks CI Satellite EDLAP:
Folder Structure (in ADLS Gen2 Silver Layer)
Recommended structure for organizing logs:
/mnt/silver/logs/error_logs/ --> Main Delta table
/mnt/silver/logs/error_logs/archive/ --> Optional: Archive old logs
/mnt/silver/logs/error_logs/temp/ --> Optional: Temp/staging
You can register the main path as a Delta table and manage archiving via time-based filters (e.g., partitioning by date).
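As a minimal sketch of that approach (the silver.error_logs table name, the date column, and the 90-day retention window are all assumptions you would adjust):

from datetime import datetime, timedelta

# Register the main path as a Delta table (database/table names are illustrative)
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver.error_logs
    USING DELTA
    LOCATION '/mnt/silver/logs/error_logs/'
""")

# Archive entries older than 90 days into the archive folder (cutoff is an assumption)
cutoff = (datetime.now() - timedelta(days=90)).strftime("%Y-%m-%d")
old_logs = (spark.read.format("delta")
            .load("/mnt/silver/logs/error_logs/")
            .filter(f"date < '{cutoff}'"))
old_logs.write.mode("append").format("delta").save("/mnt/silver/logs/error_logs/archive/")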
What to Log (Parameters)
You’re on the right track. Suggested fields:
- timestamp
- pipeline_name, step_name
- error_message, error_type
- source_file, table_name
- row_data (JSON string of the failed row)
- run_id or job_id
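If you want to enforce these fields up front, a minimal schema sketch could look like the following (field names and nullability are suggestions, not a fixed contract):

from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DateType

error_log_schema = StructType([
    StructField("timestamp", TimestampType(), False),
    StructField("date", DateType(), True),            # derived partition column
    StructField("pipeline_name", StringType(), True),
    StructField("step_name", StringType(), True),
    StructField("error_message", StringType(), True),
    StructField("error_type", StringType(), True),
    StructField("source_file", StringType(), True),
    StructField("table_name", StringType(), True),
    StructField("row_data", StringType(), True),      # JSON string of the failed row
    StructField("run_id", StringType(), True),
])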
Use partitionBy("date") if you're expecting large log volume.
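For instance, assuming log_df is the log DataFrame built as in the next section, you could derive a date column from the timestamp and partition on it:

from pyspark.sql.functions import to_date, col

(log_df.withColumn("date", to_date(col("timestamp")))
       .write.mode("append")
       .format("delta")
       .partitionBy("date")
       .save("/mnt/silver/logs/error_logs/"))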
Capturing Errors in PySpark
Wrap your processing steps with try-except blocks and append to the Delta log table:
from datetime import datetime

try:
    # Transformation logic goes here
    ...
except Exception as e:
    # Build a one-row DataFrame describing the failure and append it to the log table
    log_df = spark.createDataFrame([{
        "timestamp": datetime.now(),
        "pipeline_name": "CI_EDLAP",
        "step_name": "transform_step_1",
        "error_message": str(e),
        "error_type": type(e).__name__,
        "source_file": "file.json",
        "table_name": "target_table",
        "row_data": row_json,   # JSON string of the failed row, set earlier in the pipeline
        "run_id": run_id        # set earlier in the pipeline
    }])
    log_df.write.mode("append").format("delta").save("/mnt/silver/logs/error_logs/")
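In the snippet above, row_json and run_id are assumed to be produced earlier in your pipeline; one hypothetical way to derive them:

import json

# Assuming run_id is passed in as a job parameter via a notebook widget
run_id = dbutils.widgets.get("run_id")

# Serialize a failing pyspark Row (here named `row`) into a JSON string
row_json = json.dumps(row.asDict(), default=str)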
Hope this helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?". And if you have any further queries, do let us know.