Low-Level Design for Error Logging for a Data Pipeline

Relay 320 Reputation points
2025-07-30T11:58:56.73+00:00

Hello,

Could someone please help me design error logging for the data pipeline shown below?

[Image: data pipeline diagram]

I have to design logging in Databricks CI Satellite EDLAP.

Can I do it in the ADLS Gen2 Silver layer, or do I need any other component?

Could someone please advise how to set up the folder structure, and whether we can use a Delta table for logging? Also, which parameters should we log, and how can their values be captured?

I understand:

I can create a separate Delta table such as error_logs where I can capture useful details: timestamp, table name, pipeline step, error message, source file, and perhaps a JSON column to store the problematic row. I could use try-except blocks in PySpark and append errors to this log table.

Any implementation link would be very helpful; kindly share.

Thanks a lot

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Venkat Reddy Navari 5,830 Reputation points Microsoft External Staff Moderator
    2025-07-30T17:14:23.09+00:00

    Hi Relay, thanks for the clarification. Since you’ve already created the error_logs Delta table, here are some practical recommendations for folder structure and implementation within your Databricks CI Satellite EDLAP:

    Folder Structure (in ADLS Gen2 Silver Layer)

    Recommended structure for organizing logs:

    
    /mnt/silver/logs/error_logs/            --> Main Delta table
    /mnt/silver/logs/error_logs/archive/    --> Optional: Archive old logs
    /mnt/silver/logs/error_logs/temp/       --> Optional: Temp/staging
    

    You can register the main path as a Delta table and manage archiving via time-based filters (e.g., partitioning by date).
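
    As a rough sketch, you could register the path in the metastore and handle archiving with a date filter along these lines (the silver schema name and the date column are assumptions here, adjust to your environment):

    from pyspark.sql import functions as F

    # Register the existing Delta path as a metastore table so it can be queried by name.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS silver.error_logs
        USING DELTA
        LOCATION '/mnt/silver/logs/error_logs/'
    """)

    # Example time-based archiving: copy logs older than 90 days to the archive path,
    # then remove them from the main table.
    old_logs = (spark.table("silver.error_logs")
                     .where(F.col("date") < F.date_sub(F.current_date(), 90)))
    old_logs.write.mode("append").format("delta").save("/mnt/silver/logs/error_logs/archive/")
    spark.sql("DELETE FROM silver.error_logs WHERE date < date_sub(current_date(), 90)")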

    What to Log (Parameters)

    You’re on the right track. Suggested fields:

    • timestamp
    • pipeline_name, step_name
    • error_message, error_type
    • source_file, table_name
    • row_data (JSON string of failed row)
    • run_id or job_id

    Use partitionBy("date") if you're expecting a large log volume; a sketch of the table setup follows below.
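
    For reference, a minimal sketch of creating the partitioned log table up front with an explicit schema (field names follow the list above; the extra date column is an assumption used only for partitioning):

    from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DateType

    # Illustrative schema for error_logs; adjust names and types to your standards.
    error_log_schema = StructType([
        StructField("timestamp",     TimestampType(), False),
        StructField("date",          DateType(),      False),  # partition column
        StructField("pipeline_name", StringType(),    True),
        StructField("step_name",     StringType(),    True),
        StructField("error_message", StringType(),    True),
        StructField("error_type",    StringType(),    True),
        StructField("source_file",   StringType(),    True),
        StructField("table_name",    StringType(),    True),
        StructField("row_data",      StringType(),    True),   # JSON string of the failed row
        StructField("run_id",        StringType(),    True),
    ])

    # Create the (empty) partitioned Delta table once; "ignore" is a no-op if it already exists.
    (spark.createDataFrame([], error_log_schema)
          .write.format("delta")
          .partitionBy("date")
          .mode("ignore")
          .save("/mnt/silver/logs/error_logs/"))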

    Capturing Errors in PySpark

    Wrap your processing steps with try-except blocks and append to the Delta log table:

    
    from datetime import datetime

    try:
        result_df = transform(source_df)  # placeholder for your transformation logic
    except Exception as e:
        # row_json and run_id are assumed to be captured earlier in the step
        log_df = spark.createDataFrame([{
            "timestamp": datetime.now(),
            "pipeline_name": "CI_EDLAP",
            "step_name": "transform_step_1",
            "error_message": str(e),
            "error_type": type(e).__name__,
            "source_file": "file.json",
            "table_name": "target_table",
            "row_data": row_json,
            "run_id": run_id
        }])
        log_df.write.mode("append").format("delta").save("/mnt/silver/logs/error_logs/")
    
    
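    Once errors are being appended, you can read the same path back for monitoring. A small usage example (field names as above):

    # Most recent errors for a given pipeline, newest first.
    (spark.read.format("delta").load("/mnt/silver/logs/error_logs/")
          .where("pipeline_name = 'CI_EDLAP'")
          .orderBy("timestamp", ascending=False)
          .limit(50)
          .show(truncate=False))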

    Hope this helps. If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.

