Best Practice for Loading Data into Raw Layer in Azure Data Lake to SQL Pipeline (Truncate vs Append)

YERNAIDU SIRAPARAPU 45 Reputation points
2025-06-19T02:45:35.9566667+00:00

We’re using a layered architecture (Raw → Clean → Aggregate) to load data from Azure Data Lake Storage (ADLS) into Azure SQL using Azure Data Factory (ADF).

For the Raw layer, which is intended to store ingested data as-is from source files (e.g., CSV/Excel), I’m trying to decide between two loading strategies for each pipeline run:

Option 1: Truncate the Raw table before each load and insert the new file’s data.

Option 2: Append the new file’s data to the existing Raw table on every run.

Which of these approaches is recommended or considered best practice in a production data pipeline — especially with respect to auditability, reprocessing, and data traceability?
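For concreteness, the two strategies could be sketched in T-SQL as follows (the schema, table, and column names, including the lineage columns in Option 2, are hypothetical; the staging table is assumed to be populated by an ADF copy activity):

```sql
-- Option 1: truncate-and-load. The Raw table always reflects the latest file.
TRUNCATE TABLE raw.SalesOrders;

INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM   stg.SalesOrders_Incoming;   -- rows staged by the ADF copy activity

-- Option 2: append-only. Every run adds rows; lineage columns such as
-- SourceFileName and LoadedAtUtc make each batch traceable afterwards.
INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount,
                             SourceFileName, LoadedAtUtc)
SELECT OrderId, OrderDate, Amount,
       @SourceFileName,            -- e.g. passed in as an ADF pipeline parameter
       SYSUTCDATETIME()
FROM   stg.SalesOrders_Incoming;
```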

Any insights or architectural recommendations would be appreciated!

Azure Data Factory

Accepted answer
Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
2025-06-19T03:03:05.09+00:00

Do you plan to preserve the source files in your ADLS?

If yes, then there is no need to append the data into the staging (Raw) table; it is better to truncate and insert on each run and use that table for further processing, which avoids duplicate data. The preserved files in ADLS remain your record for auditability and reprocessing.

Another option is the external table feature, currently in preview for Azure SQL Database:

https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-ver17&tabs=dedicated

With external tables you can read CSV files from ADLS directly, without physically loading them into the Azure SQL Database.
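If run-level traceability is still wanted with truncate-and-insert, one hedged pattern is to log each load into a small audit table right after the insert, while the preserved ADLS files stay the system of record (all object and parameter names here are hypothetical):

```sql
INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM   stg.SalesOrders_Incoming;

-- Capture the row count immediately; @@ROWCOUNT refers to the last statement.
DECLARE @RowsLoaded INT = @@ROWCOUNT;

-- Hypothetical audit table recording which file was loaded by which run.
INSERT INTO audit.RawLoadLog (PipelineRunId, SourceFileName, RowsLoaded, LoadedAtUtc)
VALUES (@PipelineRunId, @SourceFileName, @RowsLoaded, SYSUTCDATETIME());
```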
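A rough sketch of that approach, following the syntax in the linked article (the data source name, storage URL, folder path, and columns are assumptions, credential setup is omitted, and the exact syntax and availability in the Azure SQL Database preview may differ):

```sql
-- External data source pointing at the ADLS container holding the raw files.
CREATE EXTERNAL DATA SOURCE AdlsRaw
WITH (LOCATION = 'https://<storageaccount>.blob.core.windows.net/raw');

-- File format describing the CSV layout.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- External table: queries read the files in place; nothing is copied into SQL.
CREATE EXTERNAL TABLE raw.SalesOrders_Ext (
    OrderId   INT,
    OrderDate DATE,
    Amount    DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = AdlsRaw, FILE_FORMAT = CsvFormat);
```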

