Best Practice for Loading Data into Raw Layer in Azure Data Lake to SQL Pipeline (Truncate vs Append)

YERNAIDU SIRAPARAPU 45 Reputation points
2025-06-19T02:45:35.9566667+00:00

We’re using a layered architecture (Raw → Clean → Aggregate) to load data from Azure Data Lake Storage (ADLS) into Azure SQL using Azure Data Factory (ADF).

For the Raw layer, which is intended to store ingested data as-is from source files (e.g., CSV/Excel), I’m trying to decide between two loading strategies for each pipeline run:

Option 1: Truncate the Raw table before each load and insert the new file’s data.

Option 2: Append the new file’s data to the existing Raw table on every run.

Which of these approaches is recommended or considered best practice in a production data pipeline — especially with respect to auditability, reprocessing, and data traceability?
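For concreteness, the two strategies could be sketched in T-SQL as follows (the schema, table, and column names, including the lineage columns in Option 2, are hypothetical; the staging table is assumed to be populated by an ADF copy activity):

```sql
-- Option 1: truncate-and-load. The Raw table always reflects the latest file.
TRUNCATE TABLE raw.SalesOrders;

INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM   stg.SalesOrders_Incoming;   -- rows staged by the ADF copy activity

-- Option 2: append-only. Every run adds rows; lineage columns such as
-- SourceFileName and LoadedAtUtc make each batch traceable afterwards.
INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount,
                             SourceFileName, LoadedAtUtc)
SELECT OrderId, OrderDate, Amount,
       @SourceFileName,            -- e.g. passed in as an ADF pipeline parameter
       SYSUTCDATETIME()
FROM   stg.SalesOrders_Incoming;
```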

Any insights or architectural recommendations would be appreciated!

Azure Data Factory

Accepted answer
Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
2025-06-19T03:03:05.09+00:00

Do you plan to preserve the source files in your ADLS?

If yes, then there is no need to append the data into the staging (Raw) table; it is better to truncate and insert on each run and use that table for further processing, which avoids duplicate data. The preserved files in ADLS remain your record for auditability and reprocessing.

Another option is the external table feature, currently in preview for Azure SQL Database:

https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-ver17&tabs=dedicated

With external tables you can read CSV files from ADLS directly, without physically loading them into the Azure SQL Database.
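If run-level traceability is still wanted with truncate-and-insert, one hedged pattern is to log each load into a small audit table right after the insert, while the preserved ADLS files stay the system of record (all object and parameter names here are hypothetical):

```sql
INSERT INTO raw.SalesOrders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM   stg.SalesOrders_Incoming;

-- Capture the row count immediately; @@ROWCOUNT refers to the last statement.
DECLARE @RowsLoaded INT = @@ROWCOUNT;

-- Hypothetical audit table recording which file was loaded by which run.
INSERT INTO audit.RawLoadLog (PipelineRunId, SourceFileName, RowsLoaded, LoadedAtUtc)
VALUES (@PipelineRunId, @SourceFileName, @RowsLoaded, SYSUTCDATETIME());
```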
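A rough sketch of that approach, following the syntax in the linked article (the data source name, storage URL, folder path, and columns are assumptions, credential setup is omitted, and the exact syntax and availability in the Azure SQL Database preview may differ):

```sql
-- External data source pointing at the ADLS container holding the raw files.
CREATE EXTERNAL DATA SOURCE AdlsRaw
WITH (LOCATION = 'https://<storageaccount>.blob.core.windows.net/raw');

-- File format describing the CSV layout.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- External table: queries read the files in place; nothing is copied into SQL.
CREATE EXTERNAL TABLE raw.SalesOrders_Ext (
    OrderId   INT,
    OrderDate DATE,
    Amount    DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = AdlsRaw, FILE_FORMAT = CsvFormat);
```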

