To implement the steps you mentioned using Azure Data Factory (ADF), you can follow these guidelines:
Create Linked Services:
- Create a Linked Service for your source Azure SQL DB to establish a connection.
- Create a Linked Service for your target Azure Data Lake Storage (ADLS) account to define the destination. A minimal sketch of both definitions follows this list.
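For reference, a minimal sketch of the two linked service definitions is shown below as a single JSON array (in practice they are two separate resources). The names ls_AzureSqlDb_Source and ls_AdlsGen2_Target and all angle-bracket values are placeholders, and account-key authentication is shown only for brevity; a managed identity or Key Vault reference is usually preferable.

```json
[
  {
    "name": "ls_AzureSqlDb_Source",
    "properties": {
      "type": "AzureSqlDatabase",
      "typeProperties": {
        "connectionString": "<your Azure SQL connection string>"
      }
    }
  },
  {
    "name": "ls_AdlsGen2_Target",
    "properties": {
      "type": "AzureBlobFS",
      "typeProperties": {
        "url": "https://<storageaccount>.dfs.core.windows.net",
        "accountKey": {
          "type": "SecureString",
          "value": "<account key>"
        }
      }
    }
  }
]
```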
Create Datasets:
- Create a dataset for your source Azure SQL DB, specifying the table(s) you want to extract data from.
- Create a dataset for your target ADLS, defining the folder structure and partitioning scheme. A sketch of a parameterized sink dataset follows this list.
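As an illustration, a sink dataset with a parameterized folder path might look like the sketch below. The dataset and parameter names (ds_Adls_Target, partitionPath) and the <container> file system are assumptions, as is the Parquet format; the source dataset would be an ordinary AzureSqlTable dataset pointing at the table(s) you want to extract.

```json
{
  "name": "ds_Adls_Target",
  "properties": {
    "type": "Parquet",
    "linkedServiceName": {
      "referenceName": "ls_AdlsGen2_Target",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "partitionPath": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "<container>",
        "folderPath": {
          "value": "@dataset().partitionPath",
          "type": "Expression"
        }
      }
    }
  }
}
```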
Create Pipelines:
- Create a pipeline to extract data from the source Azure SQL DB. Use the Copy Activity to copy data from the source dataset to the target ADLS dataset.
- Configure the Copy Activity to use a dynamic folder path so the output is partitioned by year, month, and day. For example, you can build the path with an expression such as @concat('/year=', formatDateTime(utcnow(), 'yyyy'), '/month=', formatDateTime(utcnow(), 'MM'), '/day=', formatDateTime(utcnow(), 'dd')). Note that this particular expression derives the partition from the pipeline run's current UTC date; a fuller Copy Activity sketch follows this list.
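Tying the expression to the dataset parameter, a sketch of the Copy Activity definition might look like the following. The dataset names ds_AzureSql_Source and ds_Adls_Target and the parameter partitionPath carry over from the sketches above and are assumptions, as is the Parquet sink format.

```json
{
  "name": "CopyToAdls",
  "type": "Copy",
  "inputs": [
    { "referenceName": "ds_AzureSql_Source", "type": "DatasetReference" }
  ],
  "outputs": [
    {
      "referenceName": "ds_Adls_Target",
      "type": "DatasetReference",
      "parameters": {
        "partitionPath": {
          "value": "@concat('/year=', formatDateTime(utcnow(), 'yyyy'), '/month=', formatDateTime(utcnow(), 'MM'), '/day=', formatDateTime(utcnow(), 'dd'))",
          "type": "Expression"
        }
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "AzureSqlSource" },
    "sink": { "type": "ParquetSink" }
  }
}
```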
Add Activities for Truncation and Load:
- To truncate and load the data for the current and previous week, you can add a Lookup Activity before the Copy Activity in the pipeline.
- In the Lookup Activity, query the target ADLS to retrieve the latest date for which data is already present.
- Use the retrieved date to filter the source data in the Copy Activity's source query with a WHERE clause. Because the expression is embedded in a query string, use string interpolation, for example WHERE dateColumn >= '@{activity('LookupActivity').output.firstRow.latestDate}'. A sketch of the resulting source configuration follows this list.
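Building on that, a sketch of the Copy Activity's source section with a dynamic query might look like the following. The table dbo.SourceTable, the column dateColumn, and the activity name LookupActivity are illustrative assumptions; adjust them to your schema. If the existing files for those partitions must also be physically removed before reloading, a Delete activity pointed at the affected year/month/day folders can run ahead of the Copy Activity.

```json
{
  "name": "CopyToAdls",
  "type": "Copy",
  "dependsOn": [
    { "activity": "LookupActivity", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "SELECT * FROM dbo.SourceTable WHERE dateColumn >= '@{activity('LookupActivity').output.firstRow.latestDate}'",
        "type": "Expression"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```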
Schedule and Execute the Pipeline:
- Set up a trigger (for example, a schedule trigger) so the pipeline runs at the desired frequency. A sample trigger definition follows this list.
- Execute the pipeline to start the data extraction, transformation, and loading process.
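For example, a weekly schedule trigger definition might look like the sketch below. The trigger and pipeline names (tr_WeeklyLoad, pl_SqlToAdls), the Monday 02:00 UTC cadence, and the start time are assumptions to adapt to your needs.

```json
{
  "name": "tr_WeeklyLoad",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Week",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC",
        "schedule": {
          "weekDays": [ "Monday" ]
        }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "pl_SqlToAdls",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```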
By following these steps, you can query data from the source Azure SQL DB, partition the files in ADLS based on the specified timestamp column, and truncate and reload the data for the current and previous week using Azure Data Factory.