Store data in ADLS by partitioning files based on Year, Month, and Day using ADF

Praveenraj R K 61 Reputation points Microsoft Employee
2023-05-23T11:01:26.1666667+00:00

Hi All,

I need to perform the steps below. Please advise how to achieve them.

  1. I need to query data from the source Azure SQL DB (extracting data from the SQL tables only, with no modifications to the source system such as creating stored procedures) and store it in ADLS using ADF, partitioning the files by Year, Month, and Day based on the timestamp column available in the tables, not on the UTC datetime. Could you please advise how to achieve this in ADF?
  2. I also need to truncate and load only the current and previous week's data in ADLS using ADF.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. VasimTamboli 4,410 Reputation points
    2023-05-23T13:55:13.2066667+00:00

    To achieve the steps you mentioned using Azure Data Factory (ADF), you can follow these guidelines:

    Create Linked Services:

    • Create a Linked Service for your source Azure SQL DB to establish a connection.
    • Create a Linked Service for your target Azure Data Lake Storage (ADLS) to define the destination.
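
    For reference, the two linked services could be defined roughly as follows. This is a minimal sketch: the names AzureSqlSourceLS and AdlsGen2SinkLS, the server/database placeholders, and the storage URL are illustrative, and the authentication settings (account key, service principal, or managed identity) are omitted for brevity.

        {
            "name": "AzureSqlSourceLS",
            "properties": {
                "type": "AzureSqlDatabase",
                "typeProperties": {
                    "connectionString": "Data Source=<your-server>.database.windows.net;Initial Catalog=<your-db>;User ID=<user>;Password=<password>;Encrypt=True;"
                }
            }
        }

        {
            "name": "AdlsGen2SinkLS",
            "properties": {
                "type": "AzureBlobFS",
                "typeProperties": {
                    "url": "https://<your-storage-account>.dfs.core.windows.net"
                }
            }
        }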

    Create Datasets:

    • Create a dataset for your source Azure SQL DB, specifying the table(s) you want to extract data from.
    • Create a dataset for your target ADLS, defining the folder structure and partitioning scheme.
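
    The target ADLS dataset can expose a folderPath parameter so the Year/Month/Day path is supplied at copy time. A sketch, assuming delimited text output and a file system named raw (both are placeholders; Parquet works the same way):

        {
            "name": "AdlsPartitionedOutput",
            "properties": {
                "linkedServiceName": {
                    "referenceName": "AdlsGen2SinkLS",
                    "type": "LinkedServiceReference"
                },
                "parameters": {
                    "folderPath": { "type": "string" }
                },
                "type": "DelimitedText",
                "typeProperties": {
                    "location": {
                        "type": "AzureBlobFSLocation",
                        "fileSystem": "raw",
                        "folderPath": {
                            "value": "@dataset().folderPath",
                            "type": "Expression"
                        }
                    },
                    "columnDelimiter": ",",
                    "firstRowAsHeader": true
                }
            }
        }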

    Create Pipelines:

    • Create a pipeline to extract data from the source Azure SQL DB. Use the Copy Activity to copy data from the source dataset to the target ADLS dataset.
    • Configure the Copy Activity to use a dynamic folder path for partitioning based on Year, Month, and Day. An expression such as @concat('/year=', formatDateTime(utcnow(), 'yyyy'), '/month=', formatDateTime(utcnow(), 'MM'), '/day=', formatDateTime(utcnow(), 'dd')) partitions by the pipeline run date in UTC; since you need to partition on the timestamp column in the tables instead, derive the date values from the data itself, for example with a Lookup over the distinct dates followed by a ForEach, as in the sketch below.
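
    Because the partitioning has to follow the timestamp column in the source tables rather than the run time, one option is to look up the distinct dates and loop over them, writing one partition per date. A rough sketch, assuming a hypothetical table dbo.SalesOrders with timestamp column OrderTimestamp, a source dataset named SqlSourceTable, the sink dataset above, and a sales folder prefix:

        {
            "name": "CopySqlToAdlsPartitioned",
            "properties": {
                "activities": [
                    {
                        "name": "LookupDistinctDates",
                        "type": "Lookup",
                        "description": "One row per distinct calendar date present in the source timestamp column.",
                        "typeProperties": {
                            "source": {
                                "type": "AzureSqlSource",
                                "sqlReaderQuery": "SELECT DISTINCT CAST(OrderTimestamp AS date) AS OrderDate FROM dbo.SalesOrders"
                            },
                            "dataset": { "referenceName": "SqlSourceTable", "type": "DatasetReference" },
                            "firstRowOnly": false
                        }
                    },
                    {
                        "name": "ForEachDate",
                        "type": "ForEach",
                        "dependsOn": [ { "activity": "LookupDistinctDates", "dependencyConditions": [ "Succeeded" ] } ],
                        "typeProperties": {
                            "items": { "value": "@activity('LookupDistinctDates').output.value", "type": "Expression" },
                            "activities": [
                                {
                                    "name": "CopyOneDay",
                                    "type": "Copy",
                                    "description": "Copies one day's rows into the matching year=/month=/day= folder.",
                                    "inputs": [ { "referenceName": "SqlSourceTable", "type": "DatasetReference" } ],
                                    "outputs": [
                                        {
                                            "referenceName": "AdlsPartitionedOutput",
                                            "type": "DatasetReference",
                                            "parameters": {
                                                "folderPath": {
                                                    "value": "@concat('sales/year=', formatDateTime(item().OrderDate, 'yyyy'), '/month=', formatDateTime(item().OrderDate, 'MM'), '/day=', formatDateTime(item().OrderDate, 'dd'))",
                                                    "type": "Expression"
                                                }
                                            }
                                        }
                                    ],
                                    "typeProperties": {
                                        "source": {
                                            "type": "AzureSqlSource",
                                            "sqlReaderQuery": {
                                                "value": "SELECT * FROM dbo.SalesOrders WHERE CAST(OrderTimestamp AS date) = '@{item().OrderDate}'",
                                                "type": "Expression"
                                            }
                                        },
                                        "sink": {
                                            "type": "DelimitedTextSink",
                                            "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                                            "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
                                        }
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }

    A Mapping Data Flow is an alternative: a derived column for the date parts plus folder naming from column data in the sink produces the same layout without an explicit loop, at the cost of spinning up a Data Flow cluster.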

    Add Activities for Truncation and Load:

    • To truncate and reload only the current and previous week's data, restrict the date window in the source query and clear the matching target folders before copying; a Delete Activity handles the clearing (see the sketch below).
    • Alternatively, add a Lookup Activity before the Copy Activity that queries the target ADLS, or a small watermark table, for the latest date already loaded.
    • Use the retrieved date to filter the source data in the Copy Activity's query with a WHERE clause, for example WHERE dateColumn >= '@{activity('LookupActivity').output.firstRow.latestDate}'.
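
    For the truncate part, one pattern is to restrict the Lookup query above to the two-week window, for example WHERE OrderTimestamp >= DATEADD(week, -1, DATEADD(day, 1 - DATEPART(weekday, CAST(GETDATE() AS date)), CAST(GETDATE() AS date))) (the weekday boundary depends on your DATEFIRST setting), and then, inside the ForEach, clear each day's folder with a Delete Activity before the Copy rewrites it. A sketch of that per-date Delete Activity, reusing the illustrative names from the pipeline above:

        {
            "name": "DeleteExistingDayFolder",
            "type": "Delete",
            "description": "Removes any previously loaded files for this date's partition folder before the Copy reloads it.",
            "typeProperties": {
                "dataset": {
                    "referenceName": "AdlsPartitionedOutput",
                    "type": "DatasetReference",
                    "parameters": {
                        "folderPath": {
                            "value": "@concat('sales/year=', formatDateTime(item().OrderDate, 'yyyy'), '/month=', formatDateTime(item().OrderDate, 'MM'), '/day=', formatDateTime(item().OrderDate, 'dd'))",
                            "type": "Expression"
                        }
                    }
                },
                "storeSettings": {
                    "type": "AzureBlobFSReadSettings",
                    "recursive": true
                },
                "enableLogging": false
            }
        }

    Place this before CopyOneDay inside the ForEach and make the Copy depend on it (dependsOn with a Succeeded condition) so every targeted partition is emptied before it is reloaded.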

    Schedule and Execute the Pipeline:

    • Set up a trigger or schedule for the pipeline to run at the desired frequency.
    • Execute the pipeline to start the data extraction, transformation, and loading process.
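
    A daily schedule trigger is one option; the trigger name, start time, and time zone below are placeholders:

        {
            "name": "DailyLoadTrigger",
            "properties": {
                "type": "ScheduleTrigger",
                "typeProperties": {
                    "recurrence": {
                        "frequency": "Day",
                        "interval": 1,
                        "startTime": "2023-05-24T02:00:00Z",
                        "timeZone": "UTC"
                    }
                },
                "pipelines": [
                    {
                        "pipelineReference": {
                            "referenceName": "CopySqlToAdlsPartitioned",
                            "type": "PipelineReference"
                        }
                    }
                ]
            }
        }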

    By following these steps, you can query data from the source Azure SQL DB, partition the files in ADLS based on the specified timestamp column, and truncate/load data for the current and previous week using Azure Data Factory.