How to partition data by column in ADF using Copy Data Activity from a SQL server table?

Vansh Rathore 0 Reputation points
2023-05-02T07:22:12.11+00:00

Hello,

I have an on-premises SQL Server table named DailyTransactions. I want to partition the data based on the Insertion_Date column, create a CSV file for each partition, and store the files in a folder hierarchy like: YYYY/MM/DD/Data-yyyy-mm-dd.csv

For example, for the 25-01-2023 partition, the data should be stored as:
Output/2023/01/25/Data-2023-01-25.csv

The sample CSV file is attached as:
TransactionData

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-05-02T20:34:29.3766667+00:00

    Hello Vansh Rathore,

    To partition the data based on the Insertion_Date column and create CSV files for each partition, you can use ADF to copy and partition the data from your on-premises SQL Server table. You'll need to create a pipeline with a Copy Data activity that reads from your SQL Server table and writes to Azure Blob Storage in the desired folder hierarchy and file format.

    Here, in the source settings of the Copy activity, you need to use the Partition option.

    Use the Physical partition of table option if your table is already partitioned.

    Otherwise, you can use Dynamic range.
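    If you choose Dynamic range, the source query must contain the partition placeholder that ADF replaces at run time with the range condition (the placeholder name below comes from the linked documentation; the table name is from your question):

    SELECT *
    FROM DailyTransactions
    WHERE ?AdfDynamicRangePartitionCondition

    In the source settings, set the partition column to Insertion_Date; the partition upper and lower bounds are optional and are auto-detected if left empty.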

    [Screenshot: the Partition option in the Copy activity source settings]

    https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database?tabs=data-factory#parallel-copy-from-sql-database

    You can use the code below to partition your table based on Insertion_Date. For that, you need to create a partition function and a partition scheme that partition the table by Insertion_Date.

    -- Create the partition function
    CREATE PARTITION FUNCTION pf_Insertion_Date (datetime)
    AS RANGE RIGHT FOR VALUES ('20230101', '20230102', '20230103', '20230104', '20230105', '20230106', '20230107', '20230108', '20230109', '20230110');
    
    -- Create the partition scheme
    CREATE PARTITION SCHEME ps_Insertion_Date
    AS PARTITION pf_Insertion_Date
    ALL TO ([PRIMARY]);
    
    -- Create the partitioned table
    CREATE TABLE DailyTransactions_Partitioned
    (
        Transaction_ID int NOT NULL,
        Insertion_Date datetime NOT NULL
        -- Add the other columns from the DailyTransactions table here
    )
    ON ps_Insertion_Date (Insertion_Date);
    
    -- Create a clustered index on the partitioning column
    CREATE CLUSTERED INDEX ci_Insertion_Date ON DailyTransactions_Partitioned (Insertion_Date);
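    After creating the partitioned table, you can check which partition a given date maps to with the $PARTITION function (a quick sanity check, using the partition function name from the script above):

    -- Returns the partition number for a given date.
    -- With RANGE RIGHT, a boundary value belongs to the partition on its right,
    -- so '20230102' (the second boundary value) maps to partition 3.
    SELECT $PARTITION.pf_Insertion_Date('20230102') AS PartitionNumber;

    -- Read only the rows that fall in that partition
    SELECT *
    FROM DailyTransactions_Partitioned
    WHERE $PARTITION.pf_Insertion_Date(Insertion_Date) = 3;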
    
    

    High-level steps on how to use the copy activity:

    Create a Linked Service for your on-premises SQL Server in ADF.

    Create a Linked Service for your Azure Blob Storage account in ADF.

    Create an Input Dataset for your SQL Server table with the appropriate schema and partition option.

    Create an Output Dataset for your Azure Blob Storage.

    Create a Pipeline with a Copy Data activity using input and output datasets.
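    To produce the YYYY/MM/DD folder hierarchy, one common pattern (a sketch, not the only option) is: a Lookup activity that lists the distinct dates, a ForEach over the Lookup output, and a Copy activity inside the ForEach whose sink dataset takes the folder path and file name as parameters. The column alias PartitionDate below is a name chosen for illustration:

    -- Lookup activity query: list the dates to iterate over
    SELECT DISTINCT CONVERT(date, Insertion_Date) AS PartitionDate
    FROM DailyTransactions;

    Inside the ForEach, the Copy activity's source query filters on the current date, and the sink dataset's folder and file name parameters are built with dynamic content expressions such as:

    @concat('Output/', formatDateTime(item().PartitionDate, 'yyyy/MM/dd'))
    @concat('Data-', formatDateTime(item().PartitionDate, 'yyyy-MM-dd'), '.csv')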

    Here is a video tutorial by one of our community experts.

    I hope this helps. Please let us know if you have any further questions.

