How to partition data by column in ADF using Copy Data Activity from a SQL server table?

Vansh Rathore 0 Reputation points
2023-05-02T07:22:12.11+00:00

Hello,

I have an on-premises SQL Server table named DailyTransactions. I want to partition the data based on the Insertion_Date column, create a CSV file for each partition, and store the files in a folder hierarchy like: YYYY/MM/DD/Data-yyyy-mm-dd.csv

For example, for the 25-01-2023 partition, the data should be stored as:
Output/2023/01/25/Data-2023-01-25.csv

The sample CSV file is attached as:
TransactionData

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-05-02T20:34:29.3766667+00:00

    Hello Vansh Rathore,

    To partition the data based on the Insertion_Date column and create CSV files for each partition, you can use ADF to copy and partition the data from your on-premises SQL Server table. You'll need to create a pipeline with a Copy Data activity that reads from your SQL Server table and writes to Azure Blob Storage in the desired folder hierarchy and file format.

    Here, in the source settings of the Copy activity, you need to use the Partition option.

    Use the Physical partition of table option if your table is already partitioned.

    Otherwise, you can use Dynamic range.
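    If you choose Dynamic range, the source query must contain the partition placeholder that ADF replaces at run time with the range condition (the placeholder name below comes from the linked documentation; the table name is from your question):

    SELECT *
    FROM DailyTransactions
    WHERE ?AdfDynamicRangePartitionCondition

    In the source settings, set the partition column to Insertion_Date; the partition upper and lower bounds are optional and are auto-detected if left empty.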

    [Screenshot: the Partition option in the Copy activity source settings]

    https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database?tabs=data-factory#parallel-copy-from-sql-database

    You can use the code below to partition your table based on Insertion_Date. For that, you need to create a partition function and a partition scheme that partition the table by Insertion_Date.

    -- Create the partition function
    CREATE PARTITION FUNCTION pf_Insertion_Date (datetime)
    AS RANGE RIGHT FOR VALUES ('20230101', '20230102', '20230103', '20230104', '20230105', '20230106', '20230107', '20230108', '20230109', '20230110');
    
    -- Create the partition scheme
    CREATE PARTITION SCHEME ps_Insertion_Date
    AS PARTITION pf_Insertion_Date
    ALL TO ([PRIMARY]);
    
    -- Create the partitioned table
    CREATE TABLE DailyTransactions_Partitioned
    (
        Transaction_ID int NOT NULL,
        Insertion_Date datetime NOT NULL
        -- Add the other columns from the DailyTransactions table here
    )
    ON ps_Insertion_Date (Insertion_Date);
    
    -- Create a clustered index on the partitioning column
    CREATE CLUSTERED INDEX ci_Insertion_Date ON DailyTransactions_Partitioned (Insertion_Date);
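    After creating the partitioned table, you can check which partition a given date maps to with the $PARTITION function (a quick sanity check, using the partition function name from the script above):

    -- Returns the partition number for a given date.
    -- With RANGE RIGHT, a boundary value belongs to the partition on its right,
    -- so '20230102' (the second boundary value) maps to partition 3.
    SELECT $PARTITION.pf_Insertion_Date('20230102') AS PartitionNumber;

    -- Read only the rows that fall in that partition
    SELECT *
    FROM DailyTransactions_Partitioned
    WHERE $PARTITION.pf_Insertion_Date(Insertion_Date) = 3;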
    
    

    High-level steps on how to use the copy activity:

    Create a Linked Service for your on-premises SQL Server in ADF.

    Create a Linked Service for your Azure Blob Storage account in ADF.

    Create an Input Dataset for your SQL Server table with the appropriate schema and partition option.

    Create an Output Dataset for your Azure Blob Storage.

    Create a Pipeline with a Copy Data activity using input and output datasets.
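    To produce the YYYY/MM/DD folder hierarchy, one common pattern (a sketch, not the only option) is: a Lookup activity that lists the distinct dates, a ForEach over the Lookup output, and a Copy activity inside the ForEach whose sink dataset takes the folder path and file name as parameters. The column alias PartitionDate below is a name chosen for illustration:

    -- Lookup activity query: list the dates to iterate over
    SELECT DISTINCT CONVERT(date, Insertion_Date) AS PartitionDate
    FROM DailyTransactions;

    Inside the ForEach, the Copy activity's source query filters on the current date, and the sink dataset's folder and file name parameters are built with dynamic content expressions such as:

    @concat('Output/', formatDateTime(item().PartitionDate, 'yyyy/MM/dd'))
    @concat('Data-', formatDateTime(item().PartitionDate, 'yyyy-MM-dd'), '.csv')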

    Here is a video tutorial by one of our community experts.

    I hope this helps. Please let us know if you have any further questions.

