Archiving multiple tables based on 2 common columns

Question

Mohanraj Ramalingam 150 Reputation points

Hi All,

I have a requirement to archive 22 tables based on two columns: date and clientID.

The 22 tables all have different structures, but every one of them contains the date and clientID columns.

I need to archive all 22 tables so that, for each table, the output follows this folder structure: clientID, csv (folder name), year, month.

Can you suggest an optimized way to build the pipeline, so that I don't have to create a separate pipeline for each of the 22 tables?

Any recommendation or suggestion is appreciated.

Regards,

Mohanraj

Chandra Boorla 15,475 Reputation points Microsoft External Staff Moderator

2023-09-08T13:15:53.9233333+00:00

@Mohanraj Ramalingam Just checking in to see if the below answer provided by @Amira Bedhiafi helped.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

Mohanraj Ramalingam 150 Reputation points

2023-09-10T05:19:20.19+00:00

@Chandra Boorla I have accepted @Amira Bedhiafi's solution and will post any queries I have during the implementation. Thanks, @Amira Bedhiafi, for the detailed explanation.

Answer accepted by question author

Answer 1

Use pipeline parameters to generalize the archival process:

- **tableName** parameter: to specify the table name.

- **clientID** parameter: to specify the clientID.

- **date** parameter: to specify the date. This can be broken down further into **year** and **month** if needed.

Then, in your copy data activity, use the parameterized table name to read from the respective table.

On the source side, use the input parameters to filter the rows by clientID and date for each table.

On the destination side, use the parameters (clientID, year, and month) to build the folder structure dynamically, for example in blob storage.

To cover all the tables, use ADF's ForEach activity. Create an array of your 22 table names, loop over it, and invoke the copy data activity for each entry, passing the current table name (`@item()`) to the copy activity.

You can also set up a trigger (time-based or event-based) to run the archival process periodically. The steps are as follows:

Set up parameters:

- Go to your pipeline and add the parameters tableName, clientID, year, and month.

Source Configuration:

- Create a dataset that connects to your database.
- Give the dataset a parameter for the table name and pass @pipeline().parameters.tableName to it from the copy activity, so that the table is set dynamically. A dataset cannot read pipeline parameters directly.

Query Transformation:

In the source tab of your copy activity, use a query that filters on the clientID and date provided. The parameters need the `@{...}` string interpolation syntax to be substituted inside the query text. Example:

    SELECT * FROM @{pipeline().parameters.tableName} WHERE clientID = @{pipeline().parameters.clientID} AND YEAR(date) = @{pipeline().parameters.year} AND MONTH(date) = @{pipeline().parameters.month}
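
To check what that query resolves to for a given table before wiring it into the pipeline, a small Python sketch can help. It is only an illustration: `build_archive_query` and the table name `dbo.Orders` are hypothetical, and it assumes clientID is numeric (a string ID would need quoting).

```python
import re

# Hypothetical helper, not part of ADF: mirrors the filter query for one table.
# Accepts "table" or "schema.table" made of plain identifier characters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_archive_query(table_name: str, client_id: int, year: int, month: int) -> str:
    """Return the SELECT statement the copy activity would run for one table."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # Assumption: clientID is numeric, so it is interpolated without quotes.
    return (
        f"SELECT * FROM {table_name} "
        f"WHERE clientID = {int(client_id)} "
        f"AND YEAR(date) = {int(year)} AND MONTH(date) = {int(month)}"
    )


print(build_archive_query("dbo.Orders", 42, 2023, 8))
```

Because the table name is spliced into the SQL text, it is worth restricting it to a known list of tables, which the array of 22 names already provides.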

Destination Configuration:

- Create a dataset that connects to your blob storage or any other desired location.
- For the folder structure, use a path like:
```
    @{pipeline().parameters.clientID}/csv/@{pipeline().parameters.year}/@{pipeline().parameters.month}
```
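
As a rough illustration of the paths this expression produces, here is a minimal Python sketch. The function names are hypothetical, and the one-CSV-per-table file naming is an assumption, since the thread only specifies the folder levels.

```python
def archive_folder(client_id, year: int, month: int) -> str:
    """Build the <clientID>/csv/<year>/<month> folder path."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # The month is emitted as passed (no zero padding), like the expression above.
    return f"{client_id}/csv/{year}/{month}"


def archive_blob_path(table_name: str, client_id, year: int, month: int) -> str:
    """Assumed naming: one CSV file per table inside the shared folder."""
    return f"{archive_folder(client_id, year, month)}/{table_name}.csv"


print(archive_folder(42, 2023, 8))               # 42/csv/2023/8
print(archive_blob_path("Orders", 42, 2023, 8))  # 42/csv/2023/8/Orders.csv
```

Since all 22 tables share the same folder for a given client and month, the file name (or an extra folder level) has to carry the table name so that the outputs do not overwrite each other.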

Looping:

- Create an array parameter, say tableList, containing the names of your 22 tables.
- Add a `ForEach` activity. Set the batch count as per your preference.
- For items, use the `tableList` parameter.
- Inside the `ForEach` activity, add your copy data activity. It will run once for each table in the list.
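
The looping steps can be sketched in plain Python to show what each iteration would carry. This only simulates the logic and is not ADF code: `CopyJob`, `plan_copy_jobs`, and the two table names are hypothetical, and clientID is assumed to be numeric.

```python
from typing import Iterable, Iterator, NamedTuple


class CopyJob(NamedTuple):
    """What one iteration of the loop hands to the copy activity."""
    table: str
    query: str
    folder: str


def plan_copy_jobs(table_list: Iterable[str], client_id: int,
                   year: int, month: int) -> Iterator[CopyJob]:
    """Yield one copy job per table, all sharing the same client and month."""
    folder = f"{client_id}/csv/{year}/{month}"
    for table in table_list:  # 'table' plays the role of item() inside ForEach
        query = (
            f"SELECT * FROM {table} WHERE clientID = {client_id} "
            f"AND YEAR(date) = {year} AND MONTH(date) = {month}"
        )
        yield CopyJob(table, query, folder)


# Hypothetical sample; the real tableList would hold all 22 names.
table_list = ["dbo.Orders", "dbo.Invoices"]
for job in plan_copy_jobs(table_list, 42, 2023, 8):
    print(job.table, "->", job.folder)
```

The point of the design is that only the table name varies per iteration, so one parameterized copy activity can serve every table.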

Mohanraj Ramalingam 150 Reputation points

2023-09-10T15:26:09.37+00:00

Hi @Amira Bedhiafi, as per your suggestion, I have created the parameters at the pipeline level. But how do I pass the list of 22 tables, schemas, and column names to these parameters? Before the copy activity, I need to check one condition on the date column: whether the month is closed or not, that is, whether the current date is later than the last day of the previous month. I need to perform the copy activity only for the tables that meet this condition.

Got the answer from the last step of your solution. Thanks, @Amira Bedhiafi.

Chandra Boorla 15,475 Reputation points Microsoft External Staff Moderator

2023-09-13T04:56:59.6466667+00:00

@Mohanraj - Glad to know it helped. Continue to use MS Q&A platform for any question related to Azure!
