Share via

Archiving multiple tables based on 2 common columns

Mohanraj Ramalingam 150 Reputation points
2023-09-06T16:59:38.4133333+00:00

Hi All,

I have a requirement where 22 tables need to be archived based on two columns which are date and clientID.

All the 22 tables are having different table structure but contain date and clientID which common across all the tables.

Now, I need to archive all the 22 tables in such a way it needs to create folder strucutre as clientID, csv (folder name),

year, month for each of the 22 tables.

Can you suggest me an optimized way to create pipeline where, I don't wanto create separate pipeline for all the 22 tables.

Any recommendation or suggestion is appreciated.

Regards,

Mohanraj

Azure Data Factory
Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


Answer accepted by question author

Amira Bedhiafi 42,941 Reputation points MVP Volunteer Moderator
2023-09-07T12:18:06.74+00:00

Try to use pipeline parameters to generalize the archival process with the following :

- **tableName** parameter: to specify the table name.

- **clientID** parameter: to specify the clientID.

- **date** parameter: This can be further broken down to **year** and **month** if needed.

Then in your copy data activity, use the parameterized table name to read from the respective table.

And for your destination,use the input parameters to filter out the rows based on clientID and date for each table in the source dataset.

Use the parameters (clientID, year, and month) to dynamically create the folder structure in the destination, such as blob storage.

Use ADF's ForEach activity. Create a list/array of your 22 table names. Loop over each table name and invoke the copy data activity, passing the table name to the pipeline parameter.

You can set up a trigger (time-based or event-based) to initiate the archival process periodically like below :

  1. Set up parameters:
    • Go to your pipeline, add parameters tableName, clientID, year, and month.
  2. Source Configuration:
    • Create a dataset that connects to your database.
    • In the dataset's table configuration, use the @pipeline().parameters.tableName to dynamically set the table.
  3. Query Transformation:
    • In your copy activity, in the source tab, use a query that filters based on the clientID and date provided. Example:
    
        SELECT * FROM @pipeline().parameters.tableName WHERE clientID = @pipeline().parameters.clientID AND YEAR(date) = @pipeline().parameters.year AND MONTH(date) = @pipeline().parameters.month
    
    
  4. Destination Configuration:
    • Create a dataset that connects to your blob storage or any other desired location.
    • For the folder structure, use the path like:
    
        @{pipeline().parameters.clientID}/csv/@{pipeline().parameters.year}/@{pipeline().parameters.month}
    
    
  5. Looping:
    • Create an array parameter, say tableList, containing the names of your 22 tables.
      - Add a `ForEach` activity. Set batch count as per your preference.
      
            - For items, use the `tableList`.
      
                  - Inside the `ForEach` activity, add your copy data activity. This will run for each table in the list.
      

Was this answer helpful?

1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.