How to create a ADF pipeline with dynamic dataset?

Aravind 20 Reputation points
2024-01-29T07:12:52.8933333+00:00

Hi Team, I will explain briefly here, see I will create only one dataset for Postgresql default_db database so this database has a table called details. so this table contains all database names exactly. In ADF, source I will add that dataset and give a query like SELECT * FROM details; so the database_name column contains all database names for this please refer to the snapshot. User's image

so ADF pipeline fetches the database name from that table and creates a dataset itself because as of now just 5 databases only available but in the future 432 databases will be created so can't able to create a dataset and add all 432 as sources in adf. after that I will add a few queries like select * from table_name; so the ADF pipeline needs to execute this in every database table and fetch the data and store that in the sink table. my query is how to build a pipeline like what are the database names available from the default_db, details table adf should create a pipeline dynamically and execute the queries from all db so for this what features need to be used in adf? I excepting detailed explanation Thanks & Regards Aravind

Azure SQL Database
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
Azure App Service
Azure App Service
Azure App Service is a service used to create and deploy scalable, mission-critical web apps.
{count} votes

Answer accepted by question author
  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2024-01-29T10:53:50.13+00:00

    Hi Aravind , Thankyou for using Microsoft Q&A platform and thanks for posting your query here.

    As per my understanding , you are trying to build a solution in ADF to fetch the DBNames from the SQL Server dynamically and query the data of tables from all the available DBs in the server and copy it to destination table. Please let me know if that is not what you are looking for.

    First of all , you need to keep the lookup/control/metadata table called details table updated to hold all the available DBNames. For that , you can use the inbuilt sys object to find out all the available DBs : SELECT name FROM sys.databases

    You can write an Stored procedure to find out all the DB names coming from sys.databases table left joining with details table and finding all non-matching records(present in sys.databases but not in details table) and insert the same in details table.

    Now , You would also need all the table names which needs to be copied from all the databases. To store the table names

    Since there would be multiple tables that needs to be copied from all the databases, following query can be used for that: SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'

    • Store the table name too in the control table to loop through. Cross apply both the results and store in the final control table. Suppose you have 500 DB having 10 tables each then now the control table would have 5000 rows .

    Now, You can use lookup activity to get the db names and table names from the details table. Next, use ForEach activity to loop through each of the records. Create parameterized linked service where Dbname would be fetched dynamically and create parameterized dataset where tablename would be fetched dynamically . Inside the foreach, use copy activity and in source formulate the query as select * from @{item().dbname}.@{item().tablename} .

    Make sure to use pre-copy script in sink settings to truncate the table at target before the load . Hope it helps. Please accept the answer by clicking on Accept answer button. Thankyou

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.