Hello @Shambhu Rai Thanks for posting this query on this forum.
There are many factors in this scenario which guides the implementation, which also requires iterations!
Sharing few reading resources which might be helpful with initial query!
Dated Oct 14 2020: Performance Tuning ADF Data Flow Sources and Sinks :By Mark Kromer
Use partitioning: You can partition your data based on the date column so that each partition contains data for a specific date range. This will allow you to query only the partitions that contain the data you need, which will significantly reduce the amount of data that needs to be scanned. ADF supports partitioning in various data sources, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.
For source partitioning, the I/O of the SQL Server is the bottleneck. Adding too many partitions may saturate your source database. Generally four or five partitions is ideal when using this option.
Also check: Isolation level Assuming the source is Azure SQL
Use indexing: You can create indexes on the date column to speed up the query performance. Indexing allows the database engine to quickly locate the data that matches the query criteria, which can significantly reduce the query execution time. ADF supports indexing in various data sources, including Azure SQL Database and Azure Cosmos DB.
Use caching: You can cache the frequently accessed data in memory or disk to reduce the query execution time. Caching allows the data to be retrieved from the cache instead of the data source, which can significantly reduce the query latency. ADF supports caching in various data sources, including Azure Redis Cache and Azure SQL Database.
Use parallelism: You can split the query into multiple smaller queries and execute them in parallel to speed up the query performance. Parallelism allows the queries to be executed concurrently, which can significantly reduce the query execution time. ADF supports parallelism in various data sources, including Azure SQL Database and Azure Data Lake Storage.
Mapping data flows performance and tuning guide
-->Scale up the self-hosted IR: Increase the number of concurrent jobs that can run on a node
-->Scale out the self-hosted IR: Add more nodes (machines)