partition by condition

Question

Shambhu Rai 1,411

Hi Expert,

I have 30 million records, stored day-wise and covering the last 20 years, and I want to fetch any 4 months of records in a fraction of a second. Is there any way to do this?

Shambhu Rai 1,411 Reputation points

2023-07-26T17:57:53.9466667+00:00

suggestion please
Shambhu Rai 1,411 Reputation points

2023-07-27T09:51:04.5566667+00:00

But sir, partitioning is normally done on the 12 months of a year or on the dates within a month. How would we partition the last 22 years as a range?
QuantumCache 20,366 Reputation points Moderator

2023-07-27T16:40:02.6866667+00:00

Hello @Shambhu Rai, do you have a date column in your database that can be used as the partition key?

What kind of source is it? SQL Server?
Shambhu Rai 1,411 Reputation points

2023-07-27T20:22:42.9633333+00:00

Sir, partitioning by date will create more than 200 partitions. Will that improve performance? Or are you talking about partitioning by date range?
QuantumCache 20,366 Reputation points Moderator

2023-07-27T20:24:39.9433333+00:00

Is the source database SQL Server?

I have something similar (20 million records) on my SQL Server and can test it out if needed.
Shambhu Rai 1,411 Reputation points

2023-07-28T18:20:56.4833333+00:00

Yes, it is using SQL Server with a Hive metastore. How can we make sure that any three months of records are returned within one second of being requested?
Shambhu Rai 1,411 Reputation points

2023-07-30T21:12:17.4366667+00:00

suggestion please
Shambhu Rai 1,411 Reputation points

2023-07-31T08:33:56.47+00:00

suggestion please
QuantumCache 20,366 Reputation points Moderator

2023-07-31T16:26:43.4166667+00:00

Hello @Shambhu Rai

You can partition your data by date range, for example by year or by month. Since your data spans 20 years, partitioning by year gives 20 partitions, one per year. Within each year you can further partition by month, which gives 240 partitions in total, one per month. A query that filters on the partitioning date column then reads only the relevant partitions, which improves query performance.
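To make the arithmetic concrete, here is a minimal Python sketch of which monthly partitions a date range touches. The `YYYY-MM` key format and the function name `month_partitions` are assumptions made for illustration only; they are not part of SQL Server or of any tool mentioned in this thread.

```python
from datetime import date


def month_partitions(start: date, end: date) -> list[str]:
    """Return the 'YYYY-MM' keys touched by the half-open range [start, end)."""
    if end <= start:
        raise ValueError("end must be later than start")
    keys = []
    year, month = start.year, start.month
    # A month is touched when its first day falls before the end of the range.
    while date(year, month, 1) < end:
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys
```

With this scheme, a request for March through June 2015 touches 4 of the 240 monthly partitions, so the rest of the table never has to be read.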

Please do not forget about query optimization. You can use appropriate query hints, such as NOLOCK, to avoid locking and blocking issues; keep in mind that NOLOCK allows dirty reads, so it only suits queries that can tolerate uncommitted data. You can also use query execution plans to identify the bottlenecks in your queries and optimize them accordingly.

Regarding your question about how to ensure that any three months of records are returned within one second of being requested: you can consider using a caching mechanism to keep frequently accessed data in memory. This reduces query execution time for repeated requests. You can also consider using a load balancer to distribute the query load across multiple servers.
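As a rough illustration of the caching idea, the sketch below keeps recent results in memory with Python's `functools.lru_cache`. The loader `load_months_from_database` is a hypothetical stand-in for the real query, and the `CALLS` counter exists only to show how often the "database" is actually hit.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) slow query actually runs.
CALLS = {"count": 0}


def load_months_from_database(first_month: str, month_count: int) -> tuple:
    """Hypothetical stand-in for the slow query against the source database."""
    CALLS["count"] += 1
    return tuple(f"rows for {first_month} +{offset}" for offset in range(month_count))


@lru_cache(maxsize=64)
def get_months(first_month: str, month_count: int) -> tuple:
    """Serve repeated requests for the same window from memory."""
    return load_months_from_database(first_month, month_count)
```

Calling `get_months("2015-03", 4)` twice runs the loader only once; the second call is answered from the cache. A cache only helps with windows that are requested repeatedly, so the first request for any window still depends on partitioning and indexing.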

I hope this helps! Let me know if you have any further questions.

You can send me an email and we can certainly continue the discussion offline.
QuantumCache 20,366 Reputation points Moderator

2023-08-01T04:38:59.3733333+00:00

Hello @Shambhu Rai, you can send me an email and we can certainly continue the discussion offline.

Please send an email to with the below details, so that we can work closely on this matter.

Your Email Subject : Attn Satish Boddu

Your Email Body content must include the below:

Thread URL: Link to this thread.

Your Azure Subscription ID: <This is must to be provided to us>
Shambhu Rai 1,411 Reputation points

2023-08-01T04:43:38.94+00:00

What's your email id
QuantumCache 20,366 Reputation points Moderator

2023-08-01T04:50:45.1866667+00:00

Hello @Shambhu Rai
Please send an email to with the below details, so that we can work closely on this matter.
Your Email Subject : Attn Satish Boddu

Your Email Body content must include the below:

Thread URL: Link to this thread.

Your Azure Subscription ID: <This is must to be provided to us>

Answer 1

Hello @Shambhu Rai, thanks for posting this query on this forum.

Many factors in this scenario guide the implementation, and getting it right usually takes several iterations.

Here are a few reading resources that might help with the initial query.

Performance Tuning ADF Data Flow Sources and Sinks, by Mark Kromer (dated Oct 14, 2020)

Performance tuning steps

Use partitioning: You can partition your data based on the date column so that each partition contains data for a specific date range. This will allow you to query only the partitions that contain the data you need, which will significantly reduce the amount of data that needs to be scanned. ADF supports partitioning in various data sources, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

For source partitioning, the I/O of the SQL Server is the bottleneck. Adding too many partitions may saturate your source database. Generally four or five partitions is ideal when using this option.
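As a sketch of what a small number of source partitions means in practice, the Python snippet below splits one requested date range into a handful of contiguous sub-ranges of near-equal length. The helper name `split_date_range` is hypothetical and is not an ADF API; ADF performs its own partitioning when the option is enabled.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, parts: int) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into up to `parts` contiguous sub-ranges."""
    total_days = (end - start).days
    if parts < 1 or total_days <= 0:
        raise ValueError("need parts >= 1 and end later than start")
    parts = min(parts, total_days)  # never create empty sub-ranges
    base, extra = divmod(total_days, parts)
    ranges = []
    lower = start
    for index in range(parts):
        # The first `extra` sub-ranges get one additional day each.
        upper = lower + timedelta(days=base + (1 if index < extra else 0))
        ranges.append((lower, upper))
        lower = upper
    return ranges
```

For a four-month window, `split_date_range(date(2015, 3, 1), date(2015, 7, 1), 4)` yields four sub-ranges of 30 or 31 days each, which stays within the four-to-five partition guideline above.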

Source partitioning

Also check: Isolation level (assuming the source is Azure SQL)

Use indexing: You can create indexes on the date column to speed up the query performance. Indexing allows the database engine to quickly locate the data that matches the query criteria, which can significantly reduce the query execution time. ADF supports indexing in various data sources, including Azure SQL Database and Azure Cosmos DB.
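The effect of an index on the date column can be seen in a small, self-contained experiment. The sketch below uses SQLite from the Python standard library purely because it runs anywhere; it is not SQL Server, and the table name `records`, the index name `idx_records_day`, and the sample data are all made up for illustration. On SQL Server you would inspect the actual execution plan instead.

```python
import sqlite3


def query_plan_for_date_range(with_index: bool) -> str:
    """Return SQLite's query plan for a four-month date-range query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, day TEXT NOT NULL, amount REAL)"
    )
    # Made-up sample data: 28 rows per month for one year, ISO dates as text.
    conn.executemany(
        "INSERT INTO records (day, amount) VALUES (?, ?)",
        [(f"2015-{month:02d}-{day:02d}", 1.0)
         for month in range(1, 13) for day in range(1, 29)],
    )
    if with_index:
        conn.execute("CREATE INDEX idx_records_day ON records (day)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, amount FROM records WHERE day >= ? AND day < ?",
        ("2015-03-01", "2015-07-01"),
    ).fetchall()
    conn.close()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)
```

Without the index the plan reports a full table scan; with it, the plan reports a search that uses `idx_records_day`, so only the rows in the requested range are visited.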

Use caching: You can cache the frequently accessed data in memory or disk to reduce the query execution time. Caching allows the data to be retrieved from the cache instead of the data source, which can significantly reduce the query latency. ADF supports caching in various data sources, including Azure Redis Cache and Azure SQL Database.

Use parallelism: You can split the query into multiple smaller queries and execute them in parallel to speed up the query performance. Parallelism allows the queries to be executed concurrently, which can significantly reduce the query execution time. ADF supports parallelism in various data sources, including Azure SQL Database and Azure Data Lake Storage.
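A minimal sketch of the parallel pattern, using a thread pool from the Python standard library: each sub-range is fetched by its own worker and the results are merged in order. `fetch_range` is a hypothetical stand-in for one sub-query (here it simply returns one "row" per day); in a real setup it would run a date-filtered query against the source.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def fetch_range(bounds: tuple[date, date]) -> list[date]:
    """Hypothetical stand-in for one sub-query over the half-open range [lower, upper)."""
    lower, upper = bounds
    return [lower + timedelta(days=offset) for offset in range((upper - lower).days)]


def fetch_in_parallel(sub_ranges: list[tuple[date, date]], max_workers: int = 4) -> list[date]:
    """Run one sub-query per sub-range concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the merged rows stay sorted
        # as long as the sub-ranges themselves are given in order.
        chunks = pool.map(fetch_range, sub_ranges)
        return [row for chunk in chunks for row in chunk]
```

The default of four workers mirrors the four-to-five partition guideline earlier in this answer; more workers only help while the source database still has I/O headroom.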

Mapping data flows performance and tuning guide

-->Scale up the self-hosted IR: Increase the number of concurrent jobs that can run on a node

-->Scale out the self-hosted IR: Add more nodes (machines)
