Databricks → Azure SQL Hyperscale streaming MERGE scale-out pattern

Question


Anonymous

We are implementing a Kafka → Databricks → Azure SQL Hyperscale streaming ingestion pipeline and would like your guidance.

Our scenario

Scale: 200 Kafka topics → 200 Databricks Structured Streaming jobs (one per topic).

Volume: ~200,000 events per minute across all topics (e.g., Topic 1 = 10,000 events/min, Topic 2 = 3,000 events/min, etc.).

Workflow per topic:

Read from Kafka topic → flatten and write structured format to ADLS.

Read structured data from ADLS → perform an upsert/MERGE directly into its corresponding Azure SQL Hyperscale table.

Micro-batch interval: 1 minute.

Pattern: Direct MERGE into master tables (no persistent staging).

Connections: This results in ~200 concurrent JDBC writers (one per streaming workflow).

Observed performance

In testing, with one topic sending 200K events per minute, the Databricks MERGE into Hyperscale takes ~10 minutes.

Questions for Microsoft

When scaling to 200 topics with fewer events per topic (e.g., ~20,000 events/min/topic), will the MERGE latency per table likely decrease (due to smaller per-table volume), or remain similar because of the overhead of ~200 concurrent JDBC connections?

What is the expected performance and stability impact of maintaining ~200 concurrent JDBC writers into Hyperscale for 1-minute micro-batches?

Are there any recommended limits, best practices, or official guidance for handling this scale of


Answer 1

Venkat Reddy Navari 5,840 Reputation points Microsoft External Staff Moderator

Hi Janice Chi,

Based on your setup, MERGE performance depends more on the volume per table than on the number of topics. If you split 200K events/min across multiple topics so that each table handles only ~20K rows/min, each MERGE should complete faster. The catch is that running 200 separate MERGEs every minute adds a lot of concurrent transactional work, so the pressure shifts to the transaction log and CPU rather than to per-table row counts.
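To check whether log or CPU pressure is actually building, one option is to sample `sys.dm_db_resource_stats` on the Hyperscale database. The sketch below holds such a query and a small helper that flags saturated samples. The 80% threshold and the helper name are illustrative assumptions, not Microsoft-recommended values, and running the query (for example via pyodbc or JDBC) is left out.

```python
from typing import Dict, Iterable, List

# Query against the Azure SQL Database DMV that reports recent resource
# usage as a percentage of the service-objective limits.
RESOURCE_STATS_SQL = """
SELECT end_time, avg_cpu_percent, avg_log_write_percent, max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
"""

WATCHED_METRICS = ("avg_cpu_percent", "avg_log_write_percent", "max_worker_percent")


def saturated_samples(rows: Iterable[Dict[str, object]],
                      threshold: float = 80.0) -> List[Dict[str, object]]:
    """Return the samples where any watched metric is at or above threshold.

    rows: one dict per result row, keyed by the column names in the query.
    threshold: hypothetical alerting level in percent (an assumption).
    """
    return [
        row for row in rows
        if any(float(row[m]) >= threshold for m in WATCHED_METRICS)
    ]
```

If `avg_log_write_percent` stays near 100 while the MERGEs run, the log rate limit is the likely bottleneck rather than row volume.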

Hyperscale can deal with high storage and throughput, but there are limits on log rate and worker threads. Having 200 JDBC writers pushing every minute is on the high side, and in practice most designs consolidate data into staging/landing tables first and then MERGE in larger, controlled batches. That pattern tends to be more stable and avoids log bottlenecks.
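As a rough illustration of the staging approach, the helper below builds a single set-based T-SQL MERGE from a staging table into its target. It is only a sketch: the function name and the table and column names in the usage note are hypothetical, it assumes `columns` includes the key columns, and it does not handle deletes or deduplication of the staging rows.

```python
from typing import Sequence


def build_merge_sql(target: str, staging: str,
                    keys: Sequence[str], columns: Sequence[str]) -> str:
    """Build a set-based T-SQL MERGE from a staging table into a target table.

    keys: columns that identify a row (used in the ON clause).
    columns: all columns to write, including the key columns.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    missing = [k for k in keys if k not in columns]
    if missing:
        raise ValueError(f"key columns missing from columns: {missing}")

    non_keys = [c for c in columns if c not in keys]
    on_clause = " AND ".join(f"t.[{k}] = s.[{k}]" for k in keys)
    set_clause = ", ".join(f"t.[{c}] = s.[{c}]" for c in non_keys)
    col_list = ", ".join(f"[{c}]" for c in columns)
    val_list = ", ".join(f"s.[{c}]" for c in columns)

    parts = [
        f"MERGE INTO {target} AS t",
        f"USING {staging} AS s",
        f"ON {on_clause}",
    ]
    if set_clause:  # skip the UPDATE branch for key-only tables
        parts.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    parts.append(
        f"WHEN NOT MATCHED BY TARGET THEN INSERT ({col_list}) VALUES ({val_list});"
    )
    return "\n".join(parts)
```

For example, `build_merge_sql("dbo.orders", "stg.orders", ["order_id"], ["order_id", "status"])` yields one statement that can run after each staging load, so the main table sees one transaction per batch.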

Use a staging + MERGE pattern instead of direct per-topic MERGEs (recommended in Microsoft guidance).
Tune your JDBC writes from Databricks with options like batchsize so data is sent in chunks, not row-by-row.
Keep an eye on Hyperscale log rate and CPU usage. Useful reference here: Hyperscale performance diagnostics.
If needed, scale up compute or log IO capacity to absorb the workload.
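The `batchsize` suggestion above could look roughly like this on the Databricks side. This is a minimal sketch assuming the built-in Spark JDBC data source and the Microsoft SQL Server JDBC driver; the function names, the default batch size, and the connection cap are illustrative assumptions, and credentials are deliberately omitted.

```python
from typing import Dict


def staging_jdbc_options(jdbc_url: str, staging_table: str,
                         batch_size: int = 10_000) -> Dict[str, str]:
    """Options for Spark's JDBC writer; batchsize controls rows per round trip."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return {
        "url": jdbc_url,
        "dbtable": staging_table,
        "batchsize": str(batch_size),
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        # Authentication options (user/password or token) are omitted here.
    }


def write_to_staging(batch_df, options: Dict[str, str],
                     max_connections: int = 4) -> None:
    """Append one micro-batch DataFrame to the staging table.

    Each Spark partition opens its own JDBC connection, so coalescing
    caps the number of concurrent connections for this write.
    """
    (batch_df.coalesce(max_connections)
        .write.format("jdbc")
        .options(**options)
        .mode("append")
        .save())
```

In a Structured Streaming job this would typically be called from `foreachBatch`, for example `foreachBatch(lambda df, batch_id: write_to_staging(df, opts))`, where `opts` comes from `staging_jdbc_options`.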

Finally, smaller per-table volumes will help, but the real risk comes from the number of concurrent writers. If you run into stability issues, moving to a staging approach and reducing the number of MERGE operations per minute is usually the way to go.
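One way to reduce the number of MERGEs in flight is to cap concurrency in whatever process triggers them. The sketch below assumes the MERGEs are orchestrated from a single driver rather than from 200 independent jobs. Here `merge_fn` is a hypothetical callable that runs the MERGE for one table, and the default cap of 16 is an arbitrary illustration, not a tested limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_merges_bounded(tables: Iterable[str],
                       merge_fn: Callable[[str], None],
                       max_concurrent: int = 16) -> Dict[str, str]:
    """Run merge_fn(table) for every table, at most max_concurrent at a time.

    Returns a status per table: "ok" or "failed: <error message>".
    """
    def _run_one(table: str) -> str:
        try:
            merge_fn(table)
            return "ok"
        except Exception as exc:  # record the failure, let other tables proceed
            return f"failed: {exc}"

    table_list = list(tables)
    # The pool size is the concurrency cap: extra tables wait in the queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        statuses = list(pool.map(_run_one, table_list))
    return dict(zip(table_list, statuses))
```

With 200 tables and a cap of 16, the database sees at most 16 concurrent MERGE transactions, at the cost of the last tables in the queue starting later within the minute.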

I hope this information helps. Please do let us know if you have any further queries.

Anonymous

2025-08-29T11:52:30.97+00:00
Please share the URL where the staging approach is recommended. You wrote:

"Use a staging + MERGE pattern instead of direct per-topic MERGEs (recommended in Microsoft guidance)."

Smaran Thoomu 35,375 Reputation points Microsoft External Staff Moderator

2025-09-01T19:13:00.21+00:00

@Anonymous Microsoft guidance (see Q&A threads “merge into Azure SQL Hyperscale main tables” and “Tables Merge and Recon during Streaming”) recommends writing into a staging table first and then performing set-based MERGEs into the main table. This helps reduce log contention, improve retries, and maintain consistency compared to direct per-topic merges.

Venkat Reddy Navari 5,840 Reputation points Microsoft External Staff Moderator

2025-09-03T18:11:28.0366667+00:00

Janice Chi, following up to see whether the above answer was helpful. If it answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.
