Thank you for outlining your architecture and the two options under evaluation. Given your scale and the criticality of maintaining reliable CDC ingestion into Azure SQL Hyperscale, here’s a breakdown of Option 2 and best practices based on Microsoft’s guidance and large-scale implementations.
Is Option 2 (Direct Merge into Hyperscale) Technically Safe and Enterprise-Grade?
Yes, Option 2 is technically feasible and can be made enterprise-grade if implemented with proper orchestration. Azure SQL Hyperscale is architected for high concurrency and throughput, but some important considerations apply when performing frequent `MERGE` or upsert operations at scale.
Key considerations for Hyperscale with Direct CDC Merge
Merge Performance & Concurrency
- Hyperscale supports concurrent reads/writes well due to its architecture (separated compute & storage).
- However, large or frequent `MERGE` statements can still contend on hot pages, indexes, and the transaction log.
- For tens of millions of rows per day, it's recommended to:
- Break merges into micro-batches (e.g., 10k–100k rows per batch).
- Use partition-based logic (e.g., by date, Kafka offset, shard ID).
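As a sketch of the micro-batching idea above (the function name and default batch size are illustrative, not from any specific library):

```python
from itertools import islice


def micro_batches(rows, batch_size=50_000):
    """Yield fixed-size micro-batches so each MERGE stays small.

    `rows` can be any iterable of change records; a batch size in the
    10k-100k range keeps individual transactions short.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be written to a staging table and merged in its own short transaction.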
Transaction Scope
- Long-running transactions may lead to versioning pressure or log retention issues, even in Hyperscale.
- Prefer smaller transactions, ideally idempotent operations (to allow retries).
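One way to keep each transaction small and retry-safe is to generate one `MERGE` statement per micro-batch against a staging table. A minimal sketch (table and column names are hypothetical):

```python
def build_merge_sql(target, staging, key_cols, update_cols):
    """Build one small, idempotent T-SQL MERGE for a single micro-batch.

    Re-running the same batch updates matched rows to the same values,
    so a retry after a transient failure is harmless.
    """
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    insert_cols = ", ".join(key_cols + update_cols)
    insert_vals = ", ".join(f"s.{c}" for c in key_cols + update_cols)
    return (
        f"MERGE {target} AS t "
        f"USING {staging} AS s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) "
        f"VALUES ({insert_vals});"
    )
```

The generated statement would be executed per batch over JDBC, committing after each batch rather than holding one long transaction.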
Recommended best practices
Merge Keys - Use Primary Keys where available. If using a natural key, ensure it’s consistent and immutable across systems (to avoid mismatches).
Error Handling & Retry Logic - Implement upsert logic with retry policies in Databricks (try-catch blocks + retries for transient JDBC failures). Consider dead-lettering irrecoverable errors for offline review.
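A minimal sketch of this retry-then-dead-letter pattern (function names and the backoff policy are illustrative):

```python
import time


def write_with_retry(write_fn, batch, dead_letter, max_attempts=3, base_delay=1.0):
    """Retry a JDBC-style write on transient failures with exponential
    backoff; append irrecoverable batches to a dead-letter list for
    offline review instead of failing the whole job."""
    for attempt in range(1, max_attempts + 1):
        try:
            write_fn(batch)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                dead_letter.append((batch, str(exc)))
                return False
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In Databricks the dead-letter sink would typically be a Delta table rather than an in-memory list.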
Reconciliation & Deduplication - Maintain an `event_id` or `change_sequence_number` column in `CDC_Flattened`. Use this to ensure idempotent writes and support exactly-once processing.
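The deduplication step can be sketched as keeping only the highest sequence number per key before merging (column names follow the `event_id` / `change_sequence_number` convention above; in Spark this would usually be a window function instead):

```python
def dedupe_latest(events, key="event_id", seq="change_sequence_number"):
    """Keep only the highest-sequence change per key, so duplicate
    deliveries and replays collapse to one idempotent write per row."""
    latest = {}
    for e in events:
        k = e[key]
        if k not in latest or e[seq] > latest[k][seq]:
            latest[k] = e
    return list(latest.values())
```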
Merge Batching Strategy - Batch by Kafka offset ranges, ingestion timestamp, or source table partition. Each batch should complete in under 5 minutes to avoid long-running transactions.
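Batching by Kafka offset ranges can be sketched as splitting an offset span into contiguous windows, one merge per window (window size is illustrative):

```python
def offset_windows(start_offset, end_offset, window_size):
    """Split a Kafka offset span into contiguous (start, end) windows,
    start inclusive and end exclusive, so each merge covers exactly one
    window and reprocessing a window is idempotent."""
    return [
        (lo, min(lo + window_size, end_offset))
        for lo in range(start_offset, end_offset, window_size)
    ]
```

Recording the last successfully merged window alongside the target data lets a restarted job resume from a known offset.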
Hyperscale-Specific considerations
| Area | Consideration |
| --- | --- |
| Concurrency | Hyperscale handles concurrency well, but watch for contention on PKs or clustered indexes during merges. |
| Transaction Log | Large merges may retain log records longer than expected, increasing version-store pressure and log growth. |
| Write Throughput | JDBC write speed depends on your Databricks driver configuration. Consider parallel JDBC writes if needed. |
| Index Maintenance | Ensure indexes are optimized post-merge. Consider periodic rebuild/reorganize if merge volume is high. |
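For the index-maintenance point, a common approach is to pick reorganize vs. rebuild from measured fragmentation, following the widely used 10%/30% thresholds. A minimal sketch (index and table names are hypothetical):

```python
def index_maintenance_sql(index, table, frag_pct,
                          reorg_threshold=10.0, rebuild_threshold=30.0):
    """Choose REORGANIZE vs REBUILD based on fragmentation percentage,
    using the common 10%/30% guidance; return None when the index is
    healthy enough to leave alone."""
    if frag_pct >= rebuild_threshold:
        return f"ALTER INDEX {index} ON {table} REBUILD WITH (ONLINE = ON);"
    if frag_pct >= reorg_threshold:
        return f"ALTER INDEX {index} ON {table} REORGANIZE;"
    return None
```

The fragmentation figure would come from `sys.dm_db_index_physical_stats`, and the generated statement would run in a scheduled maintenance window.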
Conclusion:
Option 2 is a valid and efficient approach, especially when minimizing latency is a priority. To ensure reliability and scalability:
- Batch and partition your merge logic.
- Design for idempotency and retries.
- Monitor log and index health.
- Apply strong governance over key definitions.
If your use case is mission-critical, also consider maintaining an audit or reconciliation table to validate data integrity post-merge.
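A post-merge reconciliation check can be sketched as comparing per-partition row counts between source and target and surfacing any partitions that disagree (the partition keys shown are illustrative):

```python
def reconcile_counts(source_counts, target_counts):
    """Compare per-partition row counts from source and target and
    return only the partitions that disagree, as (source, target)
    pairs, for offline investigation."""
    keys = set(source_counts) | set(target_counts)
    return {
        k: (source_counts.get(k, 0), target_counts.get(k, 0))
        for k in keys
        if source_counts.get(k, 0) != target_counts.get(k, 0)
    }
```

In practice the two count dictionaries would come from a query over `CDC_Flattened` and the Hyperscale target, and the mismatches would be written to an audit table.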
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.