Thanks for the detailed question. Your use case is a classic example of large-scale real-time ingestion, where choosing the right tool can significantly impact cost, maintainability, and operational efficiency.
Given your requirements, especially the need to process ~800 Kafka topics, apply light transformations, perform deduplication, and maintain production-grade reliability, here is a comparison of Delta Live Tables (DLT) vs. standard Databricks Structured Streaming:
Structured Streaming (Notebooks/Jobs)
| Category | Pros | Cons |
|---|---|---|
| Flexibility | Full control for custom logic such as reconciliation, conditional retries, and topic-specific handling | Advanced features must be implemented manually |
| Scalability | Scales better with high topic counts; supports dynamic job generation and orchestration | Needs an orchestration setup for managing multiple jobs |
| Cost Efficiency | Lower cost footprint: you pay only for compute, with no added DLT charges | Lacks the built-in optimization and resource management features found in DLT |
| Observability & Lineage | Can integrate with Unity Catalog to enable lineage (manually) | No native built-in lineage, monitoring, or logging; these must be implemented yourself |
| Operational Overhead | More customizable for fine-grained operations | Higher DevOps overhead: orchestration, monitoring, and error handling must be built externally |
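To make the comparison concrete, here is a minimal Structured Streaming sketch for a single topic, covering ingest, a light transformation, and deduplication. It assumes a Databricks notebook where `spark` is already defined; the broker address, topic name, schema, checkpoint path, and target table are placeholders you would replace with your own.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Placeholder schema for the JSON payload on this topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

raw = (
    spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "<broker:9092>")  # placeholder broker
        .option("subscribe", "orders")                        # placeholder topic
        .option("startingOffsets", "latest")
        .load()
)

# Light transformation: parse the Kafka value into typed columns.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Deduplicate on event_id within a 10-minute watermark so state stays bounded.
# dropDuplicatesWithinWatermark needs a recent runtime (Spark 3.5 / DBR 13.3+);
# on older runtimes, fall back to dropDuplicates on the id plus event-time columns.
deduped = (
    parsed.withWatermark("event_ts", "10 minutes")
          .dropDuplicatesWithinWatermark(["event_id"])
)

# Write to a Delta table; the checkpoint location provides fault-tolerant recovery.
query = (
    deduped.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/orders")  # placeholder path
        .outputMode("append")
        .toTable("bronze.orders")                                 # placeholder table
)
```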
Delta Live Tables (DLT)
| Category | Pros | Cons |
|---|---|---|
| Ease of Use | Declarative pipeline syntax simplifies development and onboarding | Limited flexibility for complex, topic-specific logic or custom error handling |
| Built-in Features | Automatically manages checkpointing, retries, pipeline orchestration, and schema enforcement | May introduce abstraction overhead and reduce fine-grained control |
| Observability & Lineage | Native integration with Unity Catalog provides built-in lineage, monitoring, and data quality tracking | Less transparency for debugging deeply nested or dynamic processing logic |
| Operational Simplicity | Reduces DevOps burden with built-in error handling, recovery, and scheduling | Harder to dynamically scale or templatize for very large numbers of Kafka topics (like 800) |
| Cost Considerations | Predictable managed-service billing; operational value for small-to-moderate pipelines | Higher total cost at scale due to additional charges beyond compute (especially with many pipelines) |
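For comparison, the same ingest expressed as a DLT pipeline is largely declarative: DLT handles the checkpointing, retries, and orchestration that the previous sketch manages explicitly. This is only a sketch; the broker, topic, and table names are placeholders, and the expectation shown is just an example of DLT's data quality tracking.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="orders_bronze",                                   # placeholder table name
    comment="Raw events from the 'orders' topic",
)
@dlt.expect_or_drop("value_not_null", "value IS NOT NULL")   # example data quality rule
def orders_bronze():
    return (
        spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<broker:9092>")  # placeholder broker
            .option("subscribe", "orders")                        # placeholder topic
            .load()
            .select(
                F.col("value").cast("string").alias("value"),
                F.col("timestamp").alias("kafka_ts"),
            )
    )
```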
Recommendation
Given your need to handle 800+ Kafka topics (each potentially with unique logic or schema) and your requirements for fault tolerance, deduplication, and operational control:
Databricks Structured Streaming (with notebooks or jobs) is the better fit at this scale.
It offers the flexibility, cost-efficiency, and scalability needed for high-volume streaming workloads. You can template and parameterize your logic and use orchestrators like Databricks Workflows or Azure Data Factory to manage these jobs effectively.
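As a rough sketch of that templating approach, you can drive one parameterized function from a topic list (for example, from job parameters or a config table) and start one stream per topic, each with its own checkpoint. The topic names, paths, and table names below are placeholders; in practice you would group topics across jobs or clusters rather than starting all ~800 from a single notebook.

```python
# Placeholder subset of the ~800 topics; in practice, load this from job
# parameters, a config file, or a control table.
topics = ["orders", "payments", "shipments"]

def start_topic_stream(topic: str):
    """Start one bronze ingestion stream for a single Kafka topic."""
    df = (
        spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<broker:9092>")  # placeholder broker
            .option("subscribe", topic)
            .load()
    )
    return (
        df.writeStream
            .format("delta")
            .option("checkpointLocation", f"/mnt/checkpoints/{topic}")  # per-topic checkpoint
            .toTable(f"bronze.{topic}")                                 # per-topic table
    )

queries = [start_topic_stream(t) for t in topics]
```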
Delta Live Tables can be a good fit if:
- You consolidate or group topics (for example, by subscribing to a topic pattern, as sketched after this list),
- Pipelines share similar schemas or logic,
- You want a simplified operational model for a smaller set of high-priority data flows.
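For the topic-grouping case, one option (sketched below with the same placeholder names) is to subscribe to a topic pattern so a single DLT table covers a family of topics that share a schema:

```python
import dlt

@dlt.table(name="sales_events_bronze")  # placeholder table name
def sales_events_bronze():
    return (
        spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<broker:9092>")  # placeholder broker
            # subscribePattern groups related topics, e.g. sales.orders, sales.refunds
            .option("subscribePattern", r"sales\..*")             # placeholder pattern
            .load()
    )
```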
For additional information, please refer to the Microsoft documentation below:
- https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/concepts
- https://learn.microsoft.com/en-us/azure/databricks/dlt/
- https://learn.microsoft.com/en-us/azure/databricks/connect/streaming/kafka
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.