SQL CDC Stream to Databricks

Dondapati, Navin 281 Reputation points
2020-09-29T22:00:29.797+00:00

Hi Guys,

We have the scenario below to implement in our Azure setup. What options do I have in Azure?

SQL DB with CDC enabled, streaming the CDC data continuously to Azure Databricks.

Is this possible via Azure Event Hubs, or do we have to use Kafka? Or should we run a Spark job every minute instead?

Or should we use a third-party tool?

Cost is a big factor in the decision.

Regards,
Navin
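The every-minute polling option could look roughly like the minimal sketch below. A scheduled job remembers the highest LSN it has processed and, on each run, asks SQL Server's CDC change function for everything after it. The helper `build_poll_query`, the capture instance `dbo_Orders` and the LSN value are hypothetical; the `sys.fn_cdc_*` functions and `cdc.fn_cdc_get_all_changes_<capture_instance>` are the standard SQL Server CDC functions. The sketch only generates the T-SQL; it does not run it or store the watermark.

```python
import re
from typing import Optional

# Documented values of the __$operation column returned by
# cdc.fn_cdc_get_all_changes_<capture_instance>.
OPERATION_NAMES = {1: "delete", 2: "insert", 3: "update_before", 4: "update_after"}

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_LSN_HEX = re.compile(r"0x[0-9A-Fa-f]{20}")  # binary(10) = 20 hex digits


def build_poll_query(capture_instance: str, last_lsn_hex: Optional[str]) -> str:
    """Return T-SQL that reads every change after last_lsn_hex up to the current max LSN.

    last_lsn_hex is the highest LSN processed by the previous run (hypothetical
    watermark, e.g. '0x0000002A000000F80003'); None means "start from the beginning".
    """
    # The names are interpolated into SQL, so accept only plain identifiers / hex literals.
    if not _IDENTIFIER.fullmatch(capture_instance):
        raise ValueError(f"unexpected capture instance name: {capture_instance!r}")
    if last_lsn_hex is None:
        from_expr = f"sys.fn_cdc_get_min_lsn('{capture_instance}')"
    else:
        if not _LSN_HEX.fullmatch(last_lsn_hex):
            raise ValueError(f"unexpected LSN literal: {last_lsn_hex!r}")
        # Start just after the last processed LSN so no change is read twice.
        from_expr = f"sys.fn_cdc_increment_lsn({last_lsn_hex})"
    return "\n".join([
        f"DECLARE @from_lsn binary(10) = {from_expr};",
        "DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();",
        # The change function raises an error for an invalid range (e.g. no new changes).
        "IF @from_lsn <= @to_lsn",
        f"    SELECT * FROM cdc.fn_cdc_get_all_changes_{capture_instance}(@from_lsn, @to_lsn, N'all');",
    ])
```

The next watermark would be the largest `__$start_lsn` in the returned rows. This keeps the infrastructure minimal, at the price of up to one polling interval of latency. If the job uses Spark's JDBC source, note that its `query` option wraps the value in a subquery, so a multi-statement batch like this would need a different SQL client or a rewrite into a single SELECT.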


2 answers

  1. Dondapati, Navin 281 Reputation points
    2020-10-14T22:20:08.717+00:00

    Here is our solution:
    A single-node Databricks Spark cluster with Kafka installed on it, using the Debezium SQL Server connector to stream the change data. It is a very cheap and simple solution: there is no need to spend money on HDInsight Kafka, StreamSets, or more complicated Event Hubs setups. Kafka and Debezium are both open source.
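    With Debezium, each changed row becomes one Kafka message. Below is a minimal sketch of what a consumer does with those messages. It assumes the JSON converter and a hypothetical primary-key column named `id`, and a plain dict stands in for the target table (in Databricks this would normally be a Structured Streaming query writing to a Delta table). `apply_debezium_event` is an illustrative helper, not part of Debezium.

```python
import json
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_debezium_event(table: Dict[Any, Row], raw_value: Optional[str], key: str = "id") -> None:
    """Apply one Debezium change event (the Kafka message value) to a dict keyed by `key`.

    `table` is a hypothetical in-memory stand-in for the real target table.
    """
    if raw_value is None:
        return  # tombstone message that follows a delete; nothing to apply
    event = json.loads(raw_value)
    # With schemas.enable=true the envelope sits under "payload"; otherwise it is top-level.
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the after-image of the row
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: only the before-image carries the key
        table.pop(payload["before"][key], None)
    # Other ops (e.g. "t" for truncate) are ignored in this sketch.
```

    For example, feeding a create (`"op": "c"`), then an update (`"op": "u"`) for the same `id`, leaves only the updated row in the dict; a following delete (`"op": "d"`) plus its tombstone (`None`) removes it again.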


  2. PRADEEPCHEEKATLA-MSFT 85,586 Reputation points Microsoft Employee
    2020-10-12T13:14:22.157+00:00

    Hello @Anonymous,

    Apologies for the delayed response.

    You can use StreamSets for Databricks, which brings the power of two data planes in the StreamSets DataOps platform to building, testing, and deploying ingest, transformation, and ML jobs with Databricks.

    OR

    StreamSets Data Collector is an easy-to-use data pipeline engine for streaming, CDC and batch ingest from any source to Azure. With StreamSets, you spend your time building data pipelines, enabling self-service, and innovating, and minimize the time you spend maintaining, rewriting and fixing pipelines.
    Ingest data from a broad variety of sources, including Kafka, HDFS, databases, files, applications and more, into Azure Storage, Azure Event Hubs, Azure Synapse, Snowflake and Databricks. It is integrated with Azure Key Vault for seamless security.
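    On the Event Hubs part of the question: Event Hubs (Standard tier and above) exposes a Kafka-compatible endpoint on port 9093, so a Kafka-based producer such as Debezium/Kafka Connect can publish to it and Spark's built-in Kafka source can read from it. A minimal sketch of the Spark Kafka source options, assuming a hypothetical namespace, event hub name and connection string; `eventhubs_kafka_options` is an illustrative helper.

```python
from typing import Dict

# On Databricks runtimes the Kafka client classes are shaded with a "kafkashaded." prefix;
# on plain Apache Spark use "org.apache.kafka.common.security.plain.PlainLoginModule".
DEFAULT_LOGIN_MODULE = "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule"


def eventhubs_kafka_options(namespace: str, connection_string: str, topic: str,
                            login_module: str = DEFAULT_LOGIN_MODULE) -> Dict[str, str]:
    """Spark Kafka source options for reading one event hub through its Kafka endpoint."""
    jaas = (
        f'{login_module} required '
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": topic,  # the event hub name acts as the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }
```

    In a notebook these would be passed as `spark.readStream.format("kafka").options(**opts).load()`. Whether this beats a self-hosted Kafka broker mostly comes down to the Event Hubs throughput-unit cost versus the effort of running Kafka yourself.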

    A similar solution is described here: Real-time Change Data Capture: Structured Streaming with Azure Databricks
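    A common pattern for this kind of Structured Streaming CDC pipeline is to apply each micro-batch to a Delta table with a MERGE inside `foreachBatch`. Delta's MERGE fails when several source rows match the same target row, so the batch is usually collapsed to the newest change per key first. A pure-Python sketch of that logic, assuming a hypothetical change record with `op`, `key`, `seq` (an ordering value such as the source LSN) and `row` fields, and a dict standing in for the Delta table:

```python
from typing import Any, Dict, Iterable, List

# Hypothetical change record: {"op": "upsert" | "delete", "key": ..., "seq": int, "row": dict | None}
Change = Dict[str, Any]


def latest_per_key(batch: Iterable[Change]) -> List[Change]:
    """Keep only the newest change (highest seq) for each key in the micro-batch."""
    newest: Dict[Any, Change] = {}
    for change in batch:
        current = newest.get(change["key"])
        if current is None or change["seq"] > current["seq"]:
            newest[change["key"]] = change
    return list(newest.values())


def merge_batch(target: Dict[Any, Dict[str, Any]], batch: Iterable[Change]) -> None:
    """MERGE-like apply: delete matched keys for deletes, otherwise insert or update."""
    for change in latest_per_key(batch):
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:
            target[change["key"]] = change["row"]
```

    Ordering by `seq` rather than arrival order matters because rows inside one micro-batch are not guaranteed to be in commit order.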

    Hope this helps. Do let us know if you have any further queries.
