
Kafka Connector for Databricks

pseudo p 0 Reputation points
2024-07-22T14:43:43.1133333+00:00

Can you point me to documentation for using a Kafka connector in an Apache Spark Streaming job on Databricks to connect to Azure Event Hubs?

I am looking for the Maven library version, etc.

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.


1 answer

  1. Amira Bedhiafi 41,131 Reputation points Volunteer Moderator
    2024-07-22T21:35:24.21+00:00

    To integrate Azure Event Hubs with Apache Spark using the native Event Hubs connector, you need the following Maven coordinates:

    • Group ID: com.microsoft.azure
    • Artifact ID: azure-eventhubs-spark_2.12
    • Version: 2.3.22

    (If you use the Kafka API approach shown below instead, this library is not required: the Structured Streaming Kafka source is built into Databricks Runtime.)

    This dependency can be included in your project's pom.xml if you are using Maven:

    
    <dependency>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>azure-eventhubs-spark_2.12</artifactId>
      <version>2.3.22</version>
    </dependency>
    
    

    Before you can connect, ensure you have an Event Hubs namespace created in Azure (the Kafka endpoint requires the Standard tier or higher; it is not available on the Basic tier). You will need the Event Hubs connection string and the namespace's fully qualified domain name (FQDN).
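    As a rough sketch, the Kafka-endpoint settings can be derived from the namespace connection string (which you would normally keep in a secret scope, not in code). The helper below is illustrative only, not part of any library:

    ```scala
    // Illustrative helper (hypothetical name): derives the Kafka-over-Event-Hubs
    // settings from the namespace connection string.
    object EventHubsKafkaConfig {
      def fromConnectionString(connectionString: String): Map[String, String] = {
        // The "Endpoint=sb://<namespace>.servicebus.windows.net/" segment gives the FQDN.
        val fqdn = connectionString
          .split(';')
          .find(_.startsWith("Endpoint="))
          .map(_.stripPrefix("Endpoint=sb://").stripSuffix("/"))
          .getOrElse(sys.error("Endpoint= segment not found in connection string"))

        Map(
          // Event Hubs exposes its Kafka endpoint on port 9093.
          "kafka.bootstrap.servers" -> s"$fqdn:9093",
          "kafka.security.protocol" -> "SASL_SSL",
          "kafka.sasl.mechanism"    -> "PLAIN",
          // The username is the literal string "$ConnectionString"; the password
          // is the full connection string itself.
          "kafka.sasl.jaas.config" ->
            ("org.apache.kafka.common.security.plain.PlainLoginModule required " +
             s"""username="$$ConnectionString" password="$connectionString";""")
        )
      }
    }
    ```

    You can then splat these settings onto the reader with `.options(cfg)` instead of repeating each `.option(...)` call.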

    You can configure your Spark job to read from and write to Azure Event Hubs using the Kafka API.

    
    val df = spark.readStream
      .format("kafka")
      // Event Hubs exposes its Kafka endpoint on port 9093
      .option("kafka.bootstrap.servers", "YOUR_EVENTHUBS_NAMESPACE.servicebus.windows.net:9093")
      // The event hub name acts as the Kafka topic
      .option("subscribe", "YOUR_TOPIC_NAME")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      // Username is the literal string "$ConnectionString"; the password is the
      // full Event Hubs connection string. Note: on Databricks Runtime the Kafka
      // client classes are shaded, so if this class is not found, use
      // kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule instead.
      .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"YOUR_EVENTHUBS_CONNECTION_STRING\";")
      .option("kafka.request.timeout.ms", "60000")
      .option("kafka.session.timeout.ms", "30000")
      .option("kafka.group.id", "YOUR_GROUP_ID")
      .option("failOnDataLoss", "true")
      .load()

    df.writeStream
      .outputMode("append")
      .format("console")
      .start()
    
    

    For writing to Event Hubs:

    
    // The Kafka sink requires the DataFrame to have a string or binary
    // `value` column (and optionally `key`).
    df.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "YOUR_EVENTHUBS_NAMESPACE.servicebus.windows.net:9093")
      .option("topic", "YOUR_TOPIC_NAME")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"YOUR_EVENTHUBS_CONNECTION_STRING\";")
      // A checkpoint location is required for the Kafka sink
      .option("checkpointLocation", "YOUR_CHECKPOINT_LOCATION")
      .start()
    
    
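    Since the Kafka sink only sends the `value` (and optional `key`) column, a common pattern is to serialize the columns you want to ship into a single JSON `value` column before calling `writeStream`. A minimal sketch, assuming hypothetical source columns `id` and `payload`:

    ```scala
    import org.apache.spark.sql.functions.{col, struct, to_json}

    // Hypothetical example: pack the outgoing columns into the single
    // `value` column that the Kafka sink expects.
    val outDf = df.select(
      to_json(struct(col("id"), col("payload"))).alias("value")
    )
    ```

    `outDf` can then be passed to the `writeStream` call shown above in place of `df`.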

    If you have any further questions or need more specific examples, feel free to ask!

