Kafka Connector for Databricks

pseudo p 0 Reputation points
2024-07-22T14:43:43.1133333+00:00

Can you point me to documentation on using the Kafka connector in an Apache Spark Streaming job on Databricks to connect to Azure Event Hubs?

I am looking for the Maven library version, etc.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Amira Bedhiafi 19,946 Reputation points
    2024-07-22T21:35:24.21+00:00

    To integrate Azure Event Hubs with Apache Spark using the native Event Hubs connector, you will need the following Maven coordinates:

    • Group ID: com.microsoft.azure
    • Artifact ID: azure-eventhubs-spark_2.12
    • Version: 2.3.22

    This dependency can be included in your project's pom.xml if you are using Maven:

    
    <dependency>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>azure-eventhubs-spark_2.12</artifactId>
      <version>2.3.22</version>
    </dependency>
    
    

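    If you build with sbt instead of Maven, the same connector can be added with an equivalent dependency line (a sketch; make sure the `_2.12` suffix matches your Databricks Runtime's Scala version):

```scala
// build.sbt equivalent of the Maven dependency above
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.12" % "2.3.22"
```

    On Databricks, you can also install the library directly on the cluster by entering these Maven coordinates in the cluster's Libraries tab.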
    Before you can connect, ensure you have an Event Hubs namespace created in Azure. You will need the Event Hubs connection string and the fully qualified domain name (FQDN).
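    These two values map directly onto the Kafka client settings used in the examples below. A minimal sketch of that mapping (the object and method names are illustrative, not part of any library):

```scala
// Hypothetical helper that derives the Kafka-over-Event-Hubs settings
// from the namespace name and its connection string.
object EventHubsKafkaConfig {
  // Event Hubs exposes its Kafka-compatible endpoint on port 9093.
  def bootstrapServers(namespace: String): String =
    s"$namespace.servicebus.windows.net:9093"

  // With SASL PLAIN, the username is the literal string "$ConnectionString"
  // and the password is the full Event Hubs connection string.
  def jaasConfig(connectionString: String): String =
    "org.apache.kafka.common.security.plain.PlainLoginModule required " +
      "username=\"$ConnectionString\" " +
      "password=\"" + connectionString + "\";"
}
```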

    Alternatively, you can configure your Spark job to read from and write to Azure Event Hubs through its Kafka-compatible endpoint. This path uses Spark's built-in Kafka source (spark-sql-kafka-0-10), which is already included in the Databricks Runtime, so the connector library above is not required for it. For example, to read:

    
    val df = spark.readStream
      .format("kafka")
      // The Kafka "topic" is the Event Hub name.
      .option("subscribe", "YOUR_TOPIC_NAME")
      // The Event Hubs Kafka endpoint listens on port 9093.
      .option("kafka.bootstrap.servers", "YOUR_EVENTHUBS_NAMESPACE.servicebus.windows.net:9093")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.security.protocol", "SASL_SSL")
      // Username is the literal "$ConnectionString"; password is the connection string.
      .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"YOUR_EVENTHUBS_CONNECTION_STRING\";")
      .option("kafka.request.timeout.ms", "60000")
      .option("kafka.session.timeout.ms", "30000")
      // The Kafka group id maps to an Event Hubs consumer group.
      .option("kafka.group.id", "YOUR_GROUP_ID")
      .option("failOnDataLoss", "true")
      .load()

    df.writeStream
      .outputMode("append")
      .format("console")
      .start()
    
    

    For writing to Event Hubs:

    
    df.writeStream
      .format("kafka")
      .option("topic", "YOUR_TOPIC_NAME")
      .option("kafka.bootstrap.servers", "YOUR_EVENTHUBS_NAMESPACE.servicebus.windows.net:9093")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"YOUR_EVENTHUBS_CONNECTION_STRING\";")
      // A checkpoint location is required for the Kafka sink.
      .option("checkpointLocation", "YOUR_CHECKPOINT_LOCATION")
      .start()
    
    
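    Note that the Kafka sink expects the streaming DataFrame to contain a string or binary value column (plus optional key and topic columns). If your DataFrame has other columns, serialize them into value first; a sketch (assumes a running Spark session and an existing df, so treat it as illustrative):

```scala
// Pack all columns into a single JSON "value" column before writing to Kafka.
val toKafka = df.selectExpr("to_json(struct(*)) AS value")
```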

    If you have any further questions or need more specific examples, feel free to ask!
