Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight

Intermediate
Data Engineer
Data Scientist
Azure HDInsight

In this module, you learn how to build real-time streaming analytics pipelines and applications in the cloud by using Apache Kafka and Apache Spark on Azure HDInsight.

Learning objectives

By the end of this module, you will understand:

  • When to use Apache Spark and Kafka with HDInsight.
  • What Spark Structured Streaming is.
  • The architecture of a Kafka and Spark solution.
  • How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook.
  • How to replicate data to a secondary cluster.

Prerequisites

Before starting this module, you should be able to:

  • Log in to the Azure portal.
  • Understand the Azure storage options.
  • Understand the Azure compute options.
  • Create and configure an HDInsight cluster in the Azure portal.