Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
In this module, you will learn how to create real-time streaming data analytics pipelines and applications in the cloud by using Azure HDInsight with Apache Kafka and Apache Spark.
Learning objectives
By the end of this module, you will understand:
- When to use Apache Spark and Kafka with HDInsight
- How Spark Structured Streaming works
- The architecture of a Kafka and Spark solution
- How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook
- How to replicate data to a secondary cluster
Prerequisites
Before you start this module, you should be able to:
- Sign in to the Azure portal
- Understand the Azure storage options
- Understand the Azure compute options
- Create and configure an HDInsight cluster in the Azure portal