Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight

Intermediate
Data Engineer
Data Scientist
Azure HDInsight

This learning path introduces Azure HDInsight and shows how to apply it to a range of real-world challenges.

Prerequisites

Before starting this learning path, you should be able to:

  • Successfully log in to the Azure portal
  • Understand the Azure storage options
  • Understand the Azure compute options

Modules in this learning path

By the end of this module, you will understand that Azure HDInsight is a fully managed cloud service that lets you efficiently process massive amounts of data using the most popular open-source frameworks.

In this module, you learn the configuration options that optimize HDInsight for both performance and cost.

In this module, you learn how to create an HDInsight cluster, monitor it, and recognize common provisioning issues.

Learn how HBase provides random access and strong consistency for large amounts of unstructured and semi-structured data in a schemaless database organized by column families.
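
As a rough illustration of what "schemaless" and "organized by column families" mean, the following toy in-memory sketch models a table whose column families are fixed up front while the columns inside each family vary from row to row. It is not the HBase client API: the class name `ToyColumnFamilyTable`, its methods, and the sample data are hypothetical, and it does not model distribution, persistence, or consistency guarantees.

```python
from collections import defaultdict


class ToyColumnFamilyTable:
    """In-memory illustration only; not the HBase client API."""

    def __init__(self, column_families):
        # Column families are declared when the table is created.
        self.column_families = set(column_families)
        # Layout: row key -> column family -> column qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are not declared in advance, so rows can differ.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier, default=None):
        # Random access: fetch a single cell directly by its row key.
        row = self.rows.get(row_key)
        if row is None:
            return default
        return row.get(family, {}).get(qualifier, default)


table = ToyColumnFamilyTable(["profile", "metrics"])
table.put("user#001", "profile", "name", "Ada")
table.put("user#001", "metrics", "logins", 3)
# A different row can use different columns in the same family.
table.put("user#002", "profile", "email", "grace@example.com")
```

In this sketch, reading a column that a row never wrote returns the default rather than failing, which mirrors the idea that rows in the same table need not share the same set of columns.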

In this module, you learn how to create real-time streaming data analytics pipelines and applications in the cloud by using Azure HDInsight with Apache Kafka and Apache Spark.

By the end of this module, you can perform ad hoc queries on a big data set. HDInsight Interactive Query helps you achieve sub-second query latencies.

Azure HDInsight, together with other Azure services, provides a comprehensive multi-tiered security solution. Security is a shared responsibility between Microsoft and the customer.