Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight

Data Engineer
Data Scientist

In this learning path, you will be introduced to Azure HDInsight and learn how to apply it to solve a range of real-world challenges.


The following prerequisites should be completed:

  • Successfully log in to the Azure portal
  • Understand the Azure storage options
  • Understand the Azure compute options

Modules in this learning path

In this module, you will learn how Azure HDInsight, a fully managed cloud service, enables you to efficiently process massive amounts of data using the most popular open source frameworks.

In this module, you will learn the different configurations for ensuring optimal use of HDInsight from both a performance and cost perspective.

In this module, you will learn how to create an HDInsight cluster, monitor a cluster, and recognize common provisioning issues.

In this module, you will learn how HBase provides random access and strong consistency for large amounts of unstructured and semi-structured data in a schemaless database organized by column families.
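
To make the phrase "organized by column families" concrete, the following is a minimal sketch of that data model in plain Python. It is not the HBase API: the class name ColumnFamilyTable, its put and get methods, and the sample row keys are all hypothetical and exist only for illustration. It shows how a row key maps to column families that are fixed when the table is created, while the columns inside each family can differ from row to row.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy in-memory table: row key -> column family -> qualifier -> value.

    Hypothetical illustration only; this is not the HBase client API.
    """

    def __init__(self, families):
        # Column families are declared up front, when the table is created.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        # Qualifiers (columns) are free-form; only the family must be known.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family=None):
        # Random access by row key, optionally narrowed to one family.
        row = self.rows.get(row_key, {})
        if family is None:
            return {name: dict(cols) for name, cols in row.items()}
        return dict(row.get(family, {}))


table = ColumnFamilyTable(["profile", "activity"])
table.put("user#1", "profile", "name", "Ada")
table.put("user#1", "activity", "last_login", "2024-01-01")
# A second row can carry entirely different columns in the same family.
table.put("user#2", "profile", "email", "user2@example.com")

print(table.get("user#1", "profile"))
print(table.get("user#2"))
```

In this sketch, rows are looked up directly by key, and no row is required to have the same columns as any other, which is the sense in which the model is schemaless.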

In this module, you will learn how to create real-time streaming data analytics pipelines and applications in the cloud by using Azure HDInsight with Apache Kafka and Apache Spark.

By the end of this module, you will be able to perform ad hoc queries on a big data set. Using HDInsight Interactive Query helps to achieve sub-second query latencies.

Azure HDInsight, in conjunction with other Azure services, provides a comprehensive, multi-tiered security solution. Securing it is a shared responsibility between Microsoft and the customer.