Distributed programming on the cloud


Learn about how complex computer programs must be architected for the cloud by using distributed programming.

In this learning path, you'll:

  • Classify programs as sequential, concurrent, parallel, and distributed
  • Indicate why programmers usually parallelize sequential programs
  • Define distributed programming models
  • Discuss the challenges with scalability, communication, heterogeneity, synchronization, fault tolerance, and scheduling that are encountered when building cloud programs
  • Define heterogeneous and homogeneous clouds, and identify the main reasons for heterogeneity in the cloud
  • List the main challenges that heterogeneity poses for distributed programs, and outline strategies to address those challenges
  • State when and why synchronization is required in the cloud
  • Identify the main technique that can be used to tolerate faults in clouds
  • Outline the difference between task scheduling and job scheduling
  • Explain how heterogeneity and locality can influence task schedulers
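
As a minimal illustration of the first two objectives above, here is a sketch of turning a sequential computation into a parallel one by decomposing the input into independent chunks. The function names are illustrative, not from any module in this path; with CPython threads the sketch shows the decomposition pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential_sum(numbers):
    # Sequential version: a single worker walks the entire input.
    total = 0
    for n in numbers:
        total += n
    return total

def parallel_sum(numbers, workers=4):
    # Parallel version: split the input into independent chunks, sum each
    # chunk in its own worker, then combine the partial results. (With
    # CPython threads this only illustrates the decomposition pattern; in
    # a cloud setting each chunk would be sent to a separate machine.)
    chunk = max(1, len(numbers) // workers)
    parts = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sequential_sum, parts))
```

The same split-compute-combine shape reappears throughout this path, from MapReduce to Spark.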

In partnership with Dr. Majd Sakr and Carnegie Mellon University.


Prerequisites

  • Understand what cloud computing is, including cloud service models and common cloud providers
  • Know the technologies that enable cloud computing
  • Understand how cloud service providers charge for and bill cloud services
  • Know what datacenters are and why they exist
  • Know how datacenters are set up, powered, and provisioned
  • Understand how cloud resources are provisioned and metered
  • Be familiar with the concept of virtualization
  • Know the different types of virtualization
  • Understand CPU virtualization
  • Understand memory virtualization
  • Understand I/O virtualization
  • Know about the different types of data and how they're stored
  • Be familiar with distributed file systems and how they work
  • Be familiar with NoSQL databases and object storage, and how they work

Modules in this learning path

Learn about distributed programming and why it's useful for the cloud, including programming models, types of parallelism, and symmetrical vs. asymmetrical architectures.

MapReduce was a breakthrough in big data processing that has become mainstream and been improved upon significantly. Learn about how MapReduce works.
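
As a taste of the MapReduce module, the classic word-count example can be simulated in plain Python. This is a hedged sketch of the programming model's map, shuffle, and reduce phases, not the Hadoop API; all function names here are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key across mappers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)

def mapreduce_wordcount(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# mapreduce_wordcount(["to be or", "not to be"])
# -> {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a real deployment, the map and reduce phases run on many machines, and the shuffle moves intermediate data across the network between them.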

GraphLab is a big data tool developed by Carnegie Mellon University to help with data mining. Learn about how GraphLab works and why it's useful.

Spark is an open-source cluster-computing framework with strengths that differ from MapReduce's. Learn about how Spark works.
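
One of Spark's distinguishing ideas is lazy evaluation: transformations are only recorded until an action forces the pipeline to run. The toy `MiniRDD` class below is an invented stand-in for that idea, not the PySpark API.

```python
class MiniRDD:
    # A toy stand-in for a Spark RDD: transformations (map, filter) are
    # recorded lazily and only executed when an action (collect) is called.
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []

    def map(self, fn):
        # Record the transformation; nothing runs yet.
        return MiniRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return MiniRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Action: replay the recorded pipeline over the data.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

rdd = MiniRDD(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
# Nothing has executed yet; collect() triggers the whole pipeline.
# rdd.collect() -> [0, 4, 16, 36, 64]
```

Recording the pipeline before running it is what lets Spark plan execution across a cluster and recompute lost partitions after failures.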

The growth of available data has given rise to continuous, real-time data streams that must be processed as they arrive. Learn about different systems and techniques for consuming and processing real-time data streams.
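
A recurring pattern in stream processing is computing over a sliding window of the most recent events rather than the whole (unbounded) input. The helper below is a hypothetical sketch of that pattern, not taken from any specific streaming system.

```python
from collections import deque

def windowed_averages(stream, size=3):
    # Keep only the last `size` readings; emit a running average as each
    # new event arrives, a common pattern in stream-processing systems.
    window = deque(maxlen=size)
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

# list(windowed_averages([10, 20, 30, 40], size=3))
# -> [10.0, 15.0, 20.0, 30.0]
```

Real streaming engines add the pieces this sketch omits: out-of-order events, windowing by time rather than count, and fault-tolerant state.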

This learning path and modules are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International License.