Message queues and stream processing
The growth in available data has produced continuous streams of real-time data that must be processed. Learn about different systems and techniques for consuming and processing real-time data streams.
Learning objectives
In this module, you will:
- Define a message queue and recall a basic architecture
- Recall the characteristics of a message queue, and present its advantages and disadvantages
- Explain the basic architecture of Apache Kafka
- Discuss the roles of topics and partitions, as well as how scalability and fault tolerance are achieved
- Discuss general requirements of stream processing systems
- Recall the evolution of stream processing
- Explain the basic components of Apache Samza
- Discuss how Apache Samza achieves stateful stream processing
- Discuss the differences between the Lambda and Kappa architectures
- Discuss the motivation for the adoption of message queues and stream processing in the LinkedIn use case
In partnership with Dr. Majd Sakr and Carnegie Mellon University.
Prerequisites
- Understand what cloud computing is, including cloud service models and common cloud providers
- Know the technologies that enable cloud computing
- Understand how cloud service providers pay for the cloud and bill customers for it
- Know what datacenters are and why they exist
- Know how datacenters are set up, powered, and provisioned
- Understand how cloud resources are provisioned and metered
- Be familiar with the concept of virtualization
- Know the different types of virtualization
- Understand CPU virtualization
- Understand memory virtualization
- Understand I/O virtualization
- Know about the different types of data and how they're stored
- Be familiar with distributed file systems and how they work
- Be familiar with NoSQL databases and object storage, and how they work
- Know what distributed programming is and why it's useful for the cloud
- Understand MapReduce and how it enables big-data computing
- Understand Spark and how it differs from MapReduce
- Understand GraphLab and how it differs from MapReduce and Spark