This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.
Ingest and process millions of streaming events per second with Apache Kafka, Apache Storm, and Apache Spark Streaming.
Potential use cases
Companies can use this solution to retrieve (or ingest) data from multiple sources and make real-time business decisions. Scenarios include:
- Analyzing data from IoT sensors for quality detection, fault analysis, and maintenance event prediction
- Business integration of weather feed or sensor data (agriculture, retail)
- Analysis of real-time stock market data (financial)
- Analysis of current market conditions (insurance and finance)
- Trend analysis over real-time sales (retail)
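For the trend-analysis scenario, a streaming job usually keeps an aggregate over a moving time window rather than over all history. The following Python sketch illustrates that idea with a hypothetical `SlidingWindowTrend` class that keeps a running sales total over the last N seconds. It's a minimal in-memory illustration, not Spark Streaming or Storm API code, and it assumes that events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sale:
    timestamp: float  # event time in seconds (assumed field, for illustration)
    amount: float


class SlidingWindowTrend:
    """Hypothetical helper: running total of sales in the window (t - w, t].

    Assumes in-order events; late or out-of-order events would need
    watermarking, which this sketch omits.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # sales currently inside the window, oldest first
        self._total = 0.0

    def add(self, sale: Sale) -> float:
        self._events.append(sale)
        self._total += sale.amount
        # Evict sales that fall outside the window ending at this event.
        cutoff = sale.timestamp - self.window_seconds
        while self._events and self._events[0].timestamp <= cutoff:
            self._total -= self._events.popleft().amount
        return self._total
```

With a 60-second window, sales of 10, 20, and 5 at t = 0, 30, and 60 seconds yield running totals of 10, 30, and 25, because the first sale leaves the window once it ends at t = 60.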
Dataflow
- Streaming data is ingested, processed, and stored by the following services:
  - Apache Kafka for data ingestion
  - Apache Spark Streaming or Apache Storm for processing
  - Apache HBase, a NoSQL database, for storing the analyzed results
- Users consume the data in the related apps.
- The data is visualized in Power BI.
- The data used by Azure HDInsight is stored in Azure Data Lake Storage for secure and scalable processing in the cloud.
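The ingest, process, and store steps of this dataflow can be sketched end to end in plain Python, using in-memory stand-ins for each service. Everything in this sketch is hypothetical and for illustration only: `PartitionedLog` imitates a Kafka topic's keyed partitioning, `tumbling_window_stats` imitates a windowed aggregation that a Spark Streaming or Storm job might run, and `write_to_table` imitates an HBase table with `family:qualifier` columns. None of these names are Kafka, Spark, Storm, or HBase client APIs.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: int  # event time in whole seconds (assumed)
    value: float


class PartitionedLog:
    """Hypothetical stand-in for a Kafka topic: records with the same key
    always land in the same partition, and each partition is append-only."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash so a key always maps to one partition.
        # (Kafka's Java client hashes keys with murmur2; crc32 is a stand-in.)
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, reading: Reading) -> tuple:
        p = self.partition_for(reading.sensor_id)
        self.partitions[p].append(reading)
        return p, len(self.partitions[p]) - 1  # (partition, offset)


def tumbling_window_stats(readings, window_seconds: int) -> dict:
    """Group readings into fixed, non-overlapping windows per sensor and
    return {(sensor_id, window_start): (count, average)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in readings:
        window_start = r.timestamp - (r.timestamp % window_seconds)
        key = (r.sensor_id, window_start)
        sums[key] += r.value
        counts[key] += 1
    return {key: (counts[key], sums[key] / counts[key]) for key in counts}


def write_to_table(table: dict, stats: dict) -> None:
    """Store results HBase-style: a row key plus 'family:qualifier' columns."""
    for (sensor_id, window_start), (count, avg) in stats.items():
        # Zero-padding keeps a sensor's rows in time order when sorted as strings.
        row_key = f"{sensor_id}#{window_start:010d}"
        table[row_key] = {"stats:count": count, "stats:avg": avg}


def run_pipeline(readings, num_partitions: int = 3, window_seconds: int = 60) -> dict:
    log = PartitionedLog(num_partitions)
    for r in readings:
        log.append(r)
    table = {}
    # Each partition holds every reading for its sensors, so each one could be
    # processed by a separate consumer without combining partial results.
    for partition in log.partitions:
        write_to_table(table, tumbling_window_stats(partition, window_seconds))
    return table
```

For example, `run_pipeline([Reading("s1", 0, 10.0), Reading("s1", 30, 20.0)])` produces a single row, `s1#0000000000`, with a count of 2 and an average of 15.0. Partitioning by sensor ID sends every reading for a sensor to the same consumer, so windows can be computed per partition. Leading the row key with the sensor ID rather than the timestamp also avoids funneling all new writes into a single HBase region.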
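Key technologies used to implement this architecture are Azure HDInsight (running Apache Kafka, Apache Spark Streaming or Apache Storm, and Apache HBase), Azure Data Lake Storage, and Power BI.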
This article is maintained by Microsoft. It was originally written by the following contributors.
To learn more about these services, see the following articles:
- What is Azure HDInsight?
- What is streaming in HDInsight?
- Create Apache Hadoop cluster in HDInsight
- Introduction to Azure Data Lake Storage Gen2
- Create Apache Spark cluster - Portal
- Enterprise security in Azure HDInsight
- Extend your on-premises big data investments with HDInsight
- Extract, transform, and load (ETL) using HDInsight
- Campaign optimization with Azure HDInsight Spark clusters
- Loan charge-off prediction with Azure HDInsight Spark clusters
- Interactive querying with HDInsight
- Azure Kubernetes in event stream processing
- Instant IoT data streaming with AKS